While I'm certain we could push all of these tests through to support parallelism, I think it would end up requiring continual work: there is a class of tests that won't always work under concurrency, and whose failures won't be obvious until the damage is done.
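To illustrate that class of tests, here is a minimal sketch of the fixed-port case Mick mentions in the quoted mail. It is plain Python rather than anything from the Cassandra tree, and the helper names `bind_ephemeral` and `second_bind_fails` are made up for the example. It only shows the general mechanism: two runners that both want the same port collide, while asking the OS for port 0 gives each one its own.

```python
import socket


def bind_ephemeral():
    """Bind a listening socket to port 0 so the OS picks a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(1)
    return s


def second_bind_fails(port_holder):
    """Try to bind a second socket to the port already held by port_holder.

    Returns True if the bind is refused, which is what a second test
    runner hard-coded to the same port would hit.
    """
    port = port_holder.getsockname()[1]
    other = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        other.bind(("127.0.0.1", port))
        return False
    except OSError:
        return True
    finally:
        other.close()
```

The awkward part is that the collision only shows up when two such tests happen to overlap in time, so a run can stay green for a long while before it starts flaking.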
I'm +1 on punting to docker to parallelize.

On Mon, Apr 12, 2021 at 1:17 PM Mick Semb Wever <m...@apache.org> wrote:
>
> Cassandra's build.xml supports parallel test runners. This
> functionality is available through `-Dtest.runners` and the
> `testparallel` ant macro.
>
> It's always been there, but hasn't been active recently since both
> ci-cassandra and circleci call testclasslist instead of test.
>
> Recently testclasslist was updated to enable multiple runners too.
> Since then we witnessed a lot more test failures… The distributed
> in-jvm tests just don't work with parallel runners, and currently they
> need `-Dtest.runners=1` specified to work. And plenty of flakies where
> tests use fixed ports (StorageServiceServerTest), byteman (eg
> BMUnitRunner), and around conf files on disk.
>
> From here, I can see two ways forward, a) fix everything to be
> parallel ready or b) remove test.runners and parallelise with docker
> instead.
>
> All in all, I think this is kinda odd to do (a) when docker is readily
> available, especially on the CI servers where we are concerned about
> build times.
>
> For (b)… to remove everything related to 'testparallel' and
> 'test.runners' from the build.xml an example patch is here:
> https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/16587-2/trunk
>
> Then replacing 'ant task parallelism' with docker containers would be
> done something like this:
> https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:mck/16587-2/trunk
> (this is just a quick PoC, aimed at the ci-cassandra agents that have
> 4 cores and 16gb ram available to each executor, but I imagine instead
> something that spawns a number of containers based on system
> resources, like we currently do with get-cores and get-mem). Also
> worth noting the overhead here, compared with the ant approach, docker
> builds everything in each container from scratch, but this too can be
> improved easily enough.
>
> What are folks' opinions?
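For what it's worth, the "spawn a number of containers based on system resources" idea in the quoted mail could look roughly like the sketch below. This is not what the linked PoC does. The function names and the per-container budget of 1 core and 4 GB are assumptions chosen only so that the 4-core / 16gb executor from the mail works out to four containers; real numbers would have to come from measuring the tests.

```python
def container_count(cores, mem_gb, cores_per_container=1, mem_gb_per_container=4):
    """Number of test containers that fit on the host, never fewer than one.

    The per-container defaults are illustrative assumptions, not measured
    requirements of the Cassandra test suite.
    """
    by_cpu = cores // cores_per_container
    by_mem = mem_gb // mem_gb_per_container
    return max(1, min(by_cpu, by_mem))


def split_round_robin(test_classes, shards):
    """Deal test classes into `shards` lists, one list per container.

    Sorting first keeps the split stable between runs, so a flaky test
    always lands in the same shard.
    """
    buckets = [[] for _ in range(shards)]
    for i, name in enumerate(sorted(test_classes)):
        buckets[i % shards].append(name)
    return buckets
```

Each resulting list would then be handed to one container as its own test class list, so every container runs with a single runner and its own ports and conf files.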
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org