Cassandra's build.xml supports parallel test runners. This
functionality is available through `-Dtest.runners` and the
`testparallel` ant macro.

It has always been there, but it hasn't been exercised recently, since
both ci-cassandra and circleci call testclasslist instead of test.

Recently testclasslist was updated to enable multiple runners too.
Since then we have seen a lot more test failures. The distributed
in-jvm tests just don't work with parallel runners, and currently they
need `-Dtest.runners=1` specified to work. There are also plenty of
flaky tests: ones that use fixed ports (StorageServiceServerTest), ones
that use byteman (e.g. BMUnitRunner), and ones that touch conf files on
disk.

From here, I can see two ways forward: a) fix everything to be
parallel ready, or b) remove test.runners and parallelise with docker
instead.

All in all, I think it is odd to do (a) when docker is readily
available, especially on the CI servers where we are concerned about
build times.

For (b), an example patch that removes everything related to
'testparallel' and 'test.runners' from the build.xml is here:
https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/16587-2/trunk

Replacing 'ant task parallelism' with docker containers would then be
done something like this:
https://github.com/apache/cassandra-builds/compare/trunk...thelastpickle:mck/16587-2/trunk

This is just a quick PoC, aimed at the ci-cassandra agents that have
4 cores and 16gb ram available to each executor. I imagine instead
something that spawns a number of containers based on system
resources, like we currently do with get-cores and get-mem. Also
worth noting is the overhead compared with the ant approach: docker
builds everything in each container from scratch, but this too can be
improved easily enough.
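
To make the resource-based idea concrete, here is a minimal sketch in
Python of how a container count could be derived from core and memory
values like those reported by get-cores and get-mem, and how a test
class list could be dealt out across the containers. It is not taken
from either patch. The function names and the per-container
requirements (2 cores, 4 GB) are assumptions for illustration only.

```python
def container_count(cores, mem_gb, cores_per_container=2, mem_gb_per_container=4):
    """Return how many test containers fit in the given resources.

    The per-container defaults are hypothetical, not measured values.
    Always returns at least 1 so that tests still run on small hosts.
    """
    by_cpu = cores // cores_per_container
    by_mem = mem_gb // mem_gb_per_container
    return max(1, min(by_cpu, by_mem))


def split_round_robin(test_classes, n):
    """Deal the test classes into n lists, one list per container."""
    return [test_classes[i::n] for i in range(n)]


if __name__ == "__main__":
    # An executor with 4 cores and 16 GB, as on the ci-cassandra agents.
    n = container_count(cores=4, mem_gb=16)
    # Placeholder class names, purely for illustration.
    classes = ["ATest", "BTest", "CTest", "DTest", "ETest"]
    for index, chunk in enumerate(split_round_robin(classes, n)):
        print(index, chunk)
```

Under those assumed per-container sizes, a 4 core / 16 GB executor
would get 2 containers, limited by cores rather than memory.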

What are folks' opinions?
