After completing GEODE-1233, all currently known flickering tests are now
annotated with our FlakyTest JUnit Category.

In an effort to divide our build up into multiple build pipelines that are
sequential and dependable, we could consider excluding FlakyTests from the
primary integrationTest and distributedTest tasks. An additional build task
would then execute all of the FlakyTests separately. This would hopefully
help us get to a point where we can depend on our primary testing tasks
staying green 100% of the time. We would then prioritize fixing the
FlakyTests and one by one removing the FlakyTest category from them.

I would also suggest that we execute the FlakyTests with "forkEvery 1" to
give each test a clean JVM or set of DistributedTest JVMs. That would
hopefully decrease the chance of a GC pause or test pollution causing
flickering failures.

Having reviewed lots of test code and failure stacks, I believe that the
primary causes of FlakyTests are timing sensitivity (thread sleeps, or
nothing at all that waits for async activity; timeouts or sleeps that are
insufficient on a busy CPU, under heavy I/O, or during a GC pause) and
random ports chosen via AvailablePort (instead of using zero for an
ephemeral port).
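
To make those two causes concrete, here is a rough sketch of the kind of
fix I have in mind. It is not actual Geode code: the class name
FlakyTestFixes and the helpers bindEphemeral and awaitCondition are
hypothetical, and only JDK classes are used. It binds to port zero so the
OS picks a free port, and it polls for an async result up to a generous
deadline instead of sleeping for a fixed time.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper class for illustration; not part of Geode.
public class FlakyTestFixes {

    // Bind to port 0 so the OS assigns a free ephemeral port.
    // The caller reads the real port back with getLocalPort().
    public static ServerSocket bindEphemeral() throws IOException {
        return new ServerSocket(0);
    }

    // Poll the condition until it holds or the timeout elapses.
    // Returns true if the condition was observed, false on timeout.
    public static boolean awaitCondition(BooleanSupplier condition,
                                         long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(10); // short poll interval, not the overall wait
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // No port is picked in advance, so there is no window in which
        // another process can grab it before we bind.
        try (ServerSocket server = bindEphemeral()) {
            System.out.println("bound to port " + server.getLocalPort());
        }

        // Wait on the async activity itself rather than guessing a sleep.
        AtomicBoolean done = new AtomicBoolean(false);
        Thread worker = new Thread(() -> done.set(true));
        worker.start();
        boolean observed = awaitCondition(done::get, 30, TimeUnit.SECONDS);
        worker.join();
        System.out.println("async work observed: " + observed);
    }
}
```

The generous timeout only costs time when the test is going to fail
anyway; on a healthy run the poll returns as soon as the condition holds.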

Opinions or ideas? Hate it? Love it?

-Kirk
