After completing GEODE-1233, all currently known flickering tests are now annotated with our FlakyTest JUnit Category.

In an effort to divide our build up into multiple build pipelines that are sequential and dependable, we could consider excluding FlakyTests from the primary integrationTest and distributedTest tasks. An additional build task would then execute all of the FlakyTests separately. This would hopefully help us get to a point where we can depend on our primary testing tasks staying green 100% of the time. We would then prioritize fixing the FlakyTests and, one by one, removing the FlakyTest category from them.

I would also suggest that we execute the FlakyTests with "forkEvery 1" to give each test a clean JVM or set of DistributedTest JVMs. That would hopefully decrease the chance of a GC pause or test pollution causing flickering failures.

Having reviewed lots of test code and failure stacks, I believe that the primary causes of FlakyTests are:

1) timing sensitivity: thread sleeps (or nothing at all) where the test should wait for async activity, and timeouts or sleeps that are insufficient on a busy CPU or I/O, or during a GC pause

2) random ports chosen via AvailablePort, instead of using zero to get an ephemeral port

Opinions or ideas? Hate it? Love it?

-Kirk
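
P.S. To make the two root causes above a bit more concrete, here is a minimal sketch of the alternatives I have in mind: binding to port zero so the OS picks a free ephemeral port, and polling for a condition with a deadline instead of a fixed sleep. The class and helper names (FlakyTestSketch, openOnEphemeralPort, waitUntil) are made up for illustration and are not existing Geode utilities. The 10 ms poll interval and 5 second timeout are arbitrary assumptions.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class FlakyTestSketch {

  // Hypothetical helper: port 0 asks the OS to assign a free ephemeral port,
  // so there is no window between "find a free port" and "bind to it".
  static ServerSocket openOnEphemeralPort() throws IOException {
    return new ServerSocket(0);
  }

  // Hypothetical helper: poll the condition until it holds or the timeout
  // expires. Returns true if the condition was observed, false on timeout.
  static boolean waitUntil(BooleanSupplier condition, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the condition
      }
      Thread.sleep(10); // short poll interval (assumed value)
    }
    return true;
  }

  public static void main(String[] args) throws Exception {
    // The test asks the socket which port it actually received.
    try (ServerSocket server = openOnEphemeralPort()) {
      System.out.println("bound to port " + server.getLocalPort());
    }

    // Stand-in for async activity that a test would otherwise sleep on.
    AtomicBoolean done = new AtomicBoolean(false);
    Thread worker = new Thread(() -> done.set(true));
    worker.start();

    boolean finished = waitUntil(done::get, 5000);
    System.out.println("async work finished: " + finished);
    worker.join();
  }
}
```

The point of the generous timeout is that it only costs time when the test is actually going to fail; on a healthy run the loop exits as soon as the condition holds, so a busy CPU or a GC pause just delays the test instead of breaking it.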
