There are quite a few test classes that have multiple test methods which
are annotated with the FlakyTest category.
More thoughts:
In general, I think that if any given test fails intermittently then it is
a FlakyTest. A good test should either pass or fail consistently. After
annotating a test method with FlakyTest, the developer should then add the
Flaky label to the corresponding Jira ticket. What we then do with the Jira
tickets (ie, fix them) is probably more important than deciding if a test
is flaky or not.
Rather than try to come up with some flaky process for determining if a
given test is flaky (ie, "does it have thread sleeps?"), it would be better
to have a wiki page that has examples of flakiness and how to fix them ("if
the test has thread sleeps, then switch to using Awaitility and do
this...").
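To illustrate the kind of rewrite such a wiki page could show, here is a minimal sketch of the sleep-to-polling change. The `awaitTrue` helper below is hypothetical, standing in for what Awaitility's `await().atMost(...).until(...)` provides; the point is to wait on the condition itself rather than on a guessed fixed duration:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class AwaitExample {

    // Hypothetical stand-in for Awaitility's await().atMost(...).until(...):
    // poll the condition repeatedly instead of sleeping one fixed, guessed duration.
    static void awaitTrue(BooleanSupplier condition, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            TimeUnit.MILLISECONDS.sleep(50); // short poll interval, not a guessed total wait
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated async activity: a flag flipped by another thread after a delay.
        final boolean[] done = {false};
        new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(200);
            } catch (InterruptedException ignored) {
            }
            done[0] = true;
        }).start();

        // Flaky style would be: Thread.sleep(500); and hope the work finished.
        // Robust style: wait for the condition, bounded by a generous timeout.
        awaitTrue(() -> done[0], 5000);
        System.out.println("condition met");
    }
}
```

The timeout can be generous (many seconds) without slowing the suite down, because the wait returns as soon as the condition holds; a bare `Thread.sleep` always pays the full duration and still fails under a GC pause or a busy CPU.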
-Kirk
On Mon, Apr 25, 2016 at 10:51 PM, Anthony Baker <[email protected]> wrote:
> Thanks Kirk!
>
> ~/code/incubator-geode (develop)$ grep -ro "FlakyTest.class" . | grep -v
> Binary | wc -l | xargs echo "Flake factor:"
> Flake factor: 136
>
> Anthony
>
>
> > On Apr 25, 2016, at 9:45 PM, William Markito <[email protected]>
> wrote:
> >
> > +1
> >
> > Are we also planning to automate the additional build task somehow?
> >
> > I'd also suggest creating a wiki page with some stats (like how many
> > FlakyTests we currently have) and the idea behind this effort so we can
> > keep track and see how it's evolving over time.
> >
> > On Mon, Apr 25, 2016 at 6:54 PM, Kirk Lund <[email protected]> wrote:
> >
> >> After completing GEODE-1233, all currently known flickering tests are
> now
> >> annotated with our FlakyTest JUnit Category.
> >>
> >> In an effort to divide our build up into multiple build pipelines that
> are
> >> sequential and dependable, we could consider excluding FlakyTests from
> the
> >> primary integrationTest and distributedTest tasks. An additional build
> task
> >> would then execute all of the FlakyTests separately. This would
> hopefully
> >> help us get to a point where we can depend on our primary testing tasks
> >> staying green 100% of the time. We would then prioritize fixing the
> >> FlakyTests and one by one removing the FlakyTest category from them.
> >>
> >> I would also suggest that we execute the FlakyTests with "forkEvery 1"
> to
> >> give each test a clean JVM or set of DistributedTest JVMs. That would
> >> hopefully decrease the chance of a GC pause or test pollution causing
> >> flickering failures.
> >>
> >> Having reviewed lots of test code and failure stacks, I believe that the
> >> primary causes of FlakyTests are timing sensitivity (thread sleeps, code
> >> that does not wait for async activity, or timeouts and sleeps that are
> >> insufficient on a busy CPU, under I/O load, or during a GC pause) and
> >> random ports via AvailablePort (instead of using zero for an ephemeral
> >> port).
> >>
> >> Opinions or ideas? Hate it? Love it?
> >>
> >> -Kirk
> >>
> >
> >
> >
> > --
> >
> > ~/William
>
>