On 4/8/14, 6:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek <t...@mielczarek.org>
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.

To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.

I think this proposal would make more sense if the state of our
infrastructure and tooling was able to handle it properly. Right now,
automatically marking known intermittents would cause the test to lose
*all* value. It's sad, but the only data we have about intermittents
comes from the sheriffs manually starring them. There is also currently
no way to mark a test KNOWN-RANDOM and automatically detect if it starts
failing permanently. This means the failures can't be starred and become
nearly impossible to discover, let alone diagnose.

So, what's the minimum level of infrastructure that you think would be
needed to go ahead with this plan? To me it seems like the current
system already isn't working very well, so the bar for moving forward
with a plan that would increase the amount of data we had available to
diagnose problems with intermittents, and reduce the amount of manual
labour needed in marking them, should be quite low.

The simple solution is to have a separate in-tree manifest annotation for intermittents. Put another way, we can describe exactly why we are not running a test. This is kinda/sorta the realm of bug 922581.
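
To illustrate what I mean (the "intermittent" keys below are made up for this example, not existing manifestparser syntax), the distinction could look something like:

    [test_foo.html]
    # Test can't work in this configuration: never run it.
    skip-if = os == "android"

    [test_bar.html]
    # Test is known intermittent: keep running it, but track it against a bug.
    intermittent = true
    intermittent-bug = 922581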

The harder solution is to have some service (like Orange Factor) keep track of the state of every test. We could then have a feedback loop whereby test automation queries that service to see which tests should run and what the expected result is. Of course, we will want that integration to work locally too, so test execution is consistent between automation and developer machines.
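
As a rough sketch of what the automation side of that feedback loop could look like (the service URL, endpoint, and JSON shape below are all invented for illustration; nothing like this exists yet):

    # Hypothetical sketch: the expectation service and its response format are made up.
    import json
    import urllib.request

    EXPECTATION_SERVICE = "https://example.org/test-expectations/api/v1"

    def fetch_expectations(suite, revision):
        """Ask the central service which tests to run and what result to expect."""
        url = "%s/%s?rev=%s" % (EXPECTATION_SERVICE, suite, revision)
        with urllib.request.urlopen(url) as response:
            # Invented response shape: {"test_foo.html": "PASS",
            #                           "test_bar.html": "INTERMITTENT", ...}
            return json.load(response)

    def should_run(test, expectations):
        # Run everything the service doesn't mark as disabled, including known
        # intermittents, so we keep collecting data on them.
        return expectations.get(test, "PASS") != "DISABLED"

The same queries would need to work (or be cached) locally so a developer's run matches what automation does.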

I see us inevitably deploying the harder solution. We'll eventually get to a point where we're able to do "crazy" things such as intelligently running only the subset of tests impacted by a check-in, or attempting to run disabled tests to see if they magically started working again. I think we'll eventually realize that tracking this in a central service makes more sense than doing it in-tree (mainly because of the amount of data required to make some of the more advanced determinations).

For the short term, I think we should enumerate the reasons we don't run a test (distinguishing between "test isn't compatible with this configuration" and "test isn't working" is important) and annotate these separately in our test manifests. We can then modify our test automation to treat each class differently. For example, we could:

1) Run failed tests multiple times. If a test is intermittent but not marked as such, we fail the test run.
2) Run marked intermittent tests multiple times. If the test passes all 25 times, fail the test run for inconsistent metadata (see the sketch after this list).
3) Integrate intermittent failures into TBPL/Orange Factor better.
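
As a sketch of how points 1 and 2 could be enforced in the harness (run_test() and the result strings are stand-ins for whatever the real harness provides, and 25 is just the arbitrary retry count from above):

    # Hypothetical retry policy; run_test() returns "PASS" or "FAIL" per run.
    RETRIES = 25

    def verify_test(test, marked_intermittent, run_test):
        results = [run_test(test) for _ in range(RETRIES)]
        failures = results.count("FAIL")

        if failures == RETRIES:
            return "PERMA-FAIL"
        if not marked_intermittent and failures > 0:
            # Point 1: test fails only sometimes but isn't marked intermittent.
            return "UNEXPECTED-INTERMITTENT"
        if marked_intermittent and failures == 0:
            # Point 2: marked intermittent but passed every time -> stale metadata.
            return "INCONSISTENT-METADATA"
        return "OK"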

To address David Baron's concern about silently passing intermittently failing tests: yes, silently passing is wrong. But I would argue it is the lesser of two evils, the other being disabling tests outright.

I think we can all agree that the current approach of disabling failing tests (the equivalent of sweeping dust under the rug) isn't sustainable. But if it's the sheriffs' job to keep the trees green and their only available recourse is to disable tests, well, they are going to disable tests. We need more metadata and tooling around disabled tests, and we needed it months ago.