On 4/8/14, 6:51 AM, James Graham wrote:
On 08/04/14 14:43, Andrew Halberstadt wrote:
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek <t...@mielczarek.org>
wrote:
If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.
To some degree, yes, marking a test as expected intermittent causes it
to lose value. If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so. That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test"). Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures;
it just removes the pressure on them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.
I think this proposal would make more sense if the state of our
infrastructure and tooling was able to handle it properly. Right now,
automatically marking known intermittents would cause the test to lose
*all* value. It's sad, but the only data we have about intermittents
comes from the sheriffs manually starring them. There is also currently
no way to mark a test KNOWN-RANDOM and automatically detect if it starts
failing permanently. This means the failures can't be starred and become
nearly impossible to discover, let alone diagnose.
So, what's the minimum level of infrastructure that you think would be
needed to go ahead with this plan? To me it seems like the current
system already isn't working very well, so the bar for moving forward
with a plan that would increase the amount of data we had available to
diagnose problems with intermittents, and reduce the amount of manual
labour needed in marking them, should be quite low.
The simple solution is to have a separate in-tree manifest annotation
for intermittents. Put another way, we can describe exactly why we are
not running a test. This is kinda/sorta the realm of bug 922581.
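As a sketch of what that annotation might look like, here is a manifestparser-style .ini fragment. The `skip-if` key is existing manifest syntax; the `intermittent` key is hypothetical, invented here to illustrate the idea of recording *why* a test isn't expected to run cleanly:

```ini
; "test isn't compatible": skip entirely on this platform
[test_incompatible.js]
skip-if = os == 'android'

; "test isn't working reliably": still run it, but record the
; intermittent and the tracking bug. "intermittent" is a hypothetical
; key, not something manifestparser supports today.
[test_known_intermittent.js]
intermittent = bug 922581
```

The point of keeping the two reasons distinct is that automation can treat them differently: skipped tests are never run, while marked-intermittent tests can be retried and have their annotations verified.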
The harder solution is to have some service (like orange factor) keep
track of the state of every test. We can have a feedback loop whereby
test automation queries that service to see what tests should run and
what the expected result is. Of course, we will want that integration to
work locally so we have consistent test execution between automation and
developer machines.
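A minimal sketch of that feedback loop, with the service query stubbed out so it runs locally. The service name, data shape, and fallback behaviour are all assumptions for illustration, not an existing API:

```python
# Hypothetical sketch: automation asks a central service (something like
# Orange Factor) for each test's expected result, and falls back to
# in-tree metadata when the service isn't reachable, e.g. on a
# developer's machine. Names and data shapes here are invented.

# In-tree expectations, the local fallback.
IN_TREE_EXPECTATIONS = {"test_baz.js": "fail"}


def fetch_service_expectations():
    """Stand-in for an HTTP query to the central service.

    In real automation this would fetch current per-test state; here it
    returns canned data so the sketch is self-contained and runnable.
    """
    return {"test_foo.js": "intermittent"}


def expected_result(test, service_data=None):
    """Return the expected outcome for a test.

    Prefer the central service's view when available; otherwise use the
    in-tree manifest data, defaulting to an ordinary expected pass.
    """
    source = service_data if service_data is not None else IN_TREE_EXPECTATIONS
    return source.get(test, "pass")
```

Because both automation and local runs go through `expected_result`, test execution stays consistent between the two; only the data source differs.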
I see us inevitably deploying the harder solution. We'll eventually get
to a point where we're able to do "crazy" things such as intelligently
running only the subset of tests impacted by a check-in, or attempting
to run disabled tests to see if they magically started working again. I
think we'll eventually realize that tracking this in a central service
makes more sense than doing it in-tree (mainly because of the amount of
data required to make some advanced determinations).
For the short term, I think we should enumerate the reasons we don't run
a test (distinguishing between "test isn't compatible" and "test isn't
working" is important) and annotate these separately in our test
manifests. We can then modify our test automation to treat things
differently. For example, we could:
1) Run failed tests multiple times. If it is intermittent but not marked
as such, we fail the test run.
2) Run marked intermittent tests multiple times. If it works all 25
times, fail the test run for inconsistent metadata.
3) Integrate intermittent failures into TBPL/Orange Factor better.
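Rules 1 and 2 above can be sketched in a few lines. This is a hedged illustration of the policy, not any harness's actual implementation; the function names and the 25-run count are assumptions taken from the proposal:

```python
def classify(run_test, attempts=25):
    """Run a test repeatedly and classify its observed behaviour."""
    results = [run_test() for _ in range(attempts)]
    if all(results):
        return "pass"
    if any(results):
        return "intermittent"  # some passes, some failures
    return "fail"              # failed every time


def check_run(outcome, marked_intermittent):
    """Apply rules 1 and 2 from the proposal above.

    Rule 1: an intermittent test that is NOT marked as such fails the run.
    Rule 2: a marked-intermittent test that passes every time also fails
    the run, because its annotation has gone stale.
    A permanent failure fails the run either way, which is how a
    KNOWN-RANDOM test that starts failing consistently gets noticed.
    """
    if outcome == "intermittent" and not marked_intermittent:
        return "fail-run: unmarked intermittent"
    if outcome == "pass" and marked_intermittent:
        return "fail-run: stale intermittent annotation"
    if outcome == "fail":
        return "fail-run: permanent failure"
    return "ok"
```

The useful property is that the annotations become falsifiable in both directions: an unmarked intermittent and a marked-but-now-green test each turn the run red, so the metadata can't silently drift out of date.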
To address David Baron's concern about silently passing intermittently
failing tests: yes, silently passing is wrong. But I would argue it is
the lesser evil compared to disabling tests outright.
I think we can all agree that the current approach of disabling failing
tests (the equivalent of sweeping dirt under the rug) isn't sustainable.
But if it's the sheriffs' job to keep the trees green and their only
available recourse is to disable tests, well, they are going to disable
tests. We need more metadata and tooling around disabled tests, and we
needed it months ago.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform