I want to express my thanks to everyone who contributed to this thread.  We 
have a lot of passionate and smart people who care about this topic.  Thanks 
again for weighing in so far.

Below is a slightly updated version of the original policy, followed by an 
attempt to summarize the thread and turn what makes sense into actionable items.

= Policy for handling intermittent oranges = 

This policy defines an escalation path for when a single test case is 
identified as leaking or failing and is causing enough disruption on the 
trees. Disruption is defined as (a rough sketch of checking these thresholds 
follows the list):
1) The test case is on the list of the top 20 intermittent failures on Orange 
Factor (http://brasstacks.mozilla.com/orangefactor/index.html)
2) The test is causing oranges >=8% of the time
3) We have >100 instances of this failure reported in the bug in the last 30 
days
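
For illustration only, here is a minimal sketch of how these thresholds might 
be checked, assuming the rank and failure/run counts for the last 30 days have 
already been pulled from Orange Factor, and assuming that meeting any single 
criterion counts as disruptive. The function and parameter names are 
hypothetical, not an existing API:

ORANGE_RATE_THRESHOLD = 0.08   # criterion 2: orange >= 8% of the time
INSTANCE_THRESHOLD = 100       # criterion 3: > 100 instances in 30 days
TOP_N = 20                     # criterion 1: top 20 on Orange Factor

def is_disruptive(orange_factor_rank, failures_30d, runs_30d):
    # Any one criterion being met is treated as "disruptive" (an assumption).
    in_top_n = orange_factor_rank is not None and orange_factor_rank <= TOP_N
    rate = float(failures_30d) / runs_30d if runs_30d else 0.0
    too_many = failures_30d > INSTANCE_THRESHOLD
    return in_top_n or rate >= ORANGE_RATE_THRESHOLD or too_many

# Example: 45 failures out of 400 runs, currently ranked 12th on Orange Factor.
print(is_disruptive(12, 45, 400))  # True (top 20, and ~11% orange)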

Escalation is a responsibility of all developers, although the majority of it 
will fall on the sheriffs.

Escalation path:
1) Ensure we have a bug on file that includes the test author, reviewer, 
module owner, and any other interested parties, along with links to logs, etc.
2) Request needinfo? and expect a response within 2 business days; this 
expectation should be stated clearly in a comment.
3) If we don't get a response, needinfo? the module owner, with the 
expectation of a response and someone taking action within 2 days.
4) If another 2 days go by with no response from the module owner, we will 
disable the test.

Ideally we will work with the test author to either get the test fixed or 
disabled, depending on the time available and the difficulty of fixing the 
test.  If a bug has activity and work is being done to address the issue, it 
is reasonable to expect the test will not be disabled.  Inactivity in the bug 
is the main trigger for escalation.

This is intended to respect the time of the original test authors by not 
throwing emergencies in their lap, but also strike a balance with keeping the 
trees manageable.

Exceptions:
1) If the test has landed (or been modified) in the last 48 hours, we will 
most likely back out the patch along with the test
2) If a test is failing at least 30% of the time, we will file a bug and 
disable the test first
3) When we are bringing a new platform online (Android 2.3, b2g, etc.), many 
tests will need to be disabled prior to getting the tests running on tbpl.
4) In the rare case that we are disabling the majority of the tests for a 
given feature (either at once or slowly over time), we need the module owner 
to sign off on the current state of the tests.


= Documentation =
We have thousands of tests disabled; many are disabled only for certain build 
configurations or platforms. This can be dangerous, as we slowly reduce our 
coverage. By running a daily report (bug 996183) that outlines the total tests 
available versus what runs on each configuration (b2g, debug, osx, e10s, 
etc.), we can bring visibility to the state of each platform and whether we 
are disabling more tests than we fix.

We need a clear guide on how to run the tests, how to write a test, how to 
debug a test, and how to use metadata to indicate whether we have looked at a 
given test and when.
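
As a rough illustration (not a finalized schema), here is a sketch of what 
such manifest metadata and a simple count could look like. The test names, bug 
numbers, and the "reviewed" key are hypothetical examples; this is Python 3:

# Sketch only: parse a small manifest-style snippet and count how many tests
# are fully disabled or conditionally skipped. The "reviewed" key is a
# hypothetical annotation meaning "someone looked at this test on <date>".
from configparser import ConfigParser

MANIFEST = "\n".join([
    "[test_foo.js]",
    "# bug 123456 (hypothetical)",
    "skip-if = os == 'android'",
    "reviewed = 2014-04-01",
    "",
    "[test_bar.js]",
    "disabled = intermittent, bug 987654 (hypothetical)",
    "",
    "[test_baz.js]",
    "reviewed = 2014-03-15",
])

parser = ConfigParser()
parser.read_string(MANIFEST)

tests = parser.sections()
disabled = [t for t in tests if parser.has_option(t, "disabled")]
skipped = [t for t in tests if parser.has_option(t, "skip-if")]

print("total tests:    %d" % len(tests))
print("fully disabled: %d %s" % (len(disabled), disabled))
print("skipped on some configuration: %d %s" % (len(skipped), skipped))

A daily report along these lines, aggregated per configuration, would show at 
a glance whether the disabled count is trending up or down.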

When an intermittent bug is filed, we need to clearly outline what information 
will be most helpful in reproducing and fixing it.  Without a documented 
process for fixing oranges, this falls on the shoulders of the original test 
authors and a few determined hackers.


= General Policy =
I have adjusted the above policy to mention backing out new tests which are not 
stable, working to identify a regression in the code or tests, and adding 
protection so we do not disable coverage for a specific feature completely. In 
addition, I added a clearer definition of what is a disruptive test and 
clarified the expectations around communicating in the bug vs escalating.

What is more important is the culture we have around committing patches to 
Mozilla repositories. We need to decide as an organization if we care about 
zero oranges (or insert an acceptable percentage). We also need to decide what 
coverage levels are acceptable and what our general policy is for test reviews 
(at checkin time and in the future). These need to be answered outside of this 
policy, but the sooner we answer these questions, the better we can all move 
forward towards the same goal.


= Tools =
Much of the discussion was around tools. As a member of the Automation and 
Tools team I should be advocating for more tools, but in this case I am 
leaning more towards fewer tools and better process.

One common problem is dealing with the noise that comes from infrastructure 
changes and from changing environments and test harnesses. Is this documented? 
How can we filter it out? Having our tools detect these events and annotate 
changes unrelated to tests or builds will go a long way.  Related to this is 
updating our harnesses and the way we run tests so they are more repeatable.  
I have filed bug 996504 to track work on this.

Another problem we can look at with tooling is annotating the expected outcome 
of the tests with metadata (suggestions so far include the manifests 
themselves as well as an external server). Once we get there we have options 
such as:
* rerunning tests (until they pass, or to document failure patterns; see the 
sketch after this list)
* putting all oranges in their own suite
* ignoring results of known oranges
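
As a small, simplified example of the rerun option above, the sketch below 
reruns one test several times and records the pattern of outcomes. The command 
line is a placeholder, not the actual harness invocation:

# Sketch of "rerun until it passes / document the failure pattern".
# TEST_CMD is a placeholder; substitute the real harness command.
import subprocess

TEST_CMD = ["./mach", "mochitest-plain", "path/to/test_example.html"]  # hypothetical
MAX_RUNS = 10

outcomes = []
for _ in range(MAX_RUNS):
    returncode = subprocess.call(TEST_CMD)
    outcomes.append("PASS" if returncode == 0 else "FAIL")
    if outcomes[-1] == "PASS":
        break  # stop at the first pass; remove this to map the full pattern

print("outcome pattern: %s" % "-".join(outcomes))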

Of course no discussion would be complete without talking about what we could 
do if this problem were solved.  Honorable mentions are:
* Autoland
* Orange Factor / Test Statistics
* Auto Bisection


Happy hacking,
Joel