On Tuesday, March 7, 2017 at 11:45:38 PM UTC-5, Chris Pearce wrote:
> I recommend that instead of classifying intermittents as tests which fail
> 30 times per week, to instead classify tests that fail more than some 
> threshold percent as intermittent. Otherwise on a week with lots of checkins, 
> a test which isn't actually a problem could clear the threshold and cause 
> unnecessary work for orange triage people and developers alike.
> 
> The currently published threshold is 8%:
> 
> https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy#Identifying_problematic_tests
> 
> 8% seems reasonable to me.
> 
> Also, whenever a test is disabled, not only should a bug be filed, but please 
> _please_ need-info the test owner or at least someone on the affected team.
> 
> If a test for a feature is disabled without the maintainer of that feature 
> knowing, then we are flying blind and we are putting the quality of our 
> product at risk.
> 
> 
> cpearce.
>

Thanks cpearce for the concern here.  Regarding disabling tests, every test we 
have disabled as part of the stockwell project has started out with a triage 
where we needinfo the responsible party and file the bug in the component the 
test is associated with.  I assume that if the bug is filed in the right 
component, others on the team will be made aware of it.  Right now I assume 
the triage owner of a component is the owner of its tests and can proxy the 
request to the correct person on the team (many times the original author is on 
PTO, busy with a project, has left the team, etc.).  Please let me know if this 
is a false assumption and what we could do to better get bugs in front of the 
right people.

I agree 8% is a good number; the sheriff policy has other criteria as well (top 
20 on Orange Factor, 100 times/month).  We picked 30 times/week because that is 
the point where bugs become frequent enough to reproduce easily (locally or on 
try) and it is reasonable to expect a fix.  There is ambiguity when using a 
percentage: on a low-volume week (as most of December was) we see <500 
pushes/week, and the percentage doesn't reflect the number of times the test was 
actually run, which is affected by SETA (skipping tests on 4 out of 5 commits to 
save on load) and by people doing retriggers/backfills.  If the test was at 8% 
last week and is at 7% this week, do we ignore it?
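
To make that concrete, here is a small hypothetical sketch (the function name, 
thresholds, and numbers are illustrative only, not taken from Treeherder or 
Orange Factor data) showing how a percentage rule can flip depending on how many 
times the test happened to run, while a fixed weekly count stays stable:

    # Hypothetical comparison of the two classification rules; not stockwell code.
    def classify(failures, runs, pct_threshold=0.08, count_threshold=30):
        """Return which rules would flag a test as intermittent this week."""
        by_percent = runs > 0 and (failures / runs) >= pct_threshold
        by_count = failures >= count_threshold
        return {"percent_rule": by_percent, "count_rule": by_count}

    # Same failure behavior, but run counts vary with push volume, SETA
    # scheduling, and retriggers/backfills:
    print(classify(failures=32, runs=350))  # 9.1% -> both rules flag it
    print(classify(failures=32, runs=500))  # retriggers inflate runs: 6.4% -> only the count rule
    print(classify(failures=25, runs=200))  # low-volume week: 12.5% -> only the percent rule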

Picking a single number like 30 failures per 7 days removes that ambiguity and 
ensures we can stay focused without having to worry about recalculations.  It is 
true that on lower-volume weeks fewer bugs reach 30 failures in 7 days, yet we 
have always had plenty of bugs to work on with that threshold.
