On Tuesday, March 7, 2017 at 11:45:38 PM UTC-5, Chris Pearce wrote:
> I recommend that instead of classifying intermittents as tests which fail
> 30 times per week, to instead classify tests that fail more than some
> threshold percent as intermittent. Otherwise on a week with lots of checkins,
> a test which isn't actually a problem could clear the threshold and cause
> unnecessary work for orange triage people and developers alike.
>
> The currently published threshold is 8%:
>
> https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy#Identifying_problematic_tests
>
> 8% seems reasonable to me.
>
> Also, whenever a test is disabled, not only should a bug be filed, but please
> _please_ need-info the test owner or at least someone on the affected team.
>
> If a test for a feature is disabled without the maintainer of that feature
> knowing, then we are flying blind and we are putting the quality of our
> product at risk.
>
> cpearce.
Thanks cpearce for the concern here.

Regarding disabling tests: every test we have disabled as part of the stockwell project has started out with a triage where we needinfo the responsible party and file the bug in the component the test is associated with. I assume that if the bug is filed in the right component, others on the team will be made aware of it. Right now I assume the triage owner of a component is the owner of the tests and can proxy the request to the correct person on the team (often the original author is on PTO, busy with a project, has left the team, etc.). Please let me know if this is a false assumption and what we could do to better get bugs in front of the right people.

I agree 8% is a good number; the sheriff policy has other criteria as well (top 20 on Orange Factor, 100 times/month). We picked 30 times/week because that is the point where bugs become frequent enough to reproduce easily (locally or on try) and it becomes reasonable to expect a fix.

There is ambiguity in using a percentage. On a low-volume week (as most of December was) we see <500 pushes/week, and a percentage also doesn't indicate the number of times the test was actually run- that is affected by SETA (which reduces tests on 4 of 5 commits to save on load) and by people doing retriggers/backfills. If the test was at 8% last week and is at 7% this week, do we ignore it? Picking a single number like 30 times/7 days removes the ambiguity and ensures that we can stay focused and don't have to worry about recalculations. It is true that on lower-volume weeks 30 times/7 days doesn't trigger as often, yet we have always had many bugs to work on with that threshold.

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
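P.S. A quick sketch of the ambiguity argument above, with hypothetical run counts (the thresholds are the real 30/week and 8% figures, but the run/failure numbers are made up for illustration):

```python
# Hypothetical illustration: the same 30 failures/week can land on either
# side of an 8% threshold depending on how many times the test actually
# ran, which SETA, retriggers, and backfills all change.

FAILURES_PER_WEEK = 30   # stockwell absolute threshold
PERCENT_THRESHOLD = 8.0  # published sheriff policy threshold

def intermittent_by_count(failures):
    """Count-based rule: no dependence on run volume."""
    return failures >= FAILURES_PER_WEEK

def intermittent_by_percent(failures, runs):
    """Percentage-based rule: depends on how often the test ran."""
    return 100.0 * failures / runs >= PERCENT_THRESHOLD

# High-volume week: test ran 600 times, failed 30 (5%).
print(intermittent_by_count(30))         # True
print(intermittent_by_percent(30, 600))  # False: 5% < 8%

# Low-volume week: test ran 300 times, failed the same 30 (10%).
print(intermittent_by_percent(30, 300))  # True: 10% >= 8%
```

Same failure count, opposite classification under the percentage rule, which is exactly the recalculation headache the fixed 30/week number avoids.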