The rate of intermittent failures in our automation has skyrocketed. (We call this rate the Orange Factor, and the effort to reduce it the War on Orange.) Starting around February 17 on mozilla-inbound, the rate took off on an exponential curve that is making it extremely hard to sheriff the tree, land patches, and otherwise get work done at Mozilla [1]. If anyone knows what started happening on the 17th that caused this sudden change, please let us know. However, it looks like several changes landed over several days, so there may not be one single push to blame.

We track this through a metric called the Orange Factor, which is simply the average number of intermittent failures encountered on each push. This means that right now, when you push, you are getting 8 failures on *average*. On February 17, you were averaging 2. Something has gone horribly, horribly wrong.
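To make the definition concrete, here is a rough sketch of the calculation. It is only an illustration, not the actual Orange Factor tooling: the function name and the sample counts are made up.

```python
def orange_factor(failures_per_push):
    """Average number of intermittent failures per push.

    failures_per_push: one integer per push, giving the number of
    intermittent failures seen on that push (hypothetical input shape).
    """
    if not failures_per_push:
        raise ValueError("need at least one push")
    return sum(failures_per_push) / len(failures_per_push)


# Hypothetical sample: intermittent failure counts on five pushes.
sample = [9, 7, 8, 10, 6]
print(orange_factor(sample))  # 8.0
```

Note that a single very bad push pulls the average up just as much as several mildly bad ones, so the number says nothing about how the failures are distributed.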

If we solve the current top 10 intermittent issues [2], we will be back down to 4.47, which, while still about double where we were on February 17, is far, far better than where we are now (8.32).
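For a sense of scale, the arithmetic behind those two numbers can be worked through as below. This uses only the 8.32 and 4.47 figures quoted above; the variable names are just for illustration.

```python
current = 8.32      # Orange Factor right now
after_top10 = 4.47  # projected Orange Factor once the top 10 are fixed

# Failures per push attributable to the top 10 issues.
top10_share = current - after_top10
fraction = top10_share / current

print(f"top 10 account for {top10_share:.2f} failures per push")
print(f"that is {fraction:.0%} of the current Orange Factor")
# top 10 account for 3.85 failures per push
# that is 46% of the current Orange Factor
```

In other words, ten bugs are responsible for nearly half of the failures everyone is seeing on every push.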

I'm begging for volunteers to step forward, do everyone a favor, and dig into one of the bugs below. I'm also asking for some set of brave souls to look critically at what landed during our exponential uptick for possible culprits.

* Bug 761987 - The worst offender. If anyone can help out, please do.
* Bug 833769 - Memory leak that has recently spiked. Andrew McCreight is on the case.
* Bug 711725 - Jmaher and dividehex are digging into this. It started because tegras would reboot intermittently; we fixed those, and now the pandas are rebooting. We suspect the pandas are overheating.
* Bug 835658 - Needs an owner
* Bug 824069 - Needs an owner
* Bug 807230 - Jmaher is looking into this next
* Bug 764369 - Needs an owner
* Bug 754860 - Needs an owner
* Bug 818103 - Needs an owner
* Bug 663657 - Needs an owner, probably someone from my team or releng

And when you're weighing whether or not you want to jump in, remember we do have a goal to clean up the technical debt we've left ourselves in the rush to ship two 1.0 products, and this work falls (in my mind) squarely in line with that goal. Please help out where you can.

Many thanks,


dev-platform mailing list
