On 19/12/2013 23:56, Jason Orendorff wrote:
On 12/19/13 4:55 PM, David Burns wrote:
On 19/12/2013 18:48, Jason Orendorff wrote:
Con:
- more work for sheriffs (mostly merges)
If mostly merges, are you suggesting there will be little traffic on
the branch or the JS team will watch the tree for failures?
Neither, I'm just saying the overall rate of broken patches wouldn't
increase much, which I think shouldn't be controversial.

That is, sheriffing is not watching trees, it's fighting bustage. Each
busted patch and each intermittent orange creates a ton of work. It
stands to reason that diverting some patches to a separate tree won't
increase the volume of patches, except to the degree it actually
improves developer efficiency (and let's have that problem, please).

For context, I manage the sheriffs, so I want to be sure what I am signing them up for. If "the overall rate of broken patches wouldn't increase much", why can't we keep things on inbound and, when the tree is closed, just use the "checkin-needed" keyword and let the sheriffs continue to manage the bustage and start landing patches again?


2013-07 : 6 days, 13:46:11
2013-08 : 4 days, 5:42:17
2013-09 : 4 days, 20:59:41
2013-10 : 4 days, 21:22:40
2013-11 : 8 days, 4:58:30
2013-12 : 2 days, 16:47:42
I know the point of including these numbers was, "hey look it's not that
bad", but this is really shocking.

I know it's bad, and this is why I am tracking this information! I am watching how many backouts are affecting closures[1] and what the backout-to-push ratio[2] is. Currently these figures scare me, and the default stance that I get from platform engineers is "It's probably cheaper to push and get backed out than push to try". This comes back to my papering over the cracks by spreading things around.

We're looking at an average of
something like 125 hours per month that developers can't check stuff in.
Even if the breakage is evenly distributed across time zones
(optimistic) we're looking at zero 9s of availability.
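The "something like 125 hours" and "zero 9s" figures can be checked against the monthly closure totals quoted above. The following is a rough sketch only: the names are made up for illustration, and it assumes availability is measured against every calendar hour from July through December 2013, even though the December total covers only part of the month.

```python
from datetime import timedelta

# Monthly tree-closure totals as quoted in the thread: (days, "H:MM:SS").
closures = {
    "2013-07": (6, "13:46:11"),
    "2013-08": (4, "5:42:17"),
    "2013-09": (4, "20:59:41"),
    "2013-10": (4, "21:22:40"),
    "2013-11": (8, "4:58:30"),
    "2013-12": (2, "16:47:42"),
}


def to_timedelta(days, clock):
    """Convert a (days, "H:MM:SS") pair into a timedelta."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


total = sum((to_timedelta(d, c) for d, c in closures.values()), timedelta())
total_hours = total.total_seconds() / 3600
average_hours = total_hours / len(closures)

# Assumption: the denominator is all 184 calendar days of Jul-Dec 2013.
calendar_hours = 184 * 24
availability = 1 - total_hours / calendar_hours

print(f"average closed per month: {average_hours:.1f} h")  # 125.9 h
print(f"availability: {availability:.1%}")                 # 82.9%
```

Under those assumptions the tree was open roughly 83% of the time, which is below 90% and so is indeed "zero 9s".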

I know that RelEng are looking into how to do scheduling better. I am not sure where they are with this or whether it has started, but it's a good first step. The whole "a push can take hours to build/test" is the thing that we need to be pushing against. I think if we solve that problem there will be a significant drop in bad pushes. A bad push is 3 times more expensive than a good push just in compute hours (we have 1 backout in every 15 pushes on average), never mind the cost of someone doing a pull after a bad push and them trying to work out why things don't build.
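As a back-of-the-envelope reading of those two numbers (a sketch, not a measurement): assume a good push costs 1 unit of compute, a bad push costs 3 units, and 1 push in 15 is bad. The constant and variable names below are illustrative only.

```python
from fractions import Fraction

# Assumptions taken from the figures above, normalised to "one good push = 1".
BAD_PUSH_COST = 3               # a bad push costs 3x a good push
BACKOUT_RATE = Fraction(1, 15)  # 1 backout in every 15 pushes

# Expected compute cost of an average push, in good-push units.
mean_cost = (1 - BACKOUT_RATE) * 1 + BACKOUT_RATE * BAD_PUSH_COST

# Extra compute compared with a world where every push is good.
overhead = mean_cost - 1

# Share of all compute that goes to pushes that end up backed out.
bad_share = BACKOUT_RATE * BAD_PUSH_COST / mean_cost

print(f"mean cost per push: {float(mean_cost):.3f}")  # 1.133
print(f"overhead vs all-good: {float(overhead):.1%}")  # 13.3%
print(f"compute spent on bad pushes: {float(bad_share):.1%}")  # 17.6%
```

On those assumptions, about 13% more compute is used than if every push were good, and about 18% of all compute goes to pushes that are later backed out. This counts compute hours only, not the developer time lost to pulling a broken tree.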


We've all gotten used to it, but it's kind of nuts.

Couldn't agree more!


-j


David

[1] https://secure.theautomatedtester.co.uk/owncloud/public.php?service=files&t=f54a3e2edabb70771d64e473b30780ac
[2] https://secure.theautomatedtester.co.uk/owncloud/public.php?service=files&t=ca3312fa7e0914e8352e96d44a48569f