TL;DR:
* Inbound is closed 25% of the time
* Turning off coalescing could increase resource usage by up to 60% (but probably less than this).
* We spend 24% of our machine resources on changes that are later backed out, or changes that are doing the backout
* The vast majority of changesets that are backed out from inbound are detectable on a try push

Because of the large effect from coalescing, any changes to the current process must not require running the full set of tests on every push. (In my proposal this is easily accomplished with trychooser syntax, but other proposals include rotating through T-runs on pushes, etc.).

--- Long version below ---

Following up from the infra load meeting we had last week, I spent some time this weekend crunching various pieces of data on mozilla-inbound to get a sense of how much coalescing actually helps us, how much backouts hurt us, and generally to get some data on the impact of my previous proposal for using a multi-headed tree. I didn't get all the data that I wanted but as I probably won't get back to this for a bit, I thought I'd share what I found so far and see if anybody has other specific pieces of data they would like to see gathered.

-- Inbound uptime --

I looked at a ~9 day period from April 7th to April 16th. During this time:
* inbound was closed for 24.9587% of the total time
* inbound was closed for 15.3068% of the total time due to "bustage".
* inbound was closed for 11.2059% of the total time due to "infra".

Notes:
1) "bustage" and "infra" were determined by grep -i on the data from treestatus.mozilla.org.
2) There is some overlap so bustage + infra != total.
3) I also weighted the downtime using checkins-per-hour histogram from joduinn's blog at [1], but this didn't have a significant impact: the total, bustage, and infra downtime percentages moved to 25.5392%, 15.7285%, and 11.3748% respectively.
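For anyone who wants to reproduce the uptime numbers, here is a rough sketch of the calculation in Python. It is not the script I actually ran; the function name, the interval format, and the minute-by-minute sampling are all assumptions made for illustration. It takes a list of closure intervals (e.g. the ones whose reason matched "bustage"), handles overlapping intervals without double counting, and optionally weights each minute by a 24-entry per-hour weight such as checkins-per-hour.

```python
from datetime import datetime, timedelta


def closed_fraction(closed_intervals, start, end, weight_by_hour=None):
    """Return the fraction of [start, end) during which the tree was closed.

    closed_intervals: list of (begin, finish) datetimes; they may overlap.
    weight_by_hour: optional list of 24 weights (hypothetical input, e.g.
        checkins per hour of day). If None, every minute counts equally.
    """
    step = timedelta(minutes=1)
    closed_weight = 0.0
    total_weight = 0.0
    t = start
    while t < end:
        w = 1.0 if weight_by_hour is None else weight_by_hour[t.hour]
        total_weight += w
        # A minute covered by several closures is still counted only once.
        if any(b <= t < f for b, f in closed_intervals):
            closed_weight += w
        t += step
    return closed_weight / total_weight


# Made-up example data: one day with two overlapping closures.
day_start = datetime(2013, 4, 7)
day_end = day_start + timedelta(days=1)
bustage = [(day_start, day_start + timedelta(hours=6))]
infra = [(day_start + timedelta(hours=4), day_start + timedelta(hours=8))]
total = closed_fraction(bustage + infra, day_start, day_end)
```

Because the overlap is counted once, the combined figure comes out smaller than the sum of the bustage and infra figures, which is the effect described in note 2.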

-- Backout changes --

Next I did an analysis of the changes that landed on inbound during that time period. The exact pushlog that I looked at (corresponding to the same April 7 - April 16 time period) is at [2]. I removed all of the merge changesets from this range, since I wanted to look at inbound in as much isolation as possible.

In this range:
* there were a total of 916 changesets
* there were a total of 553 "pushes"
* 74 of the 916 changesets (8.07%) were backout changesets
* 116 of the 916 changesets (12.66%) were backed out
* removing all backouts and changes backed out removed 114 pushes (20.6%)

Of the 116 changesets that were backed out:
* 37 belonged to single-changeset pushes
* 65 belonged to multi-changeset pushes where the entire push was backed out
* 14 belonged to multi-changeset pushes where the changesets were selectively backed out

Of the 74 backout changesets:
* 4 were for commit message problems
* 25 were for build failures
* 36 were for test failures
* 5 were for leaks/talos regressions
* 1 was for premature landing
* 3 were for unknown reasons

Notes:
1) There were actually 79 backouts, but I ignored 5 of them because they backed out changes that happened prior to the start of my range.
2) Additional changes at the end of my range may have been backed out, but the backouts were not in my range so I didn't include them in my analysis.
3) The 14 csets that were selectively backed out are interesting to me because they imply that somebody did some work to identify which changes in the push were bad, and this naturally means that there is room to save on doing that work.
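The three-way split of backed-out changesets above (single-changeset push, whole multi-changeset push, selective) can be sketched roughly like this. This is an illustration, not the tool I used; the function name and the input shapes (a list of pushes, each a list of changeset ids, plus a set of backed-out ids) are assumptions.

```python
def classify_backed_out(pushes, backed_out):
    """Count backed-out changesets by the kind of push they landed in.

    pushes: list of pushes, each a list of changeset ids (hypothetical format).
    backed_out: set of changeset ids that were later backed out.
    """
    counts = {"single": 0, "whole_push": 0, "selective": 0}
    for push in pushes:
        hit = [cset for cset in push if cset in backed_out]
        if not hit:
            continue  # nothing in this push was backed out
        if len(push) == 1:
            counts["single"] += 1
        elif len(hit) == len(push):
            counts["whole_push"] += len(hit)
        else:
            # Only some changesets in the push were backed out.
            counts["selective"] += len(hit)
    return counts


# Made-up example: four pushes, four backed-out changesets.
example_pushes = [["a"], ["b", "c"], ["d", "e", "f"], ["g"]]
example_counts = classify_backed_out(example_pushes, {"a", "b", "c", "e"})
```

Note that the counts are per changeset, not per push, to match how the 37/65/14 numbers above are expressed.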

-- Merge conflicts --

I also wanted to determine how many of these changes conflicted with each other, and how far away the conflicting changes were. I got a partial result here but I need to do more analysis before I have numbers worth posting.

-- Build farm resources --

Finally, I used a combination of gps' mozilla-build-analyzer tool [3] and some custom tools to determine how much machine time was spent on building all of these pushes and changes.

I looked at all the build.json files [4] from the 6th of April to the 17th of April and pulled out all the jobs corresponding to the "push" changesets in my range above. For this set of 553 changesets, there were 500 (exactly!) distinct "builders". 111 of these had "-pgo" or "_pgo" in the name, and I excluded them. I created a 553x389 matrix with the remaining builders and filled in how much time was spent on each changeset for each builder (in case of multiple jobs, I added the times).

Then I assumed that any empty field in the 553x389 matrix was a result of coalescing. This is a grossly simplifying assumption that I would like to revisit - I know for Android changes we can detect that in some cases and only run the relevant tests; my assumption means the rest of the platforms are considered "coalesced" for these changes. I filled in these fields in the matrix with the average time spent on all the other builds for that builder in the matrix.

* A total of 228717299 seconds were spent on the 128777 entries in the matrix
* After de-coalescing, a total of 373751505 seconds would have been spent on the 215117 entries in the matrix (an increase of 63%)
* With coalescing, but removing all the backout pushes and pushes that were completely backed out, a total of 173027517 seconds were spent on 97623 entries (down 24% from actual usage)
* With de-coalescing AND stripping backouts, a total of 292634211 seconds would have been spent on the 168437 entries (an increase of 27% over actual usage)

Notes:
1) I tried a minor variation where I excluded the 21 builders that ran on less than 50% of the changes, on the assumption that these were not coalesced out, but are actually run on demand. This brought down the increase from 63% to 58%.
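To make the de-coalescing estimate concrete, here is a minimal sketch of the fill-in step in Python. It is not my actual tooling: the function name, the list-of-lists matrix with None for empty fields, and the min_coverage parameter are all assumptions for illustration. In particular, for the variation in note 1 it drops low-coverage builders from both totals, which is only one possible reading of "excluded".

```python
def decoalesced_totals(matrix, min_coverage=0.0):
    """Return (actual_seconds, decoalesced_seconds) for a push x builder matrix.

    matrix: list of rows, one per push; each row holds seconds per builder,
        with None where no job ran (assumed here to mean "coalesced").
    min_coverage: builders that ran on a smaller fraction of pushes than
        this are dropped entirely (hypothetical knob for the note 1 variation).
    """
    n_pushes = len(matrix)
    n_builders = len(matrix[0])
    actual = 0.0
    filled = 0.0
    for j in range(n_builders):
        ran = [row[j] for row in matrix if row[j] is not None]
        if not ran or len(ran) / n_pushes < min_coverage:
            continue  # builder never ran, or is treated as on-demand
        col_sum = float(sum(ran))
        avg = col_sum / len(ran)
        actual += col_sum
        # Every empty field gets the builder's average job time.
        filled += col_sum + avg * (n_pushes - len(ran))
    return actual, filled


# Made-up example: 4 pushes x 3 builders, None marks an empty field.
example = [
    [100, 10, None],
    [None, 30, None],
    [300, None, None],
    [None, 20, 50],
]
actual, filled = decoalesced_totals(example)
increase = filled / actual - 1.0
```

Filling with the per-builder average is the same grossly simplifying assumption described above, so the result is best read as an upper-ish estimate rather than a prediction.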

-- Conclusions --

See TL;DR up top.

Cheers,
kats

[1] http://oduinn.com/images/2013/blog_2013_02_pushes_per_hour.png
[2] https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=74354f979ea8&tochange=cad82c3b69bc
[3] http://gregoryszorc.com/blog/2013/04/01/bulk-analysis-of-mozilla%27s-build-and-test-data/
[4] http://builddata.pub.build.mozilla.org/buildjson/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
