Here was my methodology. First, I did a listproc search on the hackystat-l archive for the string "build report is available". This is part of the email that's generated when the daily build fails. I received a very large email back with several hundred matches. A couple of example lines:
File hackystat-l.0308:
Build report is available at http://xenia.ics.hawaii.edu/hackyDevSite/configurationBuildReport.do?year=2003&month=8&day=6&configuration=Hackystat-UH
Build report is available at http://xenia.ics.hawaii.edu/hackyDevSite/configurationBuildReport.do?year=2003&month=8&day=6&configuration=Hackystat-JPL
I then saved this file, edited out the lines that didn't fit this pattern, fired up Emacs, and wrote a quick keyboard macro to strip out everything but the year, month, day, and configuration. An excerpt of the revised data looked like this:
2003 8 6 Hackystat-UH
2003 8 6 Hackystat-JPL
2003 8 6 Hackystat-All
2003 8 7 Hackystat-UH
2003 8 7 Hackystat-JPL
2003 8 7 Hackystat-All
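(For anyone who wants to reproduce the stripping step without Emacs, here's a rough Python sketch of the same extraction. The regex assumes the URL format shown above; `extract_fields` is just my name for it, not anything in the archive.)

```python
import re

# Pull year, month, day, and configuration out of each
# "Build report is available" line, based on the URL query string.
pattern = re.compile(
    r"year=(\d{4})&month=(\d{1,2})&day=(\d{1,2})&configuration=([\w-]+)"
)

def extract_fields(line):
    """Return (year, month, day, configuration), or None if the line doesn't match."""
    m = pattern.search(line)
    if m is None:
        return None
    year, month, day, config = m.groups()
    return int(year), int(month), int(day), config

line = ("Build report is available at "
        "http://xenia.ics.hawaii.edu/hackyDevSite/configurationBuildReport.do"
        "?year=2003&month=8&day=6&configuration=Hackystat-UH")
print(extract_fields(line))  # (2003, 8, 6, 'Hackystat-UH')
```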
I then imported this file into Excel, did a little more post-processing, and have attached the resulting spreadsheet. Here are some of my findings:
* The hackystat-l list provides data from August 2003 to August 2004. We moved to hackystat-dev-l in September 2004, so a couple of months are missing. However, the current data set still provides a full year's worth of data.
* I stripped out many occurrences of 'build report is available' that were quoted in replies, which appear to be emails in which I or others explained why the build failed. I have generally tried to send out email each time the build failed (sometimes including the failure report, sometimes not). So it would be worth looking carefully through the archives to see what emails followed each build failure report; in many cases, I bet the cause of the build failure can be determined from the follow-up messages.
* In this year's worth of data, there were approximately 270 build failure emails. However, many of these (as the example excerpts above show) are duplicates, in the sense that a single problem can cause more than one configuration to fail. So I added a new column to the spreadsheet with a '1' for each day on which at least one failure occurred. The resulting total, 111, is the number of days on which the build failed one or more times.
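(The per-day deduplication amounts to collapsing the (year, month, day) triples to a set. A toy sketch with made-up records, not the real archive data:)

```python
# Collapse multiple same-day failures (one per configuration) down to
# the set of distinct days on which the build failed at least once.
records = [
    (2003, 8, 6, "Hackystat-UH"),
    (2003, 8, 6, "Hackystat-JPL"),
    (2003, 8, 6, "Hackystat-All"),
    (2003, 8, 7, "Hackystat-UH"),
]
failure_days = {(y, m, d) for (y, m, d, config) in records}
print(len(failure_days))  # 2 distinct failure days
```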
* Out of a year, having 111 days where the build failed at least once is quite surprising to me. That's an average of one build failure every three days, which is at least twice the frequency that I would have guesstimated.
* I then added another column to the spreadsheet to total up the number of build failures in each month. The results range from 0 (May 2004, which had no reported build failures) to 19 (April 2004). The average was 9.25 build failures per month. Again, that's almost one build failure every three days.
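(The monthly tally is just a grouped count; a sketch with toy data — the real totals are in the attached spreadsheet:)

```python
from collections import Counter

# Count build-failure emails per (year, month); toy records, not the archive.
records = [
    (2003, 8, 6), (2003, 8, 6), (2003, 8, 7),
    (2003, 9, 1),
]
per_month = Counter((y, m) for (y, m, d) in records)
print(per_month[(2003, 8)])  # 3
```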
* Stating it as one failure every three days actually understates the level of build failures, because some days clearly have multiple build failures: for example, 29-July-2004 has 10 build failures, all for the Hackystat-UH configuration.
* There are lots of 'runs', in which the build fails for multiple days in a row. For example, the build failed every day between 16-April-2004 and 26-April-2004.
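(Detecting those runs programmatically is straightforward once the dates are parsed. The `find_runs` helper and sample dates below are mine, illustrating the April run plus an isolated day:)

```python
from datetime import date, timedelta

def find_runs(days):
    """Group a collection of failure dates into runs of consecutive days."""
    days = sorted(days)
    runs, current = [], [days[0]]
    for d in days[1:]:
        if d - current[-1] == timedelta(days=1):
            current.append(d)       # extends the current run
        else:
            runs.append(current)    # gap: close the run, start a new one
            current = [d]
    runs.append(current)
    return runs

days = [date(2004, 4, 16), date(2004, 4, 17), date(2004, 4, 18),
        date(2004, 4, 26)]
for run in find_runs(days):
    print(run[0], "->", run[-1], f"({len(run)} day(s))")
```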
* While the bad news is that the build fails a lot more frequently than I thought, the good news is that Cedric has a fairly sizable set of data to analyze for his build project. Some interesting questions:
- Why did the build fail on each of these days?
- What is an interesting classification scheme for build failures? (e.g., compile, checkstyle, junit, run-time exception, Ant, etc.)
- What percentages of builds failed for each category in the classification scheme?
- Any trends in categories? Any categories of failure becoming rarer? Any becoming more frequent?
- Are there differences between 'runs' of build failures and 'isolated' build failures?
- Any trends with respect to the modules involved in failures? Which modules fail more frequently? Does this change over time?
- Can we trace the failure back to a responsible person? If so, what are the trends/frequencies?
- What are some hypotheses regarding how to lower the rate of build failure?
Once these questions are answered, we can figure out how to improve our build sensor to capture more of the relevant data, and what kinds of analyses we can put in place to see what happens if we carry out any hypothesized process/product changes intended to reduce the rate of daily build failure.
Cheers, Philip
builddata.xls
Description: MS-Excel spreadsheet
