Greetings, all,
Since Cedric will be starting to use the BuildAnalysis mechanism in earnest within
a few weeks, I wanted to provide some feedback on the current version of the system
and some suggestions for improvement.
First off, let's think about the basic questions that the BuildAnalysis mechanism should
help us answer in the event of a daily build failure. I think there are three:
(1) Who was responsible for breaking the build?
Example answer: johnson
(2) Why did the build break?
Example answer: junit failure in hackyCore_Kernel
(3) What might have prevented the build from breaking?
Example answer: ant freshStart hackyCore_Kernel.junit
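To make this concrete, here's a rough sketch of a structure that could hold those three
answers (the class and field names are just my invention, not anything in the current
API):

  /** Holds the answers to the three build-failure questions. */
  public class BuildFailureAnswers {
    private String who;        // (1) who broke the build, e.g. "johnson"
    private String why;        // (2) why it broke, e.g. "junit failure in hackyCore_Kernel"
    private String prevention; // (3) what might have prevented it,
                               //     e.g. "ant freshStart hackyCore_Kernel.junit"
  }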
These three answers (i.e., data) are useful because they give rise naturally to three
telemetry streams that we can use to understand any progress that we're making. The
first question can show us "personal" trends in how often each of us breaks the build;
the second can show us trends in how the build broke; and the third can show us trends
in which preventative measures could have been used but weren't.
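As a sketch of how the first of those streams might be computed (again, the names here
are hypothetical), each day's "who" answers could simply be tallied per developer:

  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  public class BuildBreakTelemetry {
    /** Tallies one day's "who broke the build" answers per developer. */
    public static Map<String, Integer> breakCountsByDeveloper(List<String> whoAnswers) {
      Map<String, Integer> counts = new HashMap<>();
      for (String developer : whoAnswers) {
        counts.merge(developer, 1, Integer::sum);
      }
      return counts;
    }
  }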
So, my first suggestion is to reorganize the output of the build analysis so that the
email alert consists of answers to these three questions. The URL that you click on
brings up a page containing a representation of all of the data that was used to derive
the answers. Put another way, the build analysis mechanism has two components:
- a data gathering component that looks at all of the sensor data for a project for a
given day (or more!) and creates a "BuildAnalysis" instance.
- a reasoning mechanism that operates on that data to determine its best guess at the
answers to the three questions above. The reasoning mechanism augments the BuildAnalysis
instance with information about how it reached its conclusions.
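In code, I imagine the two components looking roughly like this (the type names and
signatures are just my sketch, not the current implementation):

  import java.time.LocalDate;

  /** The sensor data for one day, plus (eventually) the answers and a reasoning trace. */
  class BuildAnalysis {
    // sensor data, the three answers, and the reasoning description go here
  }

  /** Gathers all of the sensor data for a project on a given day. */
  interface BuildAnalysisGatherer {
    BuildAnalysis gather(String project, LocalDate day);
  }

  /** Augments a BuildAnalysis with best-guess answers and how it reached them. */
  interface BuildAnalysisReasoner {
    void reason(BuildAnalysis analysis);
  }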
When you click on the URL in the email, you get taken to a page that displays both the
data that was collected and used in the reasoning process and some kind of description
of what reasoning was applied and how it resulted in the answers to the three
questions.
OK, so far, so good. But there's another issue that must be resolved to make this
mechanism useful in our environment: the reasoner may not always come to the right
answer for one or more of the three questions, and in some cases it may not come up
with an answer at all. Therefore, there needs to be a way for a developer to "correct"
the answers provided by the reasoner. Otherwise, the telemetry streams are useless,
because they won't reflect the true answers to the questions.
Because of this, the URL that you click on to see the build analysis details must do
more than simply show you the data; it must also let you manually change the answers
associated with the three questions (by way of a pull-down list of possible answers).
These corrected answers are then persisted and used when generating the telemetry
streams. This technique even lets us keep track of when an answer provided by the
developer differs from the answer provided by the reasoner, and thus calculate the
accuracy of the mechanism and how it changes over time.
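To sketch that bookkeeping (once more, all names here are hypothetical): persist each
reasoner answer alongside any developer correction, and count how often the reasoner's
answer stood:

  import java.util.List;

  /** Pairs the reasoner's answer with the developer's correction, if any. */
  class AnswerRecord {
    private final String reasonerAnswer;   // may be null if the reasoner had no answer
    private final String developerAnswer;  // null if the developer made no change

    AnswerRecord(String reasonerAnswer, String developerAnswer) {
      this.reasonerAnswer = reasonerAnswer;
      this.developerAnswer = developerAnswer;
    }

    /** True when the reasoner's answer stood, i.e. was not corrected. */
    boolean reasonerWasRight() {
      return developerAnswer == null || developerAnswer.equals(reasonerAnswer);
    }

    /** Fraction of answers over some period that the reasoner got right. */
    static double accuracy(List<AnswerRecord> records) {
      long right = records.stream().filter(AnswerRecord::reasonerWasRight).count();
      return (double) right / records.size();
    }
  }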
How does this sound?
Cheers,
Philip