First, you will notice that there is no FileMetric (as I wrote this email, someone other than hackystat-l sent FileMetric data), UnitTest, Coverage, Issues, etc. data of the kind usually collected under hackystat-l. I assume that Cruise Control did not run. Why is that? I remember hearing something about Cruise Control only running when changes have been made to one of the modules. Is that true?
It's true. That's the way Cruise Control is built. It makes sense for the development task that Cruise Control was designed for (continuous integration).
If so, why not just build the system anyway? If we don't run a build every day, there will be missing data points in analyses. Looking at the streams, it will be difficult to determine why there is no data: the build may have failed, the sensor may not have been working, the system may not have been built, etc. Let's take one of those variables out of the picture by building the system regardless of whether there are any changes. It isn't as though we're saving resources by not building.
If this turns out to be easy (Cedric--is this easy?) then I see no reason not to build the system every day. However, if it's hard, I don't see a huge problem with not building it when we don't make any changes. Besides, if we're not working on the system every day, doesn't that mean we're not working hard enough? :-)
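For what it's worth, this may be close to a one-attribute change. If memory serves, CruiseControl's `<project>` element supports a `requiremodification` attribute that, when set to false, builds on every schedule cycle even when no modifications were detected. A hedged sketch (project, script, and target names here are illustrative, not our actual config):

```xml
<project name="hackystat-nightly" requiremodification="false">
  <!-- Build once a day (interval in seconds), even with no changes. -->
  <schedule interval="86400">
    <ant antscript="ant" target="build" />
  </schedule>
</project>
```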
However, it won't solve the underlying problem. If the system has a compilation error in it, then we will have a dropout in the sensor data. This is just something that people interpreting the measurements have to deal with--it's really a fairly fundamental lesson learned from Hackystat.
Second, you will notice that Takuya has about 1.17 hours of active time but no Builds (or his sensor is not enabled). That's a strange development process. Furthermore, he hasn't run any unit tests (or his sensor is not enabled).
Third, you will notice that Hongbing built the system 14 times but has no ActiveTime. That is also strange. (I assume he is working on Zorro.) Hongbing also hasn't run any unit tests (or his sensor is not enabled).
These are actually very profound observations. They are examples of a wide variety of situations where the data streams are inconsistent from a semantic point of view.
In the experimental domain, there is the notion of "triangulation", where you gather data from several different sources and you have a set of rules about how you expect the data from one stream to look given certain kinds of data from another stream. In other words, you use multiple individual streams to get a sense of whether the overall collective data stream "makes sense".
Your examples above show the power of the technique. Why didn't Takuya ever build the system? Why did Hongbing build 14 times in a row without editing anything? These are interesting questions, and their answers would reveal interesting things about our development process.
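To make the triangulation idea concrete, here is a minimal sketch of the kind of cross-stream consistency rules involved (Python; the field names and thresholds are hypothetical, not the actual Hackystat schema):

```python
def check_consistency(day):
    """Flag semantically inconsistent daily sensor data streams.

    `day` maps stream names to one developer's daily totals.
    Field names and thresholds are illustrative only.
    """
    warnings = []
    # Editing for a while but never building is suspicious.
    if day["active_time_hours"] > 1.0 and day["builds"] == 0:
        warnings.append("editing but never building (or Build sensor disabled?)")
    # Many builds with no recorded editing is equally suspicious.
    if day["builds"] > 5 and day["active_time_hours"] == 0:
        warnings.append("many builds but no ActiveTime (or sensor disabled?)")
    # Building without ever running the tests.
    if day["builds"] > 0 and day["unit_tests"] == 0:
        warnings.append("building but never testing (or UnitTest sensor disabled?)")
    return warnings

# Takuya's day: 1.17 hours of active time, no builds, no unit tests.
print(check_consistency({"active_time_hours": 1.17, "builds": 0, "unit_tests": 0}))
```

Each rule encodes an expectation about how one stream should look given another, which is exactly the sense in which the collective stream does or does not "make sense".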
I've made this an issue so we don't forget about it: <http://hackydev.ics.hawaii.edu:8080/browse/HACK-203>
Fourth, apparently I ran a build today, but I'm pretty sure I didn't run a real "Build". A build, in my mind, is a compilation or testing of the system. What I actually did was "ant cvsUpdateAll". Why is that considered a "Build"? It really is an Ant target invocation, not a build. When would that be interesting? (Maybe Hongbing did 14 cvsUpdateAll's?)
Well, there can be a few different varieties of build. You're entitled to your own opinion of what is "real".
Fifth, the CVS sensor isn't working. I know Cedric is aware of this and is swamped with other stuff, as we all are, but luckily we have a retrospective CVS sensor. It's a little strange that the sensor can stop working while the unit tests all pass, assuming of course that the tests are up to date.
Is that still true? I just ran a DailyProjectDetails for March 2 and it all seemed good to me.
To conclude, I propose that all Hackystat developers enable all sensors except LOCC, Coverage, Jira, and Performance. At the least, this will eliminate questions like "I wonder if he didn't build the system or just doesn't have the sensor enabled." We talk about decreasing the build failure rate, but we don't have all the information we could to make that happen. We should also be building with Cruise Control every day. I can see only positives and no negatives.
Agreed.
Also, I'd like to raise a concern of mine: LOCC and Coverage are snapshot-type product measures. In our Hackystat process, a single user (hackystat-l) is responsible for sending these measures. We do that partly because we don't want to clobber the "real snapshot" with something we run for our own local configurations. For example, if we all enabled LOCC and/or Coverage and sent data, the information in those DailyProjectData representations would be fairly unreliable.

By the way, I'm experiencing this problem right now in the Clew2-UH project. I act as a manual hackystat-l by building the full system as often as possible. Then along comes a developer who has to have the sensor enabled for another project, which contains a smaller set of workspaces, and clobbers the "good" data. In our Hackystat process we made a "hack", with hackystat-l, to fix that problem; in other situations this will be annoying.

One solution could be a project-level declaration of the "daily build user". Another solution is to keep a runtime map per workspace/file pattern, so that even when a developer clobbers the data with a smaller set of workspaces, the original workspaces "shine" through. That has some potential data integrity problems, but so does clobbering.
Yes, this is a real problem, as the FileMetric data for late February shows: Hongbing's use of LOCC on hackyZorro led to a temporary drop in Hackystat's reported size from 60K to 30K.
I don't think the runtime map proposal will work, since if someone runs LOCC, then deletes the entire package, then runs LOCC again, we won't see the deletion.
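A small sketch of why the runtime map fails on deletions (Python; the map structure and workspace names are hypothetical): if the size map is only ever updated for workspaces present in the latest LOCC run, the entry for a deleted workspace simply goes stale and keeps "shining through":

```python
# Hypothetical per-workspace size map, merged from successive LOCC runs.
size_map = {}

def merge_locc_run(measured_sizes):
    """Merge a LOCC run: update only the workspaces the run measured."""
    size_map.update(measured_sizes)
    return sum(size_map.values())

# A full-system run measures two workspaces.
total = merge_locc_run({"hackyCore": 40000, "hackyZorro": 20000})  # 60000

# hackyCore is then deleted; the next run sees only hackyZorro, so the
# stale hackyCore entry survives and the deletion is invisible.
total = merge_locc_run({"hackyZorro": 21000})  # still 61000, not 21000
```

The merge never learns that a workspace has vanished, which is exactly the deletion-blindness problem described above.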
The viable options AFAIK are: 1. Tell Hongbing not to run LOCC. :-) More generally, the development team simply needs to understand that only one 'agent' user should be computing size info. Then all of the subprojects will get their size data computed by the agent.
2. Add a preference setting that tells the analysis which user to use for what kind of analysis. This can quickly get a bit out of hand--you have the potential for each project to have a different user for each type of sensor data.
Cheers, Philip
