Greetings, Sebastian,
[I am cc'ing this thread to hackystat-dev-l, the hackystat developers list, since I think it will be interesting to them and they may have some additional insight for you. Hopefully what I've quoted is sufficient for people to recover the thread.]
The main focus of our work is error prevention in software development and programming. Simply stated, I'd like to answer the question "What are typical situations that indicate a programmer is about to insert a defect into the code?" To this end, my interest is in testing for correlations between programming habits (as well as circumstances) and inserted defects, for example:
* Any time a programmer changes some specific part of the code more than three times, the probability of that part containing a defect rises significantly. (Trial-and-error episode)
* Code that has been copied from some other location (and changed afterwards) has a higher probability of containing a defect than non-copied code. (Copy-Paste-Change episode)
* Being interrupted many times while writing some part of the code raises the probability that the resulting code is defective.
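As a toy illustration of the first hypothesis, an episode detector could simply count edits per code region and flag regions changed more than three times. Everything here (file names, regions, the event representation) is invented for the sketch:

```python
from collections import Counter

# Hypothetical edit events: (timestamp, file, code_region) tuples.
edits = [
    (1, "Foo.java", "parse"),
    (2, "Foo.java", "parse"),
    (3, "Bar.java", "init"),
    (4, "Foo.java", "parse"),
    (5, "Foo.java", "parse"),
]

def flag_trial_and_error(events, threshold=3):
    """Return code regions edited more than `threshold` times."""
    counts = Counter((f, region) for _, f, region in events)
    return {loc for loc, n in counts.items() if n > threshold}

print(flag_trial_and_error(edits))  # {('Foo.java', 'parse')}
```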
Very interesting!
The key, as I see it, is that people engage in "defect-prone behavior" at one point in time, but in most cases the defect only becomes known at some later point. So, there are two questions:
(a) Can we establish that your hypothesized "defect-prone behavior" is real? In other words, if we observe "defect-prone behavior", does it actually result in a higher probability of defects vis-à-vis "non-defect-prone behavior"? This is an interesting question: while I'm sure that "random hacking" (i.e. the trial-and-error episode) does often result in defective code, I'm not sure that it results in defective code significantly more often than "non-random but nevertheless bogus hacking".
(b) Can we use this information effectively for defect prevention? Let's say that "random hacking" results in a defect 1 out of 10 times, and "non-random hacking" only 1 out of 20 times. Even though that difference might be statistically significant, it's not clear how to use that information when the frequency of occurrence is so low (after all, 9 out of 10 times, random hacking appears to work!). Indeed, I use that strategy myself when a library API is poorly documented and the only way to figure out how it works is to try a bunch of different combinations.
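To make the statistics concrete: with made-up counts matching those rates (40 defects in 400 random-hacking episodes vs. 20 in 400 non-random ones), a standard two-proportion z-test would indeed call the difference significant even though both base rates stay low. A minimal sketch using only the standard library:

```python
import math

def two_proportion_z(d1, n1, d2, n2):
    """Two-sided z-test for the difference of two proportions."""
    p1, p2 = d1 / n1, d2 / n2
    pooled = (d1 + d2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via the error function).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 1-in-10 vs 1-in-20 defect rates over 400 episodes each (made-up counts).
z, p = two_proportion_z(40, 400, 20, 400)
print(round(z, 2), p < 0.05)  # 2.68 True
```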
I think there could be a useful role for Hackystat to play in this research question. Basically, I see the following kinds of abstractions:
* Suspect Coding Behaviors (SCB). They include "Trial-and-Error", "Copy-Paste", etc.
* Ordinary Coding Behaviors (OCB). They include, I guess, everything else. :-)
* Code Locations (CL). Maybe this is a File, or Class, or Method.
* Programmer. This is the hacker. We probably want some demographic info about him/her. (Experience level, etc.)
* Suspect Events (SE). This is a tuple (<TimeStamp>, <Programmer>, <SCB>, <CL>)
* Ordinary Events (OE). This is a tuple (<TimeStamp>, <Programmer>, <OCB>, <CL>)
* Defect Events (DE). This is a tuple (<TimeStamp>, <Defect Type/Description>, <CL>)
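As a quick sketch of those tuples (the field names and values are my own for illustration, not an actual Hackystat schema):

```python
from collections import namedtuple

# Record types mirroring the event abstractions above.
SuspectEvent  = namedtuple("SuspectEvent",  "timestamp programmer scb location")
OrdinaryEvent = namedtuple("OrdinaryEvent", "timestamp programmer ocb location")
DefectEvent   = namedtuple("DefectEvent",   "timestamp description location")

se = SuspectEvent("2004-06-01T10:15:00", "frank", "Trial-and-Error", "Foo.java#parse")
de = DefectEvent("2004-06-01T11:40:00", "NPE in parse", "Foo.java#parse")

# Suspect and defect events relate through their shared code location.
print(se.location == de.location)  # True
```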
When you look at things this way, a few things pop out.
1. This provides a natural boundary between what is represented in Hackystat and what isn't. Except for "Programmer", each of these abstractions naturally corresponds to a Sensor Data Type in Hackystat. What that means is that the client-side (i.e. Eclipse, etc.) is responsible for taking the "micro-process" data and grinding on it to discover SCBs, OCBs, CLs, SEs, OEs, and DEs. It's those things that get sent to Hackystat for server-side analysis.
2. It's the server side that would be trying to figure out if there are correlations between event streams that are statistically significant. The good news here is that we will be working on two modules for Hackystat this summer to support experimentation: one which allows you to specify a set of Subjects, and another which provides an interface between Hackystat and R <http://www.r-project.org/>. This should greatly simplify the analyses required to see if an SE does lead to a DE with greater probability than an OE.
3. It might be interesting to explore the "latency" between an SCB and a related DE. One hypothesis is that changes to the environment/process that reduce the time interval between doing something bad and finding out that it was bad would be helpful. (At the language level, that's one motivation for static type checking. At the process level, that's one motivation for daily builds.) So, you could do an interesting study where you establish a baseline for the latency between SCBs and related DEs, and then introduce a change and see if that reduces the latency. That's one way to provide useful support while getting around the problem of notifying the developer each time they do an SCB when only a small fraction of them actually result in DEs.
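A latency baseline could be computed by matching each defect event to the most recent earlier suspect event at the same code location. The timestamps and locations below are made up:

```python
# Hypothetical timestamped events (seconds since some epoch).
suspect_events = [(100, "Foo.java"), (400, "Bar.java"), (900, "Foo.java")]
defect_events  = [(700, "Foo.java"), (1000, "Foo.java")]

def latencies(suspects, defects):
    """For each defect, the delay since the latest prior suspect event
    at the same code location (defects with no prior SE are skipped)."""
    result = []
    for d_time, d_loc in defects:
        prior = [t for t, loc in suspects if loc == d_loc and t <= d_time]
        if prior:
            result.append(d_time - max(prior))
    return result

print(latencies(suspect_events, defect_events))  # [600, 100]
```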
Some of these proposed findings have already been empirically supported by work on the Psychology of Programming or by some archeology studies. What's new is that we try to examine truly fine-grained events (down to the dynamics of typing program statements). I call this the "micro-process" of software development. This is even more "micro" than what Hackystat usually logs (i.e. file changes, most active file, builds, active time, etc.).
True, and as I note above, it may be best to think of two levels of abstraction here: the "micro" (handled on the Eclipse side) and the "macro" (handled on the Hackystat side).
In fact, the best tool for error prevention would be a just-in-time warning messenger, as Philip mentioned, but I guess the probability of false positives would be too high. But other means could help as well:
* The tool could give non-disruptive hints about anomalies in the micro-process while coding. That would be a kind of static check of the process (just like static analysis of the code, i.e. the product).
* After a developer finds a defect, she might ask "Hey, once again a stupid bug. What happened when I wrote this?" A log of micro-process data could help answer that question.
Frank is the first to work on the tool. His task is to provide the micro-process "grabber" as an Eclipse sensor. The main requirement can be fulfilled by Hackystat: collecting event data (though currently not as fine-grained as what we provide) from many sources. As you might see, some requirements are not met:
* We'd like to have free access to the data for later analysis. I'm not sure how to get API-like access to Hackystat data.
That's quite easy. You can send HTTP requests to the server and get XML back. Or a zip file of an entire user's data. But the R interface might be better than either of these two.
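For instance, once the XML payload is in hand, processing it is straightforward. The element and attribute names below are invented for illustration, not the real Hackystat response schema; in practice you would fetch the body with an HTTP client first:

```python
import xml.etree.ElementTree as ET

# A hypothetical response body; in practice you would obtain it with,
# e.g., urllib.request.urlopen(server_url).read().
response = """<SensorData>
  <Entry tstamp="2004-06-01T10:15:00" sdt="SuspectEvent" location="Foo.java"/>
  <Entry tstamp="2004-06-01T11:40:00" sdt="DefectEvent" location="Foo.java"/>
</SensorData>"""

root = ET.fromstring(response)
entries = [(e.get("sdt"), e.get("location")) for e in root.findall("Entry")]
print(entries)  # [('SuspectEvent', 'Foo.java'), ('DefectEvent', 'Foo.java')]
```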
* The amount of data in the micro-process is higher than in usual Hackystat sessions, I guess.
* On the other hand, in the first step it's not important for us to collect data from more than one developer. It should work locally and should be easy to install. But a central server might support this especially well.
* We'd like to partly "replay" the micro-process on some part of the code (which would also make a great demo) for further investigation and for later annotation of the stream, both automatically and manually.
* But first of all, we need an episode recognizer (or preprocessor, or annotator) which provides just-in-time (as well as later) analysis of the event stream. I thought of a state machine reacting to the events, and I guess that's pretty much equivalent to the rule-based approach which you seem to prefer.
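As a toy sketch of such a state machine, here is a recognizer that flags a Copy-Paste-Change episode when a paste is eventually followed by an edit at the same location (the event names and two-state design are invented for the sketch):

```python
def recognize(events):
    """Flag a Copy-Paste-Change episode: a paste followed (eventually)
    by an edit at the same code location."""
    episodes = []
    state, paste_loc = "IDLE", None
    for kind, loc in events:
        if kind == "paste":
            state, paste_loc = "PASTED", loc
        elif state == "PASTED" and kind == "edit" and loc == paste_loc:
            episodes.append(("Copy-Paste-Change", loc))
            state, paste_loc = "IDLE", None
    return episodes

stream = [("edit", "A.java"), ("paste", "B.java"), ("edit", "B.java")]
print(recognize(stream))  # [('Copy-Paste-Change', 'B.java')]
```

A real recognizer would need more states (e.g. timeouts, interleaved edits elsewhere), which is roughly where the equivalence to a rule-based approach shows up: each transition is a rule over the incoming event.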
The latter point is what catches my interest concerning your work. I'm looking forward to some fruitful discussion!
How's this for a start? :-)
Cheers, Sebastian
