Greetings, Sebastian,

[I am cc'ing this thread to hackystat-dev-l, the hackystat developers list, 
since I think
it will be interesting to them and they may have some additional insight for 
you.
Hopefully what I've quoted is sufficient for people to recover the thread.]

The main focus of our work is error prevention in software development and
programming. Simply stated, I'd like to answer the question "What are typical
situations that indicate a programmer is about to insert a defect into the code?"
To this end, my interest is in testing for correlations between programming habits
(as well as circumstances) and inserted defects, for example:

* Any time a programmer has changed some specific part of the code more than three
times, the probability of that part containing a defect rises significantly.
(Trial-and-error episode)
* Code that has been copied from some other location (and changed afterwards) has a
higher probability of containing a defect than non-copied code. (Copy-Paste-Change
episode)
* Being interrupted many times while writing some part of the code raises the
probability that the code is defective.

Very interesting!

The key, as I see it, is that people engage in "defect prone behavior" at one 
point in
time, but in most cases the defect only becomes known at some later point in 
time. So,
there are two questions:

(a) Can we establish that your hypothesized "defect prone behavior" is real? In 
other
words, if we observe "defect prone behavior", does it actually result in higher
probability of defects vis a vis "nondefect prone behavior"?   This is an 
interesting
question: while I'm sure that "random hacking" (i.e. trial-and-error episode) 
does often
result in defective code, I'm not sure that it results in defective code 
significantly
more often than "non-random but nevertheless bogus hacking".

(b) Can we use this information effectively for defect prevention?  Let's say 
that
"random hacking" results in a defect 1 out of 10 times, and "nonrandom hacking" 
only
results in a defect 1 out of 20 times. Even though that difference might be 
statistically
significant, it's not clear how to use that information when the frequency of 
occurrence
is so low (after all, 9 out of 10 times, random hacking appears to work!). Indeed,
I use that strategy myself when a library API is not documented well and the only
way for me to figure out how it works is to try a bunch of different combinations.

I think there could be a useful role for Hackystat to play in this research 
question.
Basically, I see the following kinds of abstractions:

* Suspect Coding Behaviors (SCB). They include "Trial-and-Error", "Copy-Paste", etc.
* Ordinary Coding Behaviors (OCB). They include, I guess, everything else. :-)
* Code Locations (CL). Maybe this is a File, or Class, or Method.
* Programmer. This is the hacker. We probably want some demographic info about
him/her (experience level, etc.).
* Suspect Events (SE). This is a tuple (<TimeStamp>, <Programmer>, <SCB>, <CL>).
* Ordinary Events (OE). This is a tuple (<TimeStamp>, <Programmer>, <OCB>, <CL>).
* Defect Events (DE). This is a tuple (<TimeStamp>, Defect Type/Description, <CL>).
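
To make the abstractions concrete, here is a minimal sketch of how the event tuples
might be represented. The type and field names are my own placeholders, not existing
Hackystat sensor data types:

    import java.time.Instant;

    /** Hypothetical catalog of suspect coding behaviors. */
    enum SuspectCodingBehavior { TRIAL_AND_ERROR, COPY_PASTE_CHANGE, INTERRUPTED_EDITING }

    /** A code location; could be refined to file, class, or method granularity. */
    record CodeLocation(String file, String member) {}

    /** The hacker, with whatever demographic info turns out to be useful. */
    record Programmer(String id, int yearsOfExperience) {}

    /** Suspect Event: (<TimeStamp>, <Programmer>, <SCB>, <CL>). */
    record SuspectEvent(Instant timeStamp, Programmer programmer,
                        SuspectCodingBehavior behavior, CodeLocation location) {}

    /** Ordinary Event: (<TimeStamp>, <Programmer>, <OCB>, <CL>). */
    record OrdinaryEvent(Instant timeStamp, Programmer programmer,
                         String behavior, CodeLocation location) {}

    /** Defect Event: (<TimeStamp>, Defect Type/Description, <CL>). */
    record DefectEvent(Instant timeStamp, String description, CodeLocation location) {}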

When you look at things this way, a few things pop out.

1. This provides a natural boundary between what is represented in Hackystat 
and what
isn't.  Except for "Programmer", each of these abstractions naturally 
corresponds to a
Sensor Data Type in Hackystat. What that means is that the client-side (i.e. 
Eclipse,
etc.) is responsible for taking the "micro-process" data and grinding on it to 
discover
SCBs, OCBs, CLs, SEs, OEs, and DEs. It's those things that get sent to 
Hackystat for
server-side analysis.
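
As a sketch of that division of labor (and assuming the hypothetical types above),
the Eclipse side might look roughly like this; shipEvent() is a stand-in for whatever
transport the real sensor uses, not the actual Hackystat sensor API:

    import java.time.Instant;

    /**
     * Hypothetical client-side step: once the micro-process data has been ground
     * down to an episode, only the resulting abstraction is shipped to the server.
     */
    class EpisodeReporter {
      void reportTrialAndError(Programmer programmer, CodeLocation location) {
        SuspectEvent event = new SuspectEvent(
            Instant.now(), programmer, SuspectCodingBehavior.TRIAL_AND_ERROR, location);
        shipEvent(event);
      }

      /** Placeholder for the real sensor data transport. */
      private void shipEvent(SuspectEvent event) {
        System.out.println("would send: " + event);
      }
    }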

2. It's the server side that would be trying to figure out if there are 
correlations
between event streams that are statistically significant. The good news here is 
that we
will be working on two modules for Hackystat this summer to support 
experimentation: one
which allows you to specify a set of Subjects, and another which provides an 
interface
between Hackystat and R <http://www.r-project.org/>. This should greatly 
simplify the
analyses required to see if an SE does lead to a DE with greater probability 
than an OE.
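
For illustration only, here is roughly the kind of comparison involved, using a
two-proportion z statistic as a stand-in for whatever test the R interface would
actually run; the counts are made up and simply echo the 1-in-10 vs. 1-in-20 example
above:

    /** Rough sketch: is the defect rate after SEs meaningfully higher than after OEs? */
    class DefectRateComparison {
      static double zStatistic(int suspectEvents, int suspectDefects,
                               int ordinaryEvents, int ordinaryDefects) {
        double p1 = (double) suspectDefects / suspectEvents;
        double p2 = (double) ordinaryDefects / ordinaryEvents;
        double pooled = (double) (suspectDefects + ordinaryDefects)
                      / (suspectEvents + ordinaryEvents);
        double standardError = Math.sqrt(
            pooled * (1 - pooled) * (1.0 / suspectEvents + 1.0 / ordinaryEvents));
        return (p1 - p2) / standardError;
      }

      public static void main(String[] args) {
        // 200 events of each kind; 10% vs. 5% defect rates gives z of about 1.9,
        // which is just short of two-sided significance at the 5% level.
        System.out.println(zStatistic(200, 20, 200, 10));
      }
    }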

3. It might be interesting to explore the "latency" between an SCB and a 
related DE. One
hypothesis is that changes to the environment/process that reduce the time 
interval
between doing something bad and finding out that it was bad would be helpful. 
(At the
language level, that's one motivation for static type checking. At the process 
level,
that's one motivation for daily builds.) So, you could do an interesting study 
where you
establish a baseline for the latency between SCBs and related DEs, and then 
introduce a
change and see if that reduces the latency.  That's one way to provide useful 
support
while getting around the problem of notifying the developer each time they do 
an SCB when
only a small fraction of them actually result in DEs.
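
A latency baseline could be computed with something as simple as the following
sketch, which pairs each suspect event with the next defect event at the same code
location (again assuming the hypothetical types from earlier; real matching would
need to be smarter about intervening edits):

    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    /** For each suspect event, the elapsed time until the next defect at the same location. */
    class LatencyBaseline {
      static List<Duration> latencies(List<SuspectEvent> suspects, List<DefectEvent> defects) {
        List<Duration> result = new ArrayList<>();
        for (SuspectEvent se : suspects) {
          Instant earliest = null;
          for (DefectEvent de : defects) {
            if (de.location().equals(se.location()) && de.timeStamp().isAfter(se.timeStamp())
                && (earliest == null || de.timeStamp().isBefore(earliest))) {
              earliest = de.timeStamp();
            }
          }
          if (earliest != null) {
            result.add(Duration.between(se.timeStamp(), earliest));
          }
        }
        return result;
      }
    }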

Some of these proposed findings have already been empirically supported by work on
the Psychology of Programming or by some archeology studies. What's new is that we
try to examine really fine-grained events (down to the dynamics of typing program
statements). I call this the "micro-process" of software development. This is even
more "micro" than what Hackystat usually logs (e.g., file changes, most active file,
builds, active time, etc.).

True, and as I note above, it may be best to think of two levels of abstraction here: the "micro" (handled on the Eclipse side) and the "macro" (handled on the Hackystat side).

In fact, the best tool for error prevention would be a just-in-time warning
messenger, as Philip mentioned, but I guess the probability of false positives would
be too high. But other means could help as well:

* The tool could give non-disruptive hints on anomalies in the micro-process while
coding. That would be a kind of static check of the process (just like static
analysis on the code, i.e. the product).
* After a developer has found a defect, she might ask: "Hey, once again a stupid
bug. What happened when I wrote this?" A log of micro-process data could help answer
that question.
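
That second question maps naturally onto a simple query over the logged events. A
minimal sketch, assuming the hypothetical event types from earlier and treating the
log as a flat list:

    import java.time.Instant;
    import java.util.List;
    import java.util.stream.Collectors;

    /** "What happened when I wrote this?": logged suspect events for one location in a window. */
    class MicroProcessLog {
      static List<SuspectEvent> around(List<SuspectEvent> log, CodeLocation location,
                                       Instant from, Instant to) {
        return log.stream()
            .filter(e -> e.location().equals(location))
            .filter(e -> !e.timeStamp().isBefore(from) && !e.timeStamp().isAfter(to))
            .collect(Collectors.toList());
      }
    }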

Frank is the first to work on the tool. His task is to provide the micro-process
"grabber" as an Eclipse sensor. The main requirement can be fulfilled by Hackystat:
collecting event data (though currently not as fine-grained as what we provide) from
many sources. As you might see, some requirements are not met:

* We'd like to have free access to the data for later analysis. I'm not sure how to
get API-like access to Hackystat data.

That's quite easy. You can send HTTP requests to the server and get XML back. Or a zip file of an entire user's data. But the R interface might be better than either of these two.
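
For instance, a minimal fetch might look like the sketch below; the URL, port, and
query parameters are placeholders, not the real Hackystat URIs, so check the server
documentation for the actual paths and authentication:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    /** Sketch of pulling sensor data over HTTP as XML (placeholder URL). */
    class HackystatFetch {
      public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:9876/hackystat/sensordata?user=someone");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(connection.getInputStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            System.out.println(line);  // raw XML
          }
        }
      }
    }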

The amount of data in the micro-process is higher than in usual Hackystat sessions,
I guess.

* On the other hand, in the first step it's not important for us to collect data
from more than one developer. It should work locally and should be easily
installable. But a central server might especially support this.
* We'd like to partly "replay" the micro-process on some part of the code (which
would also make a great demo) for further investigation and for later annotation of
the stream, both automatically and manually.
* But first of all, we need some episode recognizer (or preprocessor, or annotator)
which provides a just-in-time (as well as later) analysis of the event stream. I
thought of a state machine reacting to the events, and I guess that's pretty much
equivalent to a rule-based approach, which you seem to prefer.
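
For that last point, a per-location state machine might be the simplest starting
point. Here is a minimal sketch for the trial-and-error episode from the top of the
thread; the threshold and window are arbitrary placeholders, and the hypothetical
CodeLocation type from earlier is assumed:

    import java.time.Duration;
    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;

    /** Recognizes a trial-and-error episode: more than three edits to one location in a window. */
    class TrialAndErrorRecognizer {
      private static final int EDIT_THRESHOLD = 3;
      private static final Duration WINDOW = Duration.ofMinutes(30);

      private final Map<CodeLocation, Integer> editCounts = new HashMap<>();
      private final Map<CodeLocation, Instant> windowStarts = new HashMap<>();

      /** Feed one low-level edit event; returns true when an episode is recognized. */
      boolean onEdit(CodeLocation location, Instant when) {
        Instant start = windowStarts.get(location);
        if (start == null || Duration.between(start, when).compareTo(WINDOW) > 0) {
          windowStarts.put(location, when);  // start a fresh window for this location
          editCounts.put(location, 1);
          return false;
        }
        int count = editCounts.merge(location, 1, Integer::sum);
        return count > EDIT_THRESHOLD;       // here one would emit a SuspectEvent
      }
    }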

The latter point is what catches my interest concerning your work. I'm looking 
forward
to some fruitful discussion!

How's this for a start? :-)


Cheers, Sebastian
