Cedric questioned the utility of the 'type' field for Jira issues recently, and that
started me thinking (always a dangerous thing, as it typically leads to a very long
email). I ran a few reports today in Jira and found out the following:
Release        Bugs   Improvements   New Features   Total
7.1               9              8              2      19
7.2               9              8              1      18
7.3              31             55             13      99
7.4 (so far)      9             33              2      44
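(For what it's worth, a table like this could be tallied mechanically from a Jira CSV
export rather than by hand. Here is a rough sketch; the export file name and the
"Fix Version"/"Issue Type" column headers are placeholders, not necessarily what our
export actually uses.)

# Rough sketch: rebuild the release/type table from a Jira CSV export.
# File name and column headers below are assumptions about the export format.
import csv
from collections import Counter

counts = Counter()
with open("jira_export.csv", newline="") as f:   # hypothetical export file
    for row in csv.DictReader(f):
        counts[(row["Fix Version"], row["Issue Type"])] += 1

print(f"{'Release':12} {'Bugs':>5} {'Improvements':>13} {'New Features':>13} {'Total':>6}")
for release in ["7.1", "7.2", "7.3", "7.4"]:
    bugs = counts[(release, "Bug")]
    improvements = counts[(release, "Improvement")]
    features = counts[(release, "New Feature")]
    total = bugs + improvements + features
    print(f"{release:12} {bugs:>5} {improvements:>13} {features:>13} {total:>6}")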
Some initial conclusions:
[1] We only need the Bug, Improvement, and New Feature issue types. There is also a Task
type, but it turns out not to be very useful for us. Indeed, there were a few 'task'
issues in 7.3 that I reassigned to either Bug, Improvement, or New Feature as a result of
this analysis.
[2] These three issue types can actually be quite revealing about the course of
development, if we try to be reasonably good about (a) creating Jira issues for
non-trivial system work, and (b) assigning their type field appropriately. As of 7.3, I
think we started succeeding reasonably well at both (a) and (b). I don't think we were
succeeding at (a) well enough in 7.1 or 7.2 (or the releases prior to them) for that data
to be very meaningful.
[3] We can start to make some useful conclusions when we get to around release 7.8---by
that point, we'll be up to around a half dozen releases where we kept good Jira records.
Just for fun, let's pretend that the above data is accurate. What do we find?
- Bugs and Improvements swamp New Features in all four releases.
- 7.1 and 7.2 had approximately equal numbers of Bugs and Improvements, while 7.3 had
roughly half as many Bugs as Improvements.
- At the beginning of a release cycle, Improvements swamp Bugs. Will Bugs catch up by
the end? One would assume that the natural tendency is for bugs to catch up over time.
[4] Understanding the 'baseline' for the proportion of Bugs, Improvements, and New
Features in a release can tell you some interesting things over time. First, the number
of Bugs vs. Improvements + New Features can tell you how much developer energy you have
to allocate to cleaning up mistakes vs. how much you can allocate to working on new
stuff. In 7.3, about a third of the closed issues were Bugs, which means two thirds of
the closed issues were for new stuff. If we found in later releases that this percentage
rose considerably, for example if over half of the reported issues for a release were
Bugs, then this would tell us that the system is getting less stable and less amenable
to enhancement and improvement (all other things being equal). Of course, if the
percentage of Bugs drops, then we are presumably doing something better.
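To make [4] concrete, here is a tiny sketch that computes the Bug fraction per release
straight from the counts in the table above (the 50% threshold is just the illustrative
warning level mentioned above, not a calibrated number):

# Bug fraction per release, using the counts from the table above.
RELEASES = {
    "7.1": {"Bug": 9,  "Improvement": 8,  "New Feature": 2},
    "7.2": {"Bug": 9,  "Improvement": 8,  "New Feature": 1},
    "7.3": {"Bug": 31, "Improvement": 55, "New Feature": 13},
    "7.4": {"Bug": 9,  "Improvement": 33, "New Feature": 2},  # still in progress
}

for release, issues in RELEASES.items():
    total = sum(issues.values())
    bug_fraction = issues["Bug"] / total
    # Over 50% Bugs would be the warning sign discussed above.
    flag = "  <-- over half Bugs" if bug_fraction > 0.5 else ""
    print(f"{release}: {bug_fraction:.0%} Bugs, {1 - bug_fraction:.0%} new stuff{flag}")

For 7.3 this prints roughly "31% Bugs, 69% new stuff", which matches the "about a third"
figure above.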
[5] Interpreting this kind of data in isolation the way I'm doing it in [4] rests on two
assumptions. First, that the data is collected consistently across releases. (And we
already know that isn't necessarily true---look at the jump in total issues between 7.2
and 7.3 caused by SVN/Jira issue linking and the process change of adding the Jira issue
ID to the SVN log message!) Second, that the average 'cost' of an issue is the same over
time and the same regardless of type. I have no real idea about that. One good thing
about our issue methodology: the "Quick Fix" issue takes care of the almost-zero-effort
Jira issues, so those get filtered out of the equation.
[6] The assumptions described in [5] pose big problems for the software engineering
community. There is more than one member of this mailing list with first-hand experience
of the "measurement dysfunction" that can be induced by doing this kind of analysis. For
example, it's tremendously easy for a manager, if they feel they will be rewarded (or at
least avoid punishment) by showing improvement, to change the way they gather/represent
Jira issues over time so that the percentage of "Bugs" appears
to go down. This wouldn't happen in CSDL, since (a) my chair won't fire me if Hackystat
gets more buggy, and (b) I am in close enough touch with the work of my students that I
don't need or use this particular metric to evaluate their status.
The opportunity with Hackystat is to do triangulation in order to mitigate the risk of
dysfunction. In other words, what if we had a whole bunch of different kinds of
indicators of release stability? (Dan Port talks about something similar, which he calls
"volatility"; it would be interesting to see if he's thinking along the same lines.) It
might be
possible for a manager to intentionally or unintentionally skew one of the metrics in the
'wrong' direction over the course of several releases, but it might be quite hard to get
_all_ of the various indicators to move in a dysfunctional direction.
So, going back to [5], what we really want to do is _not_ have to interpret this data in
isolation, but rather interpret it as part of a rich set of orthogonal proxies for the
notion of system 'stability' or 'enhancement availability' or whatever.
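As a toy illustration of the triangulation idea, something like the following could flag
a release only when several independent indicators move the wrong way together. The
indicator names, values, and thresholds are entirely made up; the point is the structure,
not the numbers.

# Toy triangulation: flag a release only when multiple independent
# indicators cross their 'instability' thresholds. All indicator names,
# values, and thresholds here are hypothetical placeholders.
from typing import Dict

def stability_report(indicators: Dict[str, float], thresholds: Dict[str, float]) -> None:
    flagged = [name for name, value in indicators.items() if value > thresholds[name]]
    if len(flagged) >= 2:
        print("Multiple indicators moving the wrong way:", ", ".join(flagged))
    elif flagged:
        print("Single indicator flagged (could be noise, or gaming):", flagged[0])
    else:
        print("No indicators flagged for this release.")

# Hypothetical values for one release:
stability_report(
    indicators={"bug_fraction": 0.31, "reopened_issue_rate": 0.05, "avg_fix_churn": 120.0},
    thresholds={"bug_fraction": 0.50, "reopened_issue_rate": 0.10, "avg_fix_churn": 400.0},
)

Gaming any one metric might fool its threshold, but it is much harder to move all of them
at once, which is the point made above.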
[7] This issue of system stability, and/or its "availability" to new enhancement, is not
an idle question for me. In fact, this hits directly at one of my central worries
concerning Hackystat. When I started this project five years ago, I had no idea how "big"
one should expect a "generic" automated software engineering data collection and analysis
framework to be. If you'd asked me then, I probably would have guessed something like 50
KLOC. Well, we're at 275 KLOC right now. What keeps me awake at night (metaphorically
speaking) is the question of whether Hackystat is becoming so big, and so complicated,
that (a) it will begin exhibiting bugs that consume all our development resources, and/or
(b) new features or improvements become hopelessly expensive to add.
Interestingly enough, monitoring the numbers and types of Jira issues over the course of
release cycles, in conjunction with triangulated measures, might be exactly what I need
to answer this question.
Cheers,
Philip