Cedric questioned the utility of the 'type' field for Jira issues recently, and that
started me thinking (always a dangerous thing, as it typically leads to a very long
email). I ran a few reports today in Jira and found out the following:
Release        Bugs   Improvements   New Features   Total
7.1               9              8              2      19
7.2               9              8              1      18
7.3              31             55             13      99
7.4 (so far)      9             33              2      44
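(For what it's worth, a table like this could be tallied mechanically from a Jira CSV
export rather than by hand. Here is a rough sketch; the export file name and the
"Fix Version"/"Issue Type" column headers are placeholders, not necessarily what our
export actually uses.)

# Rough sketch: rebuild the release/type table from a Jira CSV export.
# File name and column headers below are assumptions about the export format.
import csv
from collections import Counter

counts = Counter()
with open("jira_export.csv", newline="") as f:   # hypothetical export file
    for row in csv.DictReader(f):
        counts[(row["Fix Version"], row["Issue Type"])] += 1

print(f"{'Release':12} {'Bugs':>5} {'Improvements':>13} {'New Features':>13} {'Total':>6}")
for release in ["7.1", "7.2", "7.3", "7.4"]:
    bugs = counts[(release, "Bug")]
    improvements = counts[(release, "Improvement")]
    features = counts[(release, "New Feature")]
    total = bugs + improvements + features
    print(f"{release:12} {bugs:>5} {improvements:>13} {features:>13} {total:>6}")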
Some initial conclusions:
[1] We only need the Bug, Improvement, and New Feature issue types. There is also a Task
type, but it turns out not to be very useful for us. Indeed, there were a few 'task'
issues in 7.3 that I reassigned to either Bug, Improvement, or New Feature as a result of
this analysis.
[2] These three issue types can actually be quite revealing about the course of
development, if we try to be reasonably good about (a) creating Jira issues for
non-trivial system work, and (b) assigning their type field appropriately. As of 7.3, I
think we started succeeding reasonably well at both (a) and (b). I don't think we were
succeeding at (a) well enough in 7.1 or 7.2 (or the releases prior to them) for that data
to be very meaningful.
[3] We can start to make some useful conclusions when we get to around release 7.8---by
that point, we'll be up to around a half dozen releases where we kept good Jira records.
Just for fun, let's pretend that the above data is accurate. What do we find?
- Bugs and Improvements swamp New Features in all four releases.
- 7.1 and 7.2 had approximately equal numbers of Bugs and Improvements, while 7.3 had
roughly half as many Bugs as Improvements.
- At the beginning of a release cycle, Improvements swamp Bugs. Will Bugs catch up by
the end? One would assume that the natural tendency is for bugs to catch up over time.
[4] Understanding the 'baseline' for the proportion of Bugs, Improvements, and New
Features in a release can tell you some interesting things over time. First, the number
of Bugs vs. Improvements + New Features can tell you how much developer energy you have
to allocate to cleaning up mistakes vs. how much you can allocate to working on new
stuff. In 7.3, about a third of the closed issues were Bugs, which means two thirds of
the closed issues were for new stuff. If we found in later releases that this percentage
rose considerably, for example if over half of the reported issues for a release were
Bugs, then this would tell us that the system is getting less stable and less amenable
to enhancement and improvement (all other things being equal). Of course, if the
percentage of Bugs drops, then we are presumably doing something better.
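To make [4] concrete, here is a tiny sketch that computes the Bug fraction per release
straight from the counts in the table above (the 50% threshold is just the illustrative
warning level mentioned above, not a calibrated number):

# Bug fraction per release, using the counts from the table above.
RELEASES = {
    "7.1": {"Bug": 9,  "Improvement": 8,  "New Feature": 2},
    "7.2": {"Bug": 9,  "Improvement": 8,  "New Feature": 1},
    "7.3": {"Bug": 31, "Improvement": 55, "New Feature": 13},
    "7.4": {"Bug": 9,  "Improvement": 33, "New Feature": 2},  # still in progress
}

for release, issues in RELEASES.items():
    total = sum(issues.values())
    bug_fraction = issues["Bug"] / total
    # Over 50% Bugs would be the warning sign discussed above.
    flag = "  <-- over half Bugs" if bug_fraction > 0.5 else ""
    print(f"{release}: {bug_fraction:.0%} Bugs, {1 - bug_fraction:.0%} new stuff{flag}")

For 7.3 this prints roughly "31% Bugs, 69% new stuff", which matches the "about a third"
figure above.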
[5] Interpreting this kind of data in isolation the way I'm doing it in [4] rests on two
assumptions. First, that the data is collected consistently across releases. (And we
already know that isn't necessarily true---look at the jump in total issues between 7.2
and 7.3 caused by SVN/Jira issue linking and the process change of adding the Jira issue
ID to the SVN log message!) Second, that the average 'cost' of an issue is the same over
time and the same regardless of type. I have no real idea about that. One good thing
about our issue methodology: the "Quick Fix" issue takes care of the almost-zero-effort
Jira issues, so those get filtered out of the equation.
[6] The assumptions described in [5] pose big problems for the software engineering
community. There is more than one member of this mailing list with first-hand experience
of the "measurement dysfunction" that can be induced by doing this kind of analysis. For
example, it's tremendously easy for a manager, if they feel they will be rewarded (or at
least avoid punishment) by showing improvement, to change the way they gather/represent
Jira issues over time so that the percentage of "Bugs" appears
to go down. This wouldn't happen in CSDL, since (a) my chair won't fire me if Hackystat
gets more buggy, and (b) I am in close enough touch with the work of my students that I
don't need or use this particular metric to evaluate their status.
The opportunity with Hackystat is to do triangulation in order to mitigate the risk of
dysfunction. In other words, what if we had a whole bunch of different kinds of
indicators of release stability? (Dan Port talks about something similar, which he calls
"volatility"; it would be interesting to see if he's thinking along the same lines.) It
might be
possible for a manager to intentionally or unintentionally skew one of the metrics in the
'wrong' direction over the course of several releases, but it might be quite hard to get
_all_ of the various indicators to move in a dysfunctional direction.
So, going back to [5], what we really want to do is _not_ have to interpret this data in
isolation, but rather interpret it as part of a rich set of orthogonal proxies for the
notion of system 'stability' or 'enhancement availability' or whatever.
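As a toy illustration of the triangulation idea, something like the following could flag
a release only when several independent indicators move the wrong way together. The
indicator names, values, and thresholds are entirely made up; the point is the structure,
not the numbers.

# Toy triangulation: flag a release only when multiple independent
# indicators cross their 'instability' thresholds. All indicator names,
# values, and thresholds here are hypothetical placeholders.
from typing import Dict

def stability_report(indicators: Dict[str, float], thresholds: Dict[str, float]) -> None:
    flagged = [name for name, value in indicators.items() if value > thresholds[name]]
    if len(flagged) >= 2:
        print("Multiple indicators moving the wrong way:", ", ".join(flagged))
    elif flagged:
        print("Single indicator flagged (could be noise, or gaming):", flagged[0])
    else:
        print("No indicators flagged for this release.")

# Hypothetical values for one release:
stability_report(
    indicators={"bug_fraction": 0.31, "reopened_issue_rate": 0.05, "avg_fix_churn": 120.0},
    thresholds={"bug_fraction": 0.50, "reopened_issue_rate": 0.10, "avg_fix_churn": 400.0},
)

Gaming any one metric might fool its threshold, but it is much harder to move all of them
at once, which is the point made above.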
[7] This issue of system stability, and/or its "availability" to new enhancement, is not
an idle question for me. In fact, this hits directly at one of my central worries
concerning Hackystat. When I started this project five years ago, I had no idea how "big"
one should expect a "generic" automated software engineering data collection and analysis
framework to be. If you'd asked me then, I probably would have guessed something like 50
KLOC. Well, we're at 275 KLOC right now. What keeps me awake at night (metaphorically
speaking) is the question of whether Hackystat is becoming so big, and so complicated,
that (a) it will begin exhibiting bugs that consume all our development resources, and/or
(b) new features or improvements become hopelessly expensive to add.
Interestingly enough, monitoring the numbers and types of Jira issues over the course of
release cycles, in conjunction with triangulated measures, might be exactly what I need
to answer this question.
Cheers,
Philip