Greetings, hackers,

Cedric's totally cool telemetry page on www.hackystat.org got me thinking (always a dangerous thing).
One of the prime benefits of software project telemetry is that it simplifies the task of looking for co-varying trends, such as "when test coverage goes below a certain value, reported defects increase", or "long sequences of daily build failures indicate non-superficial (i.e. design-level) problems with the system", which in turn suggests that overall productivity is going to drop. The stumbling block is that looking for co-varying trends almost always means comparing trend data from different kinds of streams (e.g. LOC vs. Active Time, Coverage Percentage vs. Open Issues, etc.). Our current solution is to create a report in which we display a set of telemetry charts (each one restricted to a single type of 'unit') and try to mentally juxtapose the different trend lines.

I've been fine with that so far, because we've had more important things on our plate (such as stabilizing the telemetry language, build reduction function API, and so forth). Now that things are looking pretty good on that front, I'd like to propose a hopefully simple enhancement that I believe will dramatically improve the usability of software project telemetry: multi-axis (or multi-unit) charts.

For an example of what I mean, here's a demo chart from the JFreeChart site:

<http://www.jfree.org/jfreechart/images/MultipleAxisDemo1.png>

What's cool about this is that you can have as many different Y-axis scales as you want, although I imagine that we would typically want just two or three.

OK, so how would this work? I see the following steps:

(1) [Hongbing] Enhancement of hackyReport to support multi-axis charts. I think this should begin with an update of our jfreechart binary to 1.0.0-pre2. Hopefully the ripple effect does not move beyond hackyReport. Then, extend the hackyReport API in whatever way seems most natural to support multi-axis charts. (We probably want a design review of the API change.) Produce an example in the hackyReportExample package, JUnit tests, and so forth.

(2) [Cedric] Enhancement of Telemetry Language to support multi-axis charts. This is where, of course, things get interesting. My proposal is to (a) change the stream definition language to require an explicit specification of the <units> associated with the stream(s) returned by a streams declaration; (b) change the chart definition language to remove the specification of the Y-axis label; and (c) have the telemetry language interpreter automatically figure out how many Y-axis types there are by checking the <units> associated with the set of streams in the chart, and create a single-axis or multiple-axis chart as appropriate.

For example, consider the following telemetry specification for drawing the first chart in the telemetry compendium page at <http://hackystat.ics.hawaii.edu/hackystat/docbook/ch05s07.html>:

  streams ActiveTimeStream(filePattern, cumulative) = {
    "Active Time", ActiveTime(filePattern, cumulative)
  };

  chart ActiveTimeChart() = {
    "Overall Active Time", "Hours", ActiveTimeStream("**", "false")
  };

  draw ActiveTimeChart();

As you can see, we define a stream, then a chart containing that stream, then we draw it. The displayed chart has "Hours" as the Y-axis label, which is specified in the chart definition. That makes sense, currently, since all charts are single-axis charts.

Now, let's say we want to chart Active Time and SLOC on the same chart. Well, we could actually do that right now, as follows:

  streams ActiveTimeStream(filePattern, cumulative) = {
    "Active Time", ActiveTime(filePattern, cumulative)
  };

  streams JavaFileMetricStream(filePattern, cumulative) = {
    "Java SLOC", JavaFileMetric("Sloc", filePattern, cumulative)
  };

  chart BogusActiveTimeAndSlocChart() = {
    "Active Time vs. SLOC", "HoursAndSloc",
    ActiveTimeStream("**", "false"),
    JavaFileMetricStream("**", "false")
  };

  draw BogusActiveTimeAndSlocChart();

The reason this chart is bogus is that it will display both Active Time and SLOC along the same axis, which makes no sense, particularly in the case of Hackystat, where Active Time is generally in the range of 0-100 and SLOC is generally in the range of 90,000-100,000. As we all know, this scale difference is going to destroy any ability to see covariances in the displayed chart.

So, to solve this problem, I suggest the following simple change to the language: instead of specifying a "Y-axis label" in the chart definition, we specify "units" in the streams definition. These string values are used by the telemetry language interpreter to determine how many axes should be displayed in the chart. If two or more streams are defined with the same "units" string, then all of those streams are displayed using the same Y-axis.

So, here's what our fixed ActiveTimeAndSlocChart definition would look like:

  streams ActiveTimeStream(filePattern, cumulative) = {
    "Active Time", "Hours", ActiveTime(filePattern, cumulative)
  };

  streams JavaFileMetricStream(filePattern, cumulative) = {
    "Java SLOC", "SLOC", JavaFileMetric("Sloc", filePattern, cumulative)
  };

  chart ActiveTimeAndSlocChart() = {
    "Overall Active Time",
    ActiveTimeStream("**", "false"),
    JavaFileMetricStream("**", "false")
  };

  draw ActiveTimeAndSlocChart();

As you can see, we've simply added the specification of units (i.e. "Hours" and "SLOC") to the streams definitions, and removed the Y-axis label from the chart definition. When the "draw" command is invoked, the system must see that ActiveTimeStream and JavaFileMetricStream have different units values and thus use a multiple-axis chart to display them.
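
To make the axis-selection rule a bit more concrete, here is a minimal sketch in Java of how an interpreter might group a chart's streams by their units string, with one Y-axis per distinct string. This is only an illustration under my own assumptions: the names AxisGrouping, StreamDef, and groupByUnits are hypothetical and are not part of the existing hackyReport or hackyTelemetry APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AxisGrouping {

  /** Hypothetical stand-in for a streams definition: a label plus its units string. */
  static final class StreamDef {
    final String label;
    final String units;

    StreamDef(String label, String units) {
      this.label = label;
      this.units = units;
    }
  }

  /**
   * Groups stream labels by units string, preserving first-seen order.
   * Each key in the returned map corresponds to one Y-axis.
   */
  static Map<String, List<String>> groupByUnits(List<StreamDef> streams) {
    Map<String, List<String>> axes = new LinkedHashMap<>();
    for (StreamDef s : streams) {
      axes.computeIfAbsent(s.units, u -> new ArrayList<>()).add(s.label);
    }
    return axes;
  }

  public static void main(String[] args) {
    // The two streams from the ActiveTimeAndSlocChart example.
    List<StreamDef> chart = new ArrayList<>();
    chart.add(new StreamDef("Active Time", "Hours"));
    chart.add(new StreamDef("Java SLOC", "SLOC"));

    Map<String, List<String>> axes = groupByUnits(chart);
    // More than one distinct units string means a multiple-axis chart.
    String kind = axes.size() > 1 ? "multiple-axis" : "single-axis";
    System.out.println(kind + " chart, axes: " + axes);
  }
}
```

With the two streams above, the sketch finds two distinct units strings ("Hours" and "SLOC") and so would choose a multiple-axis chart; two streams that both declare "Hours" would collapse onto a single axis. Whether the units comparison should be exact or case-insensitive is a design question this sketch does not try to settle.
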

Interestingly, while this change appears to make no net difference to the complexity of the telemetry definition language, I expect it to dramatically increase the usability of telemetry. First, it enables substantial "compression" of our visualizations: in the best case, a single chart can now display what currently requires two, three, or four charts. Second, if you have a hypothesis about covariance, you can now in many cases represent that hypothesis in terms of a single chart.

Of course, there are lots of relationships between process and product data that can't be captured this way, so I'm not suggesting that this is the final solution. What I do know is that we have long argued that tracking single data streams is not the answer; the power of software metrics in general (and Hackystat in particular) comes from the ability to inspect and reason about different kinds of data and discover relationships between them. I believe that adding multiple-axis charts to the user's toolkit will make the discovery and exploitation of those relationships that much simpler.

I've defined this as an enhancement request in Jira <http://hackydev.ics.hawaii.edu:8080/browse/HACK-272> and assigned it to Hongbing initially (since the hackyReport changes need to be implemented before any changes can be made to hackyTelemetry). Please provide feedback to the list and we can update the Jira issue as appropriate.

Cheers,
Philip
