Greetings, hackers,

Cedric's totally cool telemetry page on www.hackystat.org got me thinking
(always a dangerous thing).

One of the prime benefits of software project telemetry is that it simplifies
the task of looking for co-varying trends, such as "when test coverage goes
below a certain value, reported defects increase", or "long sequences of daily
build failures indicate non-superficial (i.e. design-level) problems with the
system", indicating that overall productivity is going to drop.

The stumbling block is that looking for 'co-varying' trends almost always takes
the form of comparing trend data from different kinds of streams (e.g. LOC vs.
Active Time, Coverage Percentage vs. Open Issues, etc.).  The way we currently
solve that is to create a report in which we display a set of telemetry charts
(each one restricted to a single type of 'unit') and try to mentally juxtapose
the different trend lines.

I've been fine with that so far, because we've had more important things on our
plate (such as stabilizing the telemetry language, build reduction function
API, and so forth).  Now that things are looking pretty good on that front, I'd
like to propose a hopefully simple enhancement that I believe will dramatically
improve the usability of software project telemetry: multi-axis (or multi-unit)
charts.

For an example of what I mean, here's a demo chart from the JFreeChart site:

<http://www.jfree.org/jfreechart/images/MultipleAxisDemo1.png>

What's cool about this is that you can have as many different Y-axis scales as
you want, although I imagine that we would typically want just two or three.

OK, so how would this work? I see the following steps:

(1) [Hongbing] Enhancement of hackyReport to support multi-axis charts.  I
think this should begin with an update of our jfreechart binary to 1.0.0-pre2.
Hopefully the ripple effect does not move beyond hackyReport.  Then, extend the
hackyReport API in whatever way seems most natural to support multi-axis
charts.  (We probably want a design review of the API change.)  Produce an
example in the hackyReportExample package, JUnit tests, and so forth.

(2) [Cedric] Enhancement of Telemetry Language to support multi-axis charts.
This is where, of course, things get interesting.  My proposal is to (a) change
the stream definition language to require an explicit specification of the
<units> associated with the stream(s) returned by a streams declaration; (b)
change the chart definition language to remove the specification of the Y-axis
label; and (c) have the telemetry language interpreter automatically figure out
how many Y-axis types there are by checking the <units> associated with the set
of streams in the chart, and create a single-axis or multiple-axis chart as
appropriate.

For example, consider the following telemetry specification for drawing the
first chart in the telemetry compendium page at
<http://hackystat.ics.hawaii.edu/hackystat/docbook/ch05s07.html>:

streams ActiveTimeStream(filePattern, cumulative) = {
  "Active Time", ActiveTime(filePattern, cumulative)
};

chart ActiveTimeChart() = {
 "Overall Active Time", "Hours", ActiveTimeStream("**", "false")
};

draw ActiveTimeChart();

As you can see, we define a stream, then a chart containing that stream, then
we draw it.  The displayed chart has "Hours" as the Y-axis label, which as you
can see is specified in the chart definition.  That makes sense, currently,
since all charts are single-axis charts.

Now, let's say we want to chart Active Time and SLOC on the same chart.  Well,
we could actually do that right now, as follows:

streams ActiveTimeStream(filePattern, cumulative) = {
  "Active Time", ActiveTime(filePattern, cumulative)
};

streams JavaFileMetricStream(filePattern, cumulative) = {
  "Java SLOC",
  JavaFileMetric("Sloc", filePattern, cumulative)
};

chart BogusActiveTimeAndSlocChart() = {
 "Active Time vs. SLOC", "HoursAndSloc",
 ActiveTimeStream("**", "false"),
 JavaFileMetricStream("**", "false")
};

draw BogusActiveTimeAndSlocChart();

The reason this chart is bogus is that the chart will display both Active Time
and Sloc along the same axis, which makes no sense, particularly in the case of
Hackystat where Active Time is generally in the range of 0-100 and Sloc is
generally in the range of 90,000-100,000.  As we all know, this scale
difference is going to destroy any ability to see covariances in the displayed
chart.
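
To put a rough number on that scale problem, here is a minimal Python sketch.
The daily values are made up purely to match the ranges mentioned above, and
axis_fraction is a hypothetical helper for this illustration, not part of
Hackystat or JFreeChart.  It measures how much of the axis height a stream's
own min-to-max swing would occupy.

```python
def axis_fraction(values, axis_min, axis_max):
    """Fraction of the axis height covered by a stream's min-to-max swing."""
    return (max(values) - min(values)) / (axis_max - axis_min)


# Made-up daily values, chosen only to fall in the ranges quoted above.
active_time = [12.0, 30.0, 55.0, 41.0, 80.0]           # hours
sloc = [90500.0, 92000.0, 95500.0, 97000.0, 99500.0]   # lines of code

# Shared axis: a single scale has to cover both streams.
shared_min = min(active_time + sloc)
shared_max = max(active_time + sloc)
shared_active = axis_fraction(active_time, shared_min, shared_max)
shared_sloc = axis_fraction(sloc, shared_min, shared_max)

# Separate axes: each scale covers only its own stream.
own_active = axis_fraction(active_time, min(active_time), max(active_time))
own_sloc = axis_fraction(sloc, min(sloc), max(sloc))

print(f"Active Time, shared axis: {shared_active:.4%} of the axis height")
print(f"Sloc, shared axis:        {shared_sloc:.4%} of the axis height")
print(f"Active Time, own axis:    {own_active:.0%} of the axis height")
print(f"Sloc, own axis:           {own_sloc:.0%} of the axis height")
```

With these sample numbers the Active Time line uses well under one percent of
a shared axis, so it would render as an essentially flat line near the bottom
of the chart.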

So, to solve this problem, I suggest the following simple change to the
language: instead of specifying a "Y-axis label" in the chart definition, we
specify "units" in the streams definition.  These string values are used by the
telemetry language interpreter to determine how many axes should be displayed
in the chart.  If two or more streams are defined with the same "units" string,
then all of those streams are displayed using the same Y-axis.

So, here's what our fixed ActiveTimeAndSlocChart definition would look like:

streams ActiveTimeStream(filePattern, cumulative) = {
  "Active Time", "Hours",
  ActiveTime(filePattern, cumulative)
};

streams JavaFileMetricStream(filePattern, cumulative) = {
  "Java SLOC", "SLOC",
  JavaFileMetric("Sloc", filePattern, cumulative)
};

chart ActiveTimeAndSlocChart() = {
 "Overall Active Time",
 ActiveTimeStream("**", "false"),
 JavaFileMetricStream("**", "false")
};

draw ActiveTimeAndSlocChart();

As you can see, we've simply added the specification of units (i.e. "Hours" and
"SLOC") to the streams definition, and removed the units specification from the
chart definition.  When the "draw" command is invoked, the system must see that
ActiveTimeStream and JavaFileMetricStream have different units values and thus
invoke the multiple-axis chart to display them.
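
As a rough illustration of that draw-time decision, here is a minimal Python
sketch.  It is not the interpreter's actual code: plan_axes and the (name,
units) tuples are assumptions made up for this example.  It groups the streams
in a chart by their units string and picks a single-axis or multi-axis chart
depending on how many distinct units turn up.

```python
def plan_axes(streams):
    """Group streams by units and decide which kind of chart to draw.

    streams is a list of (stream_name, units) pairs in chart order.
    Returns (chart_kind, axes), where axes maps each units string to the
    stream names sharing that Y-axis, in first-seen order.
    """
    if not streams:
        raise ValueError("a chart needs at least one stream")
    axes = {}
    for name, units in streams:
        # Streams with an identical units string share one Y-axis.
        axes.setdefault(units, []).append(name)
    chart_kind = "single-axis" if len(axes) == 1 else "multi-axis"
    return chart_kind, axes


# The two streams from the ActiveTimeAndSlocChart example above.
kind, axes = plan_axes([("Active Time", "Hours"), ("Java SLOC", "SLOC")])
print(kind)
for units, names in axes.items():
    print(f"Y-axis '{units}': {', '.join(names)}")
```

One consequence of matching on the raw string is that "Hours" and "hours"
would land on two different axes, so whether to normalize units strings would
be a design choice for the real implementation.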

Interestingly, while this change appears to make no net difference to the
complexity of the telemetry definition language, I expect it to dramatically
increase the usability of telemetry.  First, it enables substantial
"compression" of our visualizations: in the best case, a single chart can now
display what currently requires two, three, or four charts.  Second, if you
have a hypothesis about co-variance, you can now in many cases represent that
hypothesis in terms of a single chart.

Of course, there are lots of relationships between process and product data
that can't be captured this way, so I'm not suggesting that this is the final
solution.  What I do know is that we have long argued that tracking single data
streams was not the answer; the power of software metrics in general (and
Hackystat in particular) comes from the ability to inspect and reason about
different kinds of data and discover relationships between them.  I believe
that adding multiple-axis charts to the user's toolkit will make the discovery
and exploitation of those relationships that much simpler.

I've defined this as an enhancement request in Jira
<http://hackydev.ics.hawaii.edu:8080/browse/HACK-272> and assigned it to
Hongbing initially (since the hackyReport changes need to be implemented before
any changes can be made to hackyTelemetry).

Please provide feedback to the list and we can update the Jira issue as
appropriate.

Cheers,
Philip
