Hey Guys,

Expanding Active Time to include other activities sounds like a great
addition to Hackystat. As always, I have a couple of comments.

1) I agree that adding other Active Time variations will broaden our
understanding of what is going on in a project. But do we have hypotheses
on how the "other" Active Time representations will help improve the
productivity of a project? While I think we can definitely do what is
proposed, I'm a little unsure about the added benefit of something like
"Tool Active Time".  For example, one would assume that an increasing "File
Editing Active Time" in both production and test code while coverage is
increasing or staying the same is a good thing. However, what would it mean
when "Tool Active Time" increases?

What sets "File Editing Active Time" apart from the other proposed Active
Time variations seems to be that the "File Editing Active Time" actually
measure "additions" to the project. The time spent in a browser, for
example, doing research in javadoc, does not measure "additions" to a
project. Therefore, there seems to be a difference in its
interpretation.  But, I think they are equally important with different
meanings.

2) "Command Line Active Time" and "Tool Active Time" sound a little
worrisome. Although, they weren't explained much, therefore, I'm kind of
making an assumption about its functionality. For example, would using a
slower computer, have longer "Command Line Active Time" than on a fast
computer?

Is there a difference between counting the number of command invocations
and "Command Line Active Time"? For example, I would much rather know that
a developer ran 20 "ant -q quickStart junitAll" commands than know that he
spent 1 hour executing "ant -q quickStart junitAll".

I suppose the question is: Do we have to attach a time to Command Line and
Tool activities to make the data useful?
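
Just to make the distinction I'm asking about concrete, here is a rough
sketch (hypothetical types and made-up numbers, not the real Hackystat
sensor API) of two summaries of the same command-line log:

  // Hypothetical sketch, not the actual sensor API: the same
  // command-line log summarized as invocation counts vs. total time.
  import java.util.*;

  class CommandLineSummary {
    record Entry(String command, long seconds) {}

    public static void main(String[] args) {
      List<Entry> log = List.of(
          new Entry("ant -q quickStart junitAll", 180),
          new Entry("ant -q quickStart junitAll", 210),
          new Entry("cvs update", 30));

      Map<String, Long> counts = new HashMap<>();
      Map<String, Long> seconds = new HashMap<>();
      for (Entry e : log) {
        counts.merge(e.command(), 1L, Long::sum);           // invocations
        seconds.merge(e.command(), e.seconds(), Long::sum); // duration
      }
      System.out.println("Invocations: " + counts);
      System.out.println("Seconds:     " + seconds);
    }
  }

The first summary tells me what the developer did; the second mostly tells
me how long the builds took.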

3) Tool Active Time sounds like the idea presented here
<http://www.mail-archive.com/hackystat-dev-l%40hawaii.edu/msg00658.html>.


thanks, aaron

At 01:30 PM 5/3/2005, you wrote:
Greetings, all,

I would like to make a rather significant enhancement to the "Active Time"
metric in
Hackystat this summer.  The following (long and verbose) RFC provides
background on this
issue and a proposal.  I am circulating this to various stakeholders to
solicit feedback
and suggestions.

Summary:

The current measure of Active Time provides a repeatable and consistent
measure for one
aspect of development time: the time spent editing files related to a
project.  I propose
to redefine Active Time as an umbrella term for a set of more specific
types of Active
Time.  The current Active Time would become "File Editing Active
Time".  Other candidates
include "Tool Active Time", "Command Line Active Time", and (potentially)
"Browser Active
Time".  When doing analysis, it will be possible to generate "unions" of
these Active
Time measures that provide a broader (though still partial)
representation of developer
effort.

Motivation:

Although it is clear that some measure for effort is essential to many
kinds of insights
into software development, obtaining such a measure has confounded
software engineering
researchers for decades.  The main approaches to measuring effort involve: (1)
self-reporting, using logs or online tools, (2) manual measurement via an
external
researcher who watches developers, or (3) automated measurement via
instrumentation of
tools.  All of these approaches have significant problems associated with
them.

An essential problem with self-reported data is that it tends to be
inaccurate.  Cook and
Campbell (1979) have pointed out that subjects (a) tend to report what
they believe the
researcher expects to see, or (b) report what reflects positively on their
own abilities,
knowledge, beliefs, or opinions [1]. Schacter warned that the human memory
is fallible
and thus the reliability of self-reported data is tenuous [2]. Austin
(1996) describes
the wide range of organizational factors that can distort self-reported
data [3].

An essential problem with manual measurement via an external researcher is
that it tends
to be expensive: one must have the resources to allow a researcher to
spend substantial
amounts of time simply observing the developers at work.  While this
approach tends to
produce the most comprehensive and accurate data, it cannot scale to large
numbers of
developers at large numbers of sites.

An essential problem with automated measurement is that it tends to be
partial: it
measures only a subset of what the developers actually do.  Hackystat in
its current form
exemplifies this issue: the Active Time measure only captures the time
spent by
developers editing files associated with a project.  Measurement
infrastructure such as
the PERCS program at the University of Pittsburgh is partial in a
different way: while it
can obtain a reasonably complete picture of developer activity in the
instrumented lab
setting, it requires developers to be in that lab and to be working on
only one task
during that time.

Another way to think about the issue of developer effort measurement is in
terms of
internal and external validity. "Internal validity" basically means that
the value
computed by the measure is accurate--it reflects the measure's definition
in various
contexts.  "External validity" means that the measure is useful--in our
context, that it
can be applied to compute higher level abstractions like "productivity".

For example, assume we ask developers to keep logs of "all the time they
spend working on
a project". This would result in a measure high external validity; in
other words, this
definition of "effort" is obviously useful to determining things like
productivity.
Unfortunately, it tends to have very low internal validity.  For one
thing, there is a
lot of ambiguity in this definition, such that different developers may
make different
choices as to how to record "effort".  This kind of effort measure is also
quite
susceptible to measurement dysfunction---there are a lot of situations in
which a person
may feel the need to distort such a value due to the organizational
context in which they
work.  It's also quite hard to verify without an independent observer--how
do you check
whether or not the values are accurate? Finally, from significant personal and
professorial experience, it's just darn difficult to keep doing for any
length of time
with a reasonable level of accuracy.

On the other hand, Active Time has very high internal validity---assuming
we implement
the sensor right, it will collect Active Time the same way for any
developer using any
editor with an Active Time sensor (currently Eclipse, Emacs, Vim, Visual
Studio, and
JBuilder).  It also allows multi-tasking: developers can work on several
projects in any
given period of time, and as long as the project files can be described in
terms of
directories, the Active Time measure can "credit" effort to the correct
project(s).
Since the developer doesn't have any involvement in metric collection
(other than editing
files), there is little ambiguity, the probability of measurement
dysfunction is reduced,
and, as someone who has three years of Active Time data and counting, it's
quite painless
for the subject.   Of course, the big question is external validity---to
what extent and
in what contexts is this explicitly partial measure of overall effort
still useful for
determining things like 'productivity'?
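
To illustrate the "crediting" step (a sketch only, with made-up paths and
class names, not the actual Hackystat implementation), a chunk of editing
activity can be attributed to any project whose directory contains the
edited file:

  // Illustrative sketch: credit a five-minute chunk of editing activity
  // to every project whose directory contains the edited file.
  import java.util.*;

  class ActiveTimeCrediting {
    static final Map<String, String> PROJECT_DIRS = Map.of(
        "hackyCore", "/home/dev/hackystat/hackyCore/",
        "hackyKernel", "/home/dev/hackystat/hackyKernel/");

    static void credit(String filePath, Map<String, Integer> minutes) {
      for (Map.Entry<String, String> p : PROJECT_DIRS.entrySet()) {
        if (filePath.startsWith(p.getValue())) {
          minutes.merge(p.getKey(), 5, Integer::sum);
        }
      }
    }

    public static void main(String[] args) {
      Map<String, Integer> minutes = new HashMap<>();
      credit("/home/dev/hackystat/hackyCore/src/Foo.java", minutes);
      credit("/home/dev/hackystat/hackyCore/src/Foo.java", minutes);
      credit("/home/dev/hackystat/hackyKernel/src/Bar.java", minutes);
      System.out.println(minutes);  // e.g. {hackyCore=10, hackyKernel=5}
    }
  }

Since projects are defined by directories, a file that falls under more
than one project's directory is simply credited to each of them.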

Of course, you don't necessarily have to choose one or the
other.  Consider a hybrid
approach: you collect (possibly with the aid of an external observer) a
reasonably
accurate "effort" log for _short_ but representative periods of time in
the development
contexts of interest.  You also implement Active Time sensors.  Then you
perform a
calibration to essentially determine what portion of 'overall' effort the
Active Time
sensor is capturing.  For the unobserved periods, Active Time can now
serve as a proxy.
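
As a purely numerical illustration (the numbers are invented), the
calibration might look like this:

  // Invented numbers, just to illustrate the calibration/proxy idea.
  class ActiveTimeCalibration {
    public static void main(String[] args) {
      double observedEffortHours = 20.0; // logged during the observed week
      double sensorActiveHours = 8.0;    // Active Time for the same week
      double factor = observedEffortHours / sensorActiveHours;  // 2.5

      double laterActiveHours = 6.0;     // an unobserved week
      double estimatedEffort = factor * laterActiveHours;       // ~15 hours
      System.out.println("Estimated overall effort: " + estimatedEffort);
    }
  }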

Proposal: Active Time, 2.0

The essential goals for Active Time 2.0 are:

(1) Preserve internal validity. In other words, Active Time 2.0 should not
require
developers to manually record effort, and it should collect data in the
same way for any
developer.

(2) Preserve multi-tasking.  In other words, Active Time 2.0 should allow
the developer
to work on several different projects during a given time period and have
Active Time
"credited" to the correct project(s).

(3) Increase external validity.  In other words, Active Time 2.0 should
provide a more
comprehensive measure for developer effort that is more useful for
determining higher
level abstractions like "productivity".

To accomplish this, "Active Time" in the next version will become an
umbrella term for a
collection of more specific measures.  The original Active Time measure
will be renamed
"File Editing Active Time" to more precisely indicate the kind of
developer effort being
measured.  Hackystat currently implements a measure called "Review Active
Time", which is
the time spent reviewing files in the Jupiter Code Review plug-in to
Eclipse.  Another
Active Time flavor is Command Line Active Time, which represents the time
spent invoking
commands at the command line in a subdirectory associated with a
Project.  Tool Active
Time represents the time spent inside a tool but not necessarily editing a
file.  There must be some way to associate the tool usage with a Project,
however.  A final
candidate is Browser
Active Time, which would capture the time spent in a browser, again as
long as the time
spent can be reliably associated with the Project. (I don't yet understand
how or if we
can satisfy that constraint.)
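
In implementation terms, one could imagine the umbrella as a simple type
tag carried by each chunk of Active Time (a sketch only; these are not
existing Hackystat classes):

  // Sketch only; not existing Hackystat types.
  class ActiveTimeTypes {
    enum ActiveTimeType { FILE_EDITING, REVIEW, COMMAND_LINE, TOOL, BROWSER }

    // Each chunk of Active Time carries its flavor, the project it was
    // credited to, and its size in minutes.
    record Chunk(ActiveTimeType type, String project, int minutes) {}

    public static void main(String[] args) {
      Chunk c = new Chunk(ActiveTimeType.COMMAND_LINE, "hackyCore", 5);
      System.out.println(c);
    }
  }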

We can create "composite" Active Time measures by combining individual
Active
Times, which can paint a more comprehensive picture of the time developers
spend using
tools on project-related activities. An additional advantage of this new
approach is
extensibility: as we discover new ways to define and represent "Active"
time, we can more
easily integrate them into our analysis framework.
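
One way to picture a "composite" measure (again just a sketch; it assumes
each flavor is recorded as timestamped intervals): take the union of the
intervals from the selected flavors, so that overlapping time, such as
editing a file while the enclosing tool is also "active", is counted only
once.

  // Illustrative sketch: union of activity intervals from several
  // Active Time flavors, so overlapping time is counted only once.
  import java.util.*;

  class CompositeActiveTime {
    record Interval(long start, long end) {}  // times in minutes

    static long unionMinutes(List<Interval> intervals) {
      List<Interval> sorted = new ArrayList<>(intervals);
      sorted.sort(Comparator.comparingLong(Interval::start));
      long total = 0, curStart = -1, curEnd = -1;
      for (Interval i : sorted) {
        if (i.start() > curEnd) {           // disjoint: close current run
          if (curEnd >= 0) total += curEnd - curStart;
          curStart = i.start();
          curEnd = i.end();
        } else {                            // overlapping: extend run
          curEnd = Math.max(curEnd, i.end());
        }
      }
      if (curEnd >= 0) total += curEnd - curStart;
      return total;
    }

    public static void main(String[] args) {
      // File editing 0-30 and 45-60, tool activity 20-50: union = 60.
      List<Interval> all = List.of(
          new Interval(0, 30), new Interval(45, 60), new Interval(20, 50));
      System.out.println(unionMinutes(all) + " minutes");
    }
  }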

Your thoughts are welcome.  Pointers to additional literature related to
effort measurement would also be appreciated.

Literature Cited

[1] Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design
and analysis
issues. Boston, MA: Houghton Mifflin Company.

[2] Schacter, D. L. (1999). The seven sins of memory: Insights from
psychology and
cognitive neuroscience. American Psychologist, 54, 182-203.

[3] Austin, R. (1996). Measuring and Managing Performance in Organizations.
Dorset House.
