Hey Guys,
Expanding Active Time to include other activities sounds like a great addition to Hackystat. As always, I have a couple of comments.
1) While I agree that adding other Active Time variations will broaden our understanding of what is going on in a project, do we have hypotheses on how the "other" Active Time representations will help improve the productivity of a project? While I think we can definitely do what is proposed, I'm a little unsure about the added benefit of something like "Tool Active Time". For example, one would assume that an increasing "File Editing Active Time" in both production and test code while coverage is increasing or staying the same is a good thing. However, what would it mean when "Tool Active Time" increases?
What sets "File Editing Active Time" apart from the other proposed Active Time variations seems to be that the "File Editing Active Time" actually measure "additions" to the project. The time spent in a browser, for example, doing research in javadoc, does not measure "additions" to a project. Therefore, there seems to be a difference in its interpretation. But, I think they are equally important with different meanings.
2) "Command Line Active Time" and "Tool Active Time" sound a little worrisome. Although, they weren't explained much, therefore, I'm kind of making an assumption about its functionality. For example, would using a slower computer, have longer "Command Line Active Time" than on a fast computer?
Is there a difference between counting the number of command invocations and "Command Line Active Time"? For example, I would much rather know that a developer ran 20 "ant -q quickStart junitAll" commands than know that he spent 1 hour executing "ant -q quickStart junitAll".
I suppose the question is: Do we have to attach a time to Command Line and Tool activities to make the data useful?
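To make the distinction concrete, here is a minimal Java sketch of the two aggregations I have in mind; all of the names are invented and none of this is actual Hackystat code:

    import java.time.Duration;
    import java.util.List;

    // Hypothetical raw event: one command invocation plus how long it ran.
    record CommandEvent(String command, Duration duration) {}

    class CommandLineStats {
        // How many times was a given command run?
        static long invocationCount(List<CommandEvent> events, String command) {
            return events.stream().filter(e -> e.command().equals(command)).count();
        }

        // How much wall-clock time was spent running it?
        static Duration totalDuration(List<CommandEvent> events, String command) {
            return events.stream()
                         .filter(e -> e.command().equals(command))
                         .map(CommandEvent::duration)
                         .reduce(Duration.ZERO, Duration::plus);
        }
    }

The point being that both values fall out of the same raw events, so perhaps collecting the events matters more than deciding up front which aggregation to privilege.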
3) Tool Active Time sounds like the idea presented here <http://www.mail-archive.com/hackystat-dev-l%40hawaii.edu/msg00658.html>.
thanks, aaron
At 01:30 PM 5/3/2005, you wrote:
Greetings, all,
I would like to make a rather significant enhancement to the "Active Time" metric in Hackystat this summer. The following (long and verbose) RFC provides background on this issue and a proposal. I am circulating this to various stakeholders to solicit feedback and suggestions.
Summary:
The current measure of Active Time provides a repeatable and consistent measure for one aspect of development time: the time spent editing files related to a project. I propose to redefine Active Time as an umbrella term for a set of more specific types of Active Time. The current Active Time would become "File Editing Active Time". Other candidates include "Tool Active Time", "Command Line Active Time", and (potentially) "Browser Active Time". When doing analysis, it will be possible to generate "unions" of these Active Time measures that provide a broader (though still partial) representation of developer effort.
Motivation:
Although it is clear that some measure for effort is essential to many kinds of insights into software development, obtaining such a measure has confounded software engineering researchers for decades. The main approaches to measuring effort involve: (1) self-reporting, using logs or online tools, (2) manual measurement via an external researcher who watches developers, or (3) automated measurement via instrumentation of tools. All of these approaches have significant problems associated with them.
An essential problem with self-reported data is that it tends to be inaccurate. Cook and Campbell (1979) have pointed out that subjects (a) tend to report what they believe the researcher expects to see, or (b) report what reflects positively on their own abilities, knowledge, beliefs, or opinions [1]. Schacter (1999) warned that human memory is fallible and thus the reliability of self-reported data is tenuous [2]. Austin (1996) describes the wide range of organizational factors that can distort self-reported data [3].
An essential problem with manual measurement via an external researcher is that it tends to be expensive: one must have the resources to allow a researcher to spend substantial amounts of time simply observing the developers at work. While this approach tends to produce the most comprehensive and accurate data, it cannot scale to large numbers of developers at large numbers of sites.
An essential problem with automated measurement is that it tends to be partial: it measures only a subset of what the developers actually do. Hackystat in its current form exemplifies this issue: the Active Time measure only captures the time spent by developers editing files associated with a project. Measurement infrastructure such as the PERCS program at the University of Pittsburgh is partial in a different way: while it can obtain a reasonably complete picture of developer activity in the instrumented lab setting, it requires developers to be in that lab and to be working on only one task during that time.
Another way to think about the issue of developer effort measurement is in terms of internal and external validity. "Internal validity" basically means that the value computed by the measure is accurate--it reflects the measure's definition in various contexts. "External validity" means that the measure is useful--in our context, that it can be applied to compute higher level abstractions like "productivity".
For example, assume we ask developers to keep logs of "all the time they spend working on a project". This would result in a measure with high external validity; in other words, this definition of "effort" is obviously useful for determining things like productivity. Unfortunately, it tends to have very low internal validity. For one thing, there is a lot of ambiguity in this definition, such that different developers may make different choices as to how to record "effort". This kind of effort measure is also quite susceptible to measurement dysfunction---there are a lot of situations in which a person may feel the need to distort such a value due to the organizational context in which they work. It's also quite hard to verify without an independent observer--how do you check whether or not the values are accurate? Finally, from significant personal and professorial experience, it's just darn difficult to keep doing for any length of time with a reasonable level of accuracy.
On the other hand, Active Time has very high internal validity---assuming we implement the sensor right, it will collect Active Time the same way for any developer using any editor with an Active Time sensor (currently Eclipse, Emacs, Vim, Visual Studio, and JBuilder). It also allows multi-tasking: developers can work on several projects in any given period of time, and as long as the project files can be described in terms of directories, the Active Time measure can "credit" effort to the correct project(s). Since the developer doesn't have any involvement in metric collection (other than editing files), there is little ambiguity, the probability of measurement dysfunction is reduced, and, as someone who has three years of Active Time data and counting, it's quite painless for the subject. Of course, the big question is external validity---to what extent and in what contexts is this explicitly partial measure of overall effort still useful for determining things like 'productivity'?
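As an aside, the directory-based "crediting" could be sketched roughly as follows; the class and method names here are invented for illustration and do not reflect the actual sensor implementation:

    import java.nio.file.Path;
    import java.util.List;
    import java.util.Map;

    class ProjectCrediter {
        private final Map<String, Path> projectRoots; // project name -> root directory

        ProjectCrediter(Map<String, Path> projectRoots) {
            this.projectRoots = projectRoots;
        }

        // Every project whose root directory is a prefix of the edited file's
        // path gets credit; this is what allows multi-tasking across projects.
        List<String> creditFor(Path editedFile) {
            Path file = editedFile.toAbsolutePath().normalize();
            return projectRoots.entrySet().stream()
                    .filter(e -> file.startsWith(e.getValue().toAbsolutePath().normalize()))
                    .map(Map.Entry::getKey)
                    .toList();
        }
    }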
Of course, you don't necessarily have to choose one or the other. Consider a hybrid approach: you collect (possibly with the aid of an external observer) a reasonably accurate "effort" log for _short_ but representative periods of time in the development contexts of interest. You also implement Active Time sensors. Then you perform a calibration to essentially determine what portion of 'overall' effort the Active Time sensor is capturing. For the unobserved periods, Active Time can now serve as a proxy.
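To illustrate the calibration arithmetic with invented numbers:

    class CalibrationExample {
        public static void main(String[] args) {
            // Invented numbers: during one closely observed week, the developer's
            // effort log shows 20 hours, while the sensors recorded 12 hours of
            // Active Time. Their ratio is the calibration factor.
            double observedEffortHours = 20.0;
            double observedActiveHours = 12.0;
            double factor = observedEffortHours / observedActiveHours; // ~1.67

            // In an unobserved week with 9 hours of Active Time, the proxy
            // estimate of overall effort is factor * 9, or roughly 15 hours.
            double estimate = factor * 9.0;
            System.out.printf("Estimated overall effort: %.1f hours%n", estimate);
        }
    }

Presumably the factor would need to be re-estimated per context, since the fraction of overall effort that Active Time captures will vary.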
Proposal: Active Time, 2.0
The essential goals for Active Time 2.0 are:
(1) Preserve internal validity. In other words, Active Time 2.0 should not require developers to manually record effort, and it should collect data in the same way for any developer.
(2) Preserve multi-tasking. In other words, Active Time 2.0 should allow the developer to work on several different projects during a given time period and have Active Time "credited" to the correct project(s).
(3) Increase external validity. In other words, Active Time 2.0 should provide a more comprehensive measure for developer effort that is more useful for determining higher level abstractions like "productivity".
To accomplish this, "Active Time" in the next version will become an umbrella term for a collection of more specific measures. The original Active Time measure will be renamed "File Editing Active Time" to more precisely indicate the kind of developer effort being measured. Hackystat currently implements a measure called "Review Active Time", which is the time spent reviewing files in the Jupiter Code Review plug-in to Eclipse. Another Active Time flavor is Command Line Active Time, which represents the time spent invoking commands at the command line in a subdirectory associated with a Project. Tool Active Time represents the time spent inside a tool but not necessarily editing a file; there must be some way to associate that usage with a Project, however. A final candidate is Browser Active Time, which would capture the time spent in a browser, again as long as the time spent can be reliably associated with the Project. (I don't yet understand how or if we can satisfy that constraint.)
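To make the umbrella concrete, here is one possible (purely illustrative) shape for the data model; none of these names are the actual Hackystat API:

    import java.time.Instant;

    // Each flavor of Active Time is tagged with its kind, so analyses can
    // treat observations uniformly and new kinds can be added later.
    enum ActiveTimeKind { FILE_EDITING, REVIEW, COMMAND_LINE, TOOL, BROWSER }

    record ActiveTimeObservation(ActiveTimeKind kind, String project,
                                 Instant start, Instant end) {}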
We can create "composite" Active Time measures by combining individual Active Times, which can paint a more comprehensive picture of the time developers spend using tools on project-related activities. An additional advantage of this new approach is extensibility: as we discover new ways to define and represent "Active" time, we can more easily integrate them into our analysis framework.
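A composite must avoid double-counting when flavors overlap (for instance, File Editing and Tool Active Time during the same minute), so a natural definition is the union of the underlying time intervals. Here is a minimal sketch of that union computation, again with invented names rather than the actual implementation:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    class CompositeActiveTime {
        record Interval(long start, long end) {} // epoch millis

        // Total duration of the union of the given intervals; overlapping
        // intervals are merged so overlapping effort counts only once.
        static long unionMillis(List<Interval> intervals) {
            List<Interval> sorted = new ArrayList<>(intervals);
            sorted.sort(Comparator.comparingLong(Interval::start));
            long total = 0, curStart = Long.MIN_VALUE, curEnd = Long.MIN_VALUE;
            for (Interval iv : sorted) {
                if (iv.start() > curEnd) {
                    // Disjoint from the current run: close it out, start a new one.
                    if (curEnd > curStart) total += curEnd - curStart;
                    curStart = iv.start();
                    curEnd = iv.end();
                } else {
                    // Overlapping or adjacent: extend the current run.
                    curEnd = Math.max(curEnd, iv.end());
                }
            }
            if (curEnd > curStart) total += curEnd - curStart;
            return total;
        }
    }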
Your thoughts are welcome, as are pointers to additional literature related to effort measurement.
Literature Cited
[1] Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin Company.
[2] Schacter, D. L. (1999). The seven sins of memory: Insights from psychology and cognitive neuroscience. American Psychologist, 54, 182-203.
[3] Austin, R. D. (1996). Measuring and Managing Performance in Organizations. New York, NY: Dorset House.
