Greetings Andy, Lorin, Patrick, and the hackystat dev crew,

I am writing to summarize recent work on command line sensor data collection and analysis and to propose a future direction.

The background: understanding HPC software development involves gaining insight into the behaviors and outcomes of developer activities, which in this domain involve a lot of (Unix) command line interaction. In previous work, we created a command line sensor that uses the "history" mechanism to obtain timestamped data consisting of the developer's commands. This sensor was quite easy to write, but has one substantial shortcoming: it does not obtain any information about the results of the command invoked.

In the "next generation" command line sensor, we would like to capture both (a) what command was entered and at what time, and (b) what the results of the command were (and at what time these results appeared).

To satisfy these requirements, we would like to propose the following design decomposition into three tools:

[1] A CLI logging mechanism. This tool would be similar, but not identical, to the "script" command in Unix. The difference is that "script" does not record a timestamp for each command, a timestamp for the command's results, or the current working directory associated with the command. The user would invoke the logging mechanism much as the "script" command is invoked, and our tool would create a log file (preferably, but not necessarily, in XML format) similar to the following:

<cli-logger>

<cli-logger-entry>
<invocation time="1134730020809" current-directory="/user/home/johnson/" machine="bertha.ics.hawaii.edu">
ls -la
</invocation>
<result time="11347300245673">
drwxr-xr-x  36 johnson  csdl         4096 Dec 15 16:13 ./
drwxr-xr-x  40 root     root         1024 Feb 11  2005 ../
-rw-------   1 root     other          76 Feb  3  2001 .TTauthority
-rw-------   1 johnson  csdl          916 Jan 22  2004 .Xauthority
-rw-r-----   1 johnson  csdl          612 Jul 22  1999 .Xdefaults
</result>
</cli-logger-entry>

<cli-logger-entry>
<invocation time="1134732837465" current-directory="/user/home/johnson/svn/hackyCore_Build" machine="bertha.ics.hawaii.edu">
ant -q quickStart
</invocation>
<result time="11347567893">
    [echo] (12:40:04) Completed hackyCore_Build.checkModuleAvailability
    [echo] (12:40:11) Completed all.compile
    [echo] (12:40:43) Completed all.install.pre-sensorshell
    [echo] (12:41:01) Completed hackyCore_Build.installSensorShell
    [echo] (12:41:44) Completed all.install.post-sensorshell
    [echo] (12:41:44) Completed hackyCore_Build.deployTestData

BUILD FAILED
C:\svn\hackyCore_Build\tomcat.build.xml:14: Tomcat does not appear to be running on http://localhost:8080/
Total time: 1 minute 45 seconds
</result>
</cli-logger-entry>
</cli-logger>

One should be able to build this tool by making slight modifications to an open source distribution of "script". Another idea would be to modify the "sudoscript" <http://www.egbok.com/sudoscript/> tool.
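
To make the log format concrete, here is a minimal Python sketch of how one entry could be produced. It is only an illustration under stated assumptions, not the proposed implementation: the names now_ms and log_command are hypothetical, each command is run non-interactively through a subshell rather than inside a pseudo-terminal as "script" does, and a single result timestamp is taken when the command exits.

```python
import os
import socket
import subprocess
import time
import xml.etree.ElementTree as ET


def now_ms():
    """Current time in milliseconds since the Unix epoch, as a string."""
    return str(int(time.time() * 1000))


def log_command(command, log_root):
    """Run one shell command and append a <cli-logger-entry> to log_root.

    Hypothetical helper: element and attribute names follow the example
    log above; everything else is an assumption made for illustration.
    """
    entry = ET.SubElement(log_root, "cli-logger-entry")
    invocation = ET.SubElement(entry, "invocation", {
        "time": now_ms(),
        "current-directory": os.getcwd(),
        "machine": socket.gethostname(),
    })
    invocation.text = command
    # Assumption: stdout and stderr are merged, as a terminal would show them.
    completed = subprocess.run(command, shell=True, text=True,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    # The result timestamp is taken once the command has finished.
    result = ET.SubElement(entry, "result", {"time": now_ms()})
    result.text = completed.stdout
    return entry


if __name__ == "__main__":
    root = ET.Element("cli-logger")
    log_command("echo hello", root)
    print(ET.tostring(root, encoding="unicode"))
```

Building the log with an XML library rather than by string concatenation means that characters such as "<" or "&" in command output are escaped automatically.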

[2] A command logger file post-processor. Tool [1] is intended to be quite generic and just capture everything. However, we wouldn't want to send literally everything in the command shell off to Hackystat for a variety of reasons. Thus, Tool [2] would be a post-processor for the output of Tool [1], which would figure out what's worth saving from the log depending upon the specific needs of the research, and generate another XML file containing the actual sensor data, such as:

<sensor>
<entry tstamp="1134730020809" tool="Cli-Logger" machine="bertha.ics.hawaii.edu" command="ls -la" results=""/>
<entry tstamp="1134732837465" tool="Cli-Logger" machine="bertha.ics.hawaii.edu" command="ant -q quickStart" results="failed"/>
</sensor>

The details will clearly turn out to be different, but the idea is that in a particular research context like HPC, we might not care at all about the output from "ls -la", and only care whether the build failed or not when invoking "ant".
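
As a rough illustration of Tool [2], the following Python sketch reads a log in the format of Tool [1] and emits one sensor entry per command. The function names summarize and post_process are hypothetical, and the rule set is an assumption chosen to match the example: the output of most commands is dropped, and for "ant" only a pass/fail summary is kept. The "passed" value is my own placeholder; only "failed" and the empty string appear in the example above.

```python
import xml.etree.ElementTree as ET


def summarize(command, output):
    """Hypothetical rule set: keep a result only for commands we care about."""
    if command.split()[:1] == ["ant"]:
        # Assumption: a build counts as failed if Ant printed "BUILD FAILED".
        return "failed" if "BUILD FAILED" in output else "passed"
    return ""  # e.g. the output of "ls -la" is dropped


def post_process(log_xml):
    """Turn the text of a <cli-logger> document into a <sensor> element."""
    sensor = ET.Element("sensor")
    for entry in ET.fromstring(log_xml).iter("cli-logger-entry"):
        invocation = entry.find("invocation")
        result = entry.find("result")
        command = (invocation.text or "").strip()
        output = (result.text or "") if result is not None else ""
        ET.SubElement(sensor, "entry", {
            "tstamp": invocation.get("time"),
            "tool": "Cli-Logger",
            "machine": invocation.get("machine"),
            "command": command,
            "results": summarize(command, output),
        })
    return sensor
```

Keeping the research-specific rules in one small function like summarize would let different studies swap in their own filters without touching the generic logger.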

[3] A Hackystat sensor for the output of the post-processor. This basically takes the post-processed data and sends it to Hackystat.

Once the data is in Hackystat, it can be merged with other developer data (for example, data from an IDE like Eclipse), used as input to a Markov model generator or a workflow analysis engine like PROM, exported to some other environment along with other sensor data, and so on.

If this seems reasonable to you all, then I would like to propose that we split up the work, with Lorin/Patrick/Andy taking responsibility for the 'front end' (i.e., Tool [1]), and the Hackystat dev team taking responsibility for the 'back end' (i.e., Tool [3]). We can work together on Tool [2].

How does this sound?

Cheers,
Philip





