Lorin writes:
Whenever you ask me what analysis I'd like in Hackystat, I'm never quite sure, because much of the analysis I have done so far is exploratory, and Hackystat just isn't the platform I envision for doing that (I use "R", which is an open-source, Matlab-like language/environment: www.r-project.org). I think what's been throwing me is my view of the word "analysis".
What I would really like out of Hackystat is an easy way to track all of the students in a class. I think Hackystat is great, but I'm clearly trying to shoehorn it into a task it wasn't designed for. It thinks in "projects", where multiple programmers are working on the same source files. What I want to work with is the concept of an "assignment" (or, even better, "experiment"), where the students are working independently, on the same task. I'd like to see graphs that show me data for all of the students at once, to see how much active time each student has spent so far.
I imagine this would require some significant additional functionality on the server side. Actually, we're getting a couple of German undergrads to come over and work with us for a couple of months (I think around March), and the HPCS project gets one of them. This might be a good project for that student. There's also Mike Paulding, if this fits into his research interest.
If the Vanderbilt people advance with their HPC plugin, and we can start capturing data on execution time and program correctness, then I might have some more ideas for some HPCS-specific Hackystat analysis. (Maybe the Vanderbilt folks would be interested in implementing this functionality...).
An Eclipse-based-Hackystat-HPCS-experimental-development-environment would be a wonderful thing.
Hi Lorin,
I'm cc'ing the hackystat-dev-l list on this response because I think your ideas are really great.
I completely agree that Hackystat in its current incarnation does not fulfill its potential as experimental infrastructure, and I think that you suggest two really excellent ideas for increasing its usability.
First, I don't think the problem is the Project representation (which is useful in an experimental context for delineating which sensor data, over what interval, should be used for analysis). The problem is that we do not provide any higher-level aggregation on top of Projects that says "Let's look at the data from a set of users (specifying a single Project for each)". We do, in fact, already have a module (hackyCourse) that provides most of what you need. The limitation is that it implements a very narrow range of analyses over the set of users. The hackyCourse module preceded our telemetry infrastructure, and it would be very cool to enhance hackyCourse into a new module (hackyExperiment?) that brings the power of our telemetry infrastructure to the multiple-user analyses provided by hackyCourse.
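To make the "set of users, one Project each" idea concrete, here is a rough sketch in Python. It is not Hackystat code: the function name, the `experiment` mapping, and the `(user, project, day, minutes)` record layout are all assumptions made up for illustration.

```python
from datetime import date


def active_time_by_user(experiment, records, start, end):
    """Total active minutes per user over [start, end].

    experiment: dict mapping user -> the single project chosen for that user
                (hypothetical stand-in for an experiment definition).
    records:    iterable of (user, project, day, minutes) tuples
                (hypothetical stand-in for per-day active time data).
    """
    # Start every enrolled user at zero so students with no data still show up.
    totals = {user: 0 for user in experiment}
    for user, project, day, minutes in records:
        # Count only data from the user's designated project and interval.
        if experiment.get(user) == project and start <= day <= end:
            totals[user] += minutes
    return totals
```

The resulting per-user totals are the kind of thing that could be plotted for all students at once.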
Second, I agree with your preference for R for certain kinds of exploration. What I would love to see is tighter integration between R and Hackystat. (hackyR?) What I'm envisioning is a way to directly connect the telemetry reduction functions (and/or the daily analysis functions) to R. This adds a lot of usability by enabling you to leverage the 'data cleaning' aspects that the reduction functions and/or daily analyses provide. For example, it turns out that extracting size information about a system takes a little bit of post-processing of the sensor data: among other things, you need to make sure that if you send size sensor data twice during a day, you don't double count as a result. If you just feed the raw sensor data into R, you have to discover and deal with these issues in R. On the other hand, if we can create a way to interface to R from our higher-level representations for data, then you're one step closer to what you need. (And, looked at another way, it creates the possibility of you writing new reduction functions to clean the raw data appropriately for your purposes, which makes new telemetry streams available to us, so the benefits go both ways.)
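As an illustration of the double-counting issue, here is a minimal sketch in Python. It is not how Hackystat's reduction functions are actually implemented: the `SizeEntry` record and its fields are hypothetical, and "keep the latest reading per file per day" is just one plausible cleaning rule.

```python
from collections import namedtuple

# Hypothetical shape for one raw size sensor reading.
SizeEntry = namedtuple("SizeEntry", ["day", "timestamp", "path", "loc"])


def daily_size(entries):
    """Return {day: total LOC}, counting each file at most once per day."""
    latest = {}
    for e in entries:
        key = (e.day, e.path)
        # If the same file was reported twice in a day, keep the later reading.
        if key not in latest or e.timestamp > latest[key].timestamp:
            latest[key] = e
    totals = {}
    for (day, _path), e in latest.items():
        totals[day] = totals.get(day, 0) + e.loc
    return totals
```

Summing the raw entries directly would count a file once for every time its size was sent that day; exporting the cleaned totals is what would spare you from rediscovering that in R.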
Creating the hackyExperiment and hackyR modules might be an excellent goal for Mike and you to work on this summer. Sounds like no more than a long weekend or two of hacking to me. :-)
Cheers, Philip
