[HACKYSTAT-DEV-L] RFC: Project data membership expressions

Philip Johnson Wed, 07 Dec 2005 21:49:05 -0800

Greetings, all,

I've been studiously avoiding any Big Thoughts until after 7.0 was out the door, but now the shackles have been thrown off!

So. I've been thinking about Projects. First off, let's recall that Projects are a way of defining related sets of raw sensor data in a Hackystat repository. We currently define a related set of raw sensor data with an implicit "AND" of three conditions: (1) a set of developers (who must confirm membership); (2) a time interval (within which the sensor data must have been received); and (3) a set of Workspaces (which provide a "location" for the sensor data).

Let's also recall that Workspaces serve a very honorable purpose in Hackystat: they allow groups of developers to work together on different platforms, with different installations of source directories, while the system can still tell when developers are working on the same file. There's nothing wrong with Workspaces per se.

There is, however, a problem with defining Projects as the AND of (1), (2), and (3). This approach worked fine in the beginning, when we had relatively simple forms of raw sensor data, but we are increasingly running into more complicated kinds of sensor data. Two examples:

(1) The famous Unit Test sensor data problem. When running unit tests from a jar or binary distribution, we no longer know the source directory that the code came from, so we no longer have a Workspace. The solution was Workspace maps, which have turned out to be (a) brittle and (b) complex. Currently, for example, someone sending Unit Test data can't get that data associated with a Project unless they run a size counter! That totally sucks.

(2) The less famous BrowserURL sensor data problem. Some folks have wanted a sensor for their browser that could record when they were looking at documentation. While one could imagine a sensor data type with "URL" as a required field, it is not at all clear how to transmogrify that into a Workspace so that the data could be associated with a Project.

In the past, we've toyed with solutions involving specifying the project name on the client side and sending it along with the raw data. That has proven to be a very bad solution. For example, it does not allow that sensor data to be associated with any other Projects that might be defined in the future.

At an abstract level, what our current Project definition mechanism does is create a "Project Data Membership Expression" along the lines of the following:


(and
 (or (sensor-data-owner = "[EMAIL PROTECTED]")
     (sensor-data-owner = "[EMAIL PROTECTED]"))
 (sensor-data-start-date = "10-Nov-2005")
 (sensor-data-end-date = "undefined")
 (or (sensor-data-workspace = "hackyCore_Build")
     (sensor-data-workspace = "hackyCore_Kernel")))

Abstractly, each sensor data record in the repository is tested against that expression, and if the expression evaluates to true, then that sensor data is part of that Project. Of course, we are smart about the way we "evaluate" this expression so that we don't actually traverse the entire repository!
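
To make the implicit "AND" concrete, here is a minimal Java sketch of the test applied to a single record. The record types, field names, and member addresses are hypothetical stand-ins (the real addresses are elided above), not Hackystat's actual classes, and the sketch checks one entry at a time rather than reflecting how the repository is actually queried.

```java
import java.time.LocalDate;
import java.util.Set;

// Hypothetical, simplified view of a raw sensor data entry (not Hackystat's actual classes).
// workspace is null when it cannot be determined, e.g. for tests run from a jar.
record SensorData(String owner, LocalDate received, String workspace) {}

// The current Project definition: an implicit AND of members, interval, and Workspaces.
record ProjectDefinition(Set<String> members, LocalDate start, LocalDate end,
                         Set<String> workspaces) {

    // end == null models the "undefined" (open-ended) end date.
    boolean contains(SensorData d) {
        boolean ownedByMember = members.contains(d.owner());
        boolean inInterval = !d.received().isBefore(start)
                && (end == null || !d.received().isAfter(end));
        // Set.of(...) throws on contains(null), so test for a missing workspace first.
        boolean inWorkspace = d.workspace() != null && workspaces.contains(d.workspace());
        return ownedByMember && inInterval && inWorkspace;
    }
}

class CurrentMembershipDemo {
    public static void main(String[] args) {
        ProjectDefinition hackyCore = new ProjectDefinition(
                Set.of("alice@example.org", "bob@example.org"), // hypothetical members
                LocalDate.of(2005, 11, 10), null,
                Set.of("hackyCore_Build", "hackyCore_Kernel"));
        LocalDate dec1 = LocalDate.of(2005, 12, 1);
        System.out.println(hackyCore.contains(
                new SensorData("alice@example.org", dec1, "hackyCore_Kernel"))); // true
        System.out.println(hackyCore.contains(
                new SensorData("alice@example.org", dec1, null)));               // false: no Workspace
    }
}
```

The second call shows the Unit Test problem in miniature: an entry with no Workspace can never satisfy the third conjunct, no matter who sent it or when.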

What I'm proposing is to enhance the Project definition mechanism with the ability to define "Membership Expressions", which would let us indicate that a given piece of sensor data should be considered part of a Project based on properties of the sensor data entry other than its owner, timestamp, and workspace. Given the right set of operators, we should be able to provide a simple yet expressive way of associating sensor data with a Project that overcomes our current problems.

My idea would be to:
- retain the current member definition approach (since we need to do the whole confirmation email routine),
- retain the start/end specification (since that's the nicest way to do it),
- make Workspace selection _optional_, and
- add a textarea in which someone could type in a "Project Data Membership Expression" (PDME), very similar to the "Expert" telemetry analysis mode.

There is an implicit "OR" between the Workspace and PDME fields: if the sensor data satisfies the Workspace test, it's in, regardless of whether it satisfies the PDME test.
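
A sketch of how the proposed rule might combine these pieces, assuming the PDME has already been turned into a Java `Predicate`. `ProposedProject`, `SensorData`, and the `fields` map are hypothetical names chosen for illustration, and member confirmation is reduced to a set lookup.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sensor data entry; workspace may be null (e.g. tests run from a jar).
record SensorData(String owner, LocalDate received, String type,
                  String workspace, Map<String, String> fields) {}

// Sketch of the proposed definition. Members and interval are still ANDed in;
// the Workspace test and the PDME are combined with an implicit OR.
record ProposedProject(Set<String> members, LocalDate start, LocalDate end,
                       Set<String> workspaces,        // may be empty: selection is optional
                       Predicate<SensorData> pdme) {  // d -> false when no PDME is given

    boolean contains(SensorData d) {
        if (!members.contains(d.owner())) {
            return false;
        }
        boolean inInterval = !d.received().isBefore(start)
                && (end == null || !d.received().isAfter(end));
        if (!inInterval) {
            return false;
        }
        // Set.of(...) throws on contains(null), so check for a missing workspace first.
        boolean workspaceMatch = d.workspace() != null && workspaces.contains(d.workspace());
        return workspaceMatch || pdme.test(d);
    }
}
```

Passing `d -> false` as the PDME reduces this to today's behavior, while passing an empty Workspace set makes the PDME the only "location" test.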

So, for example, how would this approach solve the famous Unit Test sensor data problem? Well, for the case of the Hackystat project, we could supply the following expression:


(and (isSensorDataType("UnitTest"))
    (fieldStartsWith("classname", "org.hackystat")))

The syntax probably needs some work, but the basic idea is that we have an operator called "isSensorDataType", which evaluates to true if the data item is of that type, and another called "fieldStartsWith", which takes two arguments: the name of the field and the prefix that the field's String value must start with.

I claim this solves the Unit Test problem by stating that a unit test sensor data entry is part of the Hackystat project if it contains a field (either required or optional) called "classname" whose String value has the prefix "org.hackystat".
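
One way to read these operators is as small predicate factories that can be nested with "and"/"or". The Java below is a hypothetical sketch (the class `Pdme` and the `SensorData` record are illustrative, not an existing API) that builds the Unit Test expression above out of such operators.

```java
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sensor data entry: its type plus its required and optional fields.
record SensorData(String type, Map<String, String> fields) {}

// Hypothetical operator library mirroring the operators named in the proposal.
final class Pdme {
    private Pdme() {}

    // True if the entry is of the given sensor data type.
    static Predicate<SensorData> isSensorDataType(String type) {
        return d -> d.type().equals(type);
    }

    // True if the entry has the named field (required or optional) and its value starts with prefix.
    static Predicate<SensorData> fieldStartsWith(String field, String prefix) {
        return d -> {
            String value = d.fields().get(field);
            return value != null && value.startsWith(prefix);
        };
    }

    // (and ...): every term must hold.
    @SafeVarargs
    static Predicate<SensorData> and(Predicate<SensorData>... terms) {
        return d -> {
            for (Predicate<SensorData> t : terms) {
                if (!t.test(d)) return false;
            }
            return true;
        };
    }

    // (or ...): at least one term must hold.
    @SafeVarargs
    static Predicate<SensorData> or(Predicate<SensorData>... terms) {
        return d -> {
            for (Predicate<SensorData> t : terms) {
                if (t.test(d)) return true;
            }
            return false;
        };
    }

    // The Unit Test expression from the proposal, built from the operators above.
    static final Predicate<SensorData> HACKYSTAT_UNIT_TESTS =
            and(isSensorDataType("UnitTest"), fieldStartsWith("classname", "org.hackystat"));
}
```

An entry without a "classname" field simply fails `fieldStartsWith`, so the expression never needs to know whether the field is required or optional for that sensor data type.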


In the case of the Browser URL, we could supply something like the following:

(and (isSensorDataType("BrowserUrl"))
    (fieldStartsWith("url", "http://java.sun.com/")))

Or whatever.
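
Since the expression would be typed into a textarea, it has to be parsed before it can be evaluated. The following is a minimal recursive-descent parser for the draft syntax used in this message, assuming only the two operators proposed so far and double-quoted strings without escape sequences; `PdmeParser` and `SensorData` are hypothetical names. It accepts both the Browser URL expression and the Unit Test expression above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sensor data entry: its type plus its required and optional fields.
record SensorData(String type, Map<String, String> fields) {}

// Hypothetical parser for the draft syntax (no escape sequences in strings):
//   expr := "(" ("and" | "or") expr+ ")" | "(" name "(" string ("," string)* ")" ")"
final class PdmeParser {
    private final String src;
    private int pos = 0;

    private PdmeParser(String src) { this.src = src; }

    // Parses the text typed into the proposed textarea into a membership predicate.
    static Predicate<SensorData> parse(String text) {
        PdmeParser p = new PdmeParser(text);
        Predicate<SensorData> result = p.expr();
        if (p.peek() != '\0') throw new IllegalArgumentException("Trailing input at position " + p.pos);
        return result;
    }

    private Predicate<SensorData> expr() {
        expect('(');
        String name = identifier();
        Predicate<SensorData> result;
        if (name.equals("and") || name.equals("or")) {
            List<Predicate<SensorData>> terms = new ArrayList<>();
            do {
                terms.add(expr()); // at least one term; end of input fails in expect('(')
            } while (peek() != ')');
            if (name.equals("and")) {
                result = d -> terms.stream().allMatch(t -> t.test(d));
            } else {
                result = d -> terms.stream().anyMatch(t -> t.test(d));
            }
        } else {
            result = operator(name, arguments());
        }
        expect(')');
        return result;
    }

    // Only the two operators named in the proposal are recognized.
    private static Predicate<SensorData> operator(String name, List<String> args) {
        if (name.equals("isSensorDataType") && args.size() == 1) {
            return d -> d.type().equals(args.get(0));
        }
        if (name.equals("fieldStartsWith") && args.size() == 2) {
            return d -> {
                String value = d.fields().get(args.get(0));
                return value != null && value.startsWith(args.get(1));
            };
        }
        throw new IllegalArgumentException("Unknown operator or wrong arity: " + name);
    }

    private List<String> arguments() {
        List<String> args = new ArrayList<>();
        expect('(');
        if (peek() != ')') {
            args.add(string());
            while (peek() == ',') {
                pos++;
                args.add(string());
            }
        }
        expect(')');
        return args;
    }

    private String string() {
        expect('"');
        int close = src.indexOf('"', pos);
        if (close < 0) throw new IllegalArgumentException("Unterminated string at position " + pos);
        String s = src.substring(pos, close);
        pos = close + 1;
        return s;
    }

    private String identifier() {
        peek(); // skips leading whitespace
        int begin = pos;
        while (pos < src.length() && Character.isLetter(src.charAt(pos))) pos++;
        if (begin == pos) throw new IllegalArgumentException("Expected a name at position " + pos);
        return src.substring(begin, pos);
    }

    // Skips whitespace, then returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        pos++;
    }
}
```

Unknown operators, wrong argument counts, and unbalanced parentheses all raise an `IllegalArgumentException`, which is the kind of immediate feedback a Project definition page could show when an expression is saved.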

A final idea: this kind of approach probably requires some way to get feedback on which sensor data is 'matched' by an expression. I am imagining an analysis in which you specify a sensor data type and an interval, and the analysis lists all of the sensor data of that type for that interval along with, for each entry, the Projects that matched it. This would allow you to create an expression, run this analysis to see whether the appropriate sensor data was matched, edit the definition, and so forth.
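
The feedback analysis could be sketched roughly as follows. It assumes each Project's membership test is available as a named `Predicate`, and it simply scans an in-memory list, whereas the real repository would not be traversed in full; `MembershipReport`, `MatchRow`, and `SensorData` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sensor data entry: type, date received, and its fields.
record SensorData(String type, LocalDate received, Map<String, String> fields) {}

// One row of the report: a sensor data entry and the names of the Projects it matched.
record MatchRow(SensorData entry, List<String> matchedProjects) {}

final class MembershipReport {
    private MembershipReport() {}

    // Lists every entry of the given type received within [start, end] (inclusive),
    // together with the Projects whose membership predicate accepts it.
    // The iteration order of `projects` determines the order of names in each row.
    static List<MatchRow> run(List<SensorData> repository, String type,
                              LocalDate start, LocalDate end,
                              Map<String, Predicate<SensorData>> projects) {
        List<MatchRow> rows = new ArrayList<>();
        for (SensorData d : repository) {
            boolean inInterval = !d.received().isBefore(start) && !d.received().isAfter(end);
            if (!d.type().equals(type) || !inInterval) {
                continue;
            }
            List<String> matched = new ArrayList<>();
            for (Map.Entry<String, Predicate<SensorData>> project : projects.entrySet()) {
                if (project.getValue().test(d)) {
                    matched.add(project.getKey());
                }
            }
            rows.add(new MatchRow(d, matched));
        }
        return rows;
    }
}
```

Entries that come back with an empty list are the ones an expression author would most likely want to look at: data that no Project currently claims.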


So, some questions for discussion:

- Does this seem like a good idea to pursue? What issues can you see?

- Can you provide any other scenarios in which the current Project definition mechanism doesn't work well, so that we can see whether this approach would address those difficulties?


Cheers,
Philip
