Hi Cedric,
When I logged in to the server as hackystat-l to see if build sensor data on last night's failure was sent to the server, I noticed that you've gone in a direction with respect to your sensor data representation that is an important topic for discussion.
In a nutshell, what appears to have happened is that you've bailed on the SDT representation of Build. In other words, there are no values being provided for the required fields "result", "failureType", or "failureMessage". Instead, all of the information is being provided as an XML string in the "data" field. This is a violation of the specification, which states that the Build SDT "data" field is supposed to consist of key-value pairs, not XML:
<http://hackystat.ics.hawaii.edu/hackystat/docbook/apbs04.html>
There are several problems with the direction you're heading. The reason for having SDTs in the first place is to define a consistent structure for the data so that anyone analyzing the data (either inside of hackystat or as an external tool) knows what to expect and where to get it. Your approach defeats this. Instead of a well-defined structure, there is an arbitrary XML string without any constraints or semantics attached to it. (There are semantics attached to the Build SDT--it's the link above. There are also constraints, but you've worked around them :-).
You could add some syntax constraints to your approach by providing the DTD, but that's essentially implementing a brand new structure for sensor data, parallel to the current one. That seems redundant, complicated, and confusing.
It's also important to note that XML is nothing more than a hierarchical set of key-value strings with a commonly accepted syntactic sugar. The SDT is nothing more than a two-level hierarchical set of key-value strings. In other words, they are representationally extremely close to each other.
In looking at your current representation, you don't appear to have any deep hierarchies, so the question is, what motivated you to bail on the current SDT specification?
I am going to conjecture that there are two shortcomings of the current SDT implementation that you're trying to work around:
(1) It's quite hard to evolve an SDT. If you change an SDT's structure, all of the current data becomes unusable, and you have to start from scratch.
(2) Putting/retrieving data from a "data" field is not trivial. You need to basically hand-write the code to parse the key-value pair string, generate the HashMap, put the HashMap back into its serialized form, etc.
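To make (2) concrete, here is a minimal sketch of the kind of hand-written code I mean. It assumes the "data" string is encoded as comma-separated key=value pairs; that encoding, the class name KeyValueData, and the sample keys in the usage note are illustrations only, not something taken from the Build SDT specification.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper, not part of Hackystat. Assumes a "k1=v1,k2=v2"
 * encoding and does no escaping, so keys and values must not contain
 * ',' and keys must not contain '='.
 */
final class KeyValueData {
  private KeyValueData() {}

  /** Parses the serialized string into an insertion-ordered map. */
  static Map<String, String> parse(String data) {
    Map<String, String> map = new LinkedHashMap<>();
    if (data == null || data.isEmpty()) {
      return map;
    }
    for (String pair : data.split(",")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Malformed pair: " + pair);
      }
      // Split on the first '=' only, so a value may itself contain '='.
      map.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return map;
  }

  /** Puts the map back into its serialized form. */
  static String serialize(Map<String, String> map) {
    StringJoiner joiner = new StringJoiner(",");
    for (Map.Entry<String, String> entry : map.entrySet()) {
      joiner.add(entry.getKey() + "=" + entry.getValue());
    }
    return joiner.toString();
  }
}
```

Under those assumptions, KeyValueData.parse("configuration=nightly,module=core") yields a two-entry map, and serialize() turns it back into the same string. None of this is difficult, but every sensor and every analysis has to carry its own copy of it, which is the real problem.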
I've known about these inadequacies for a couple of years now, but have left them on the back burner.
What I want to suggest is that your cure, while perhaps convenient for you in the short term, is ultimately worse than the disease in the long term, because the result is a representational mishmash that will be confusing and irrational to new people trying to understand the data in the system.
In the best of all possible worlds, the way SDTs are supposed to work is as follows:
- The defined fields are supposed to specify the "required" data that every sensor for every tool should provide and generate in a comparable fashion.
- The "data" field is supposed to support "optional" data that is tool or context specific.
The problem is that when a new SDT is under development, as with the Build data, it is hard to always be exactly right at the beginning about what should be required and what should be optional. As Burt has been finding out with the Issue SDT, it's painful to make a change: you essentially have to delete all of the old data.
So, what I'd like to propose is as follows:
(1) Cedric backs out of this direction, reverting to the standard form for representing sensor data.
(2) I will start work on enhancements to the SDT implementation to better support evolution. The basic idea will be to allow people to add and delete fields in the SDT definition without invalidating data already on the server. It will also include a new, implicitly defined field (like tstamp and tool) called something like "keyvaluepairs" that will provide an API for easy setting/getting of optional values. This will eliminate the need for every SDT to have a "data" field.
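As a rough illustration of (2), here is one way the enhanced entry could behave. This is only a sketch of the idea, not a design: the class EvolvableEntry, its method names, and the "configuration" key in the usage note are all hypothetical, and the real implementation may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not the Hackystat API. An entry is read against
 * the current SDT definition: stored values for fields that are no longer
 * defined are kept as optional key-value pairs, and newly defined fields
 * that old data lacks read back as the empty string.
 */
final class EvolvableEntry {
  private final Set<String> definedFields;
  private final Map<String, String> fields = new LinkedHashMap<>();
  private final Map<String, String> keyValuePairs = new LinkedHashMap<>();

  EvolvableEntry(Set<String> definedFields, Map<String, String> stored) {
    this.definedFields = definedFields;
    for (Map.Entry<String, String> entry : stored.entrySet()) {
      if (definedFields.contains(entry.getKey())) {
        fields.put(entry.getKey(), entry.getValue());
      } else {
        // Field was deleted from the definition: keep the value, do not fail.
        keyValuePairs.put(entry.getKey(), entry.getValue());
      }
    }
  }

  /** Returns a defined field, or "" if the stored data predates the field. */
  String getField(String name) {
    if (!definedFields.contains(name)) {
      throw new IllegalArgumentException("Not a defined field: " + name);
    }
    return fields.getOrDefault(name, "");
  }

  /** Sets an optional, tool- or context-specific value. */
  void setOptional(String key, String value) {
    keyValuePairs.put(key, value);
  }

  /** Returns an optional value, or null if it was never set. */
  String getOptional(String key) {
    return keyValuePairs.get(key);
  }
}
```

With a definition containing "result", "failureType", and "failureMessage", an old entry that stored only "result" and a since-deleted "configuration" field would still load: getField("failureType") returns the empty string and getOptional("configuration") returns the old value. The sensor writer never touches a serialized string directly.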
Let me know what you think about this. Cedric, please let me know if there are other issues that I should be considering in this discussion.
Cheers, Philip
