It's not that I want to invent something newer or cooler; rather, I have
a problem persisting all of the required information in a flat scheme.

The major problem I encountered is that each build can have more than
one failure, and each failure might have a different failure type, a
different failure message, and be associated with a different module.

An example would be:
 * hackyKernel/file1.java checkstyle failure: line longer than 100.
 * hackyKernel/file2.java checkstyle failure: missing javadoc.
 * hackyKernel/file3.java junit failure: assertion failed.
 * hackyStdExt/file4.java junit failure: security violation.

The information I need to record in the above example is:

<BuildReport>
  <BuildContext StartTimeMillis="1107607869107"
                EndTimeMillis="1107609708935"
                Project="hacky2004-all"
                Configuration="Hackystat-JPL"
                StartType="CruiseControl-Auto" />
  <BuildResult BuildFailed="true" CheckstyleRunned="true"
               CompilationRunned="true" UnittestRunned="true">
     <Failure ModuleName="hackyKernel" FailureType="Checkstyle"
              Message="file 1 line longer than 100"/>
     <Failure ModuleName="hackyKernel" FailureType="Checkstyle"
              Message="file 2 missing javadoc"/>
     <Failure ModuleName="hackyKernel" FailureType="JUnit"
              Message="file 3 assertion failed"/>
     <Failure ModuleName="hackyStdExt" FailureType="JUnit"
              Message="file 4 security violation"/>
  </BuildResult>
</BuildReport>

It's really hard to figure out what should go into the "failure type"
and "failure message" fields, and which key-value pairs should stay in
the "additional information" field. I thought the persistence details
were hidden by the SDT, so we would be safe as long as it knows how to
handle them.

I want to use the existing flat persistence scheme (much simpler), but I
just don't know how. Can somebody help me?

Thanks.

Cedric







Philip Johnson wrote:
Hi Cedric,

When I logged in to the server as hackystat-l to see if build sensor data
on last night's failure was sent to the server, I noticed that you've gone
in a direction with respect to your sensor data representation that is an
important topic for discussion.

In a nutshell, what appears to have happened is that you've bailed on the
SDT representation of Build.  In other words, there are no values being
provided for the required fields "result", "failureType", or
"failureMessage".  Instead, all of the information is being provided as an
XML string in the "data" field.  This is a violation of the specification,
which states that the Build SDT "data" field is supposed to consist of
key-value pairs, not XML:

<http://hackystat.ics.hawaii.edu/hackystat/docbook/apbs04.html>

There are several problems with the direction you're heading.  The reason
for having SDTs in the first place is to define a consistent structure for
the data so that anyone analyzing the data (either inside of hackystat or
as an external tool) knows what to expect and where to get it.  Your
approach defeats this.  Instead of a well-defined structure, there is
an arbitrary XML string without any constraints or semantics
attached to it.  (There are semantics attached to the Build SDT--it's the
link above. There are also constraints, but you've worked around them :-).

You could add some syntax constraints to your approach by providing the
DTD, but that's essentially implementing a brand new, parallel structure
for sensor data to the current one.  That seems redundant, complicated, and
confusing.

It's also important to note that XML is nothing more than a hierarchical
set of key-value strings with a commonly accepted syntactic sugar.  The SDT
is nothing more than a two-level hierarchical set of key-value strings.
In other words, they are representationally extremely close to each other.
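
To make that closeness concrete, here is a minimal sketch in Java of how a
list of failures like the ones in the XML above could be flattened into
indexed key-value pairs.  Everything in it is an assumption for
illustration: the class FailureFlattener, the "failure.N.*" key naming, and
the "failureCount" key are hypothetical and are not part of the Build SDT
specification or the Hackystat code base.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Hackystat code base.
final class FailureFlattener {

  /** One failure from a build: the module, the failure type, and the message. */
  static final class Failure {
    final String module;
    final String type;
    final String message;

    Failure(String module, String type, String message) {
      this.module = module;
      this.type = type;
      this.message = message;
    }
  }

  /**
   * Flattens a list of failures into key-value pairs. Each failure gets an
   * index, so "failure.0.module" and "failure.1.module" can coexist in a
   * single flat map. "failureCount" records how many indexes to read back.
   */
  static Map<String, String> flatten(List<Failure> failures) {
    Map<String, String> pairs = new LinkedHashMap<>();
    pairs.put("failureCount", String.valueOf(failures.size()));
    for (int i = 0; i < failures.size(); i++) {
      Failure f = failures.get(i);
      String prefix = "failure." + i + ".";
      pairs.put(prefix + "module", f.module);
      pairs.put(prefix + "type", f.type);
      pairs.put(prefix + "message", f.message);
    }
    return pairs;
  }
}
```

With the four failures from the example, this would yield thirteen pairs:
one "failureCount" entry plus a module, type, and message entry for each
failure.  Whether indexed keys like these are an acceptable use of the
"data" field is a design question, not something this sketch settles.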

In looking at your current representation, you don't appear to have
any deep hierarchies, so the question is, what motivated you to bail on the
current SDT specification?

I am going to conjecture that there are two shortcomings of the current
SDT implementation that you're trying to work around:

(1) It's quite hard to evolve an SDT.  If you change an SDT's structure,
all of the current data becomes unusable, and you have to start from
scratch.

(2) Putting/retrieving data from a "data" field is not trivial. You need to
basically hand-write the code to parse the key-value pair string, generate
the HashMap, put the HashMap back into its serialized form, etc.
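
For reference, the hand-written code described in (2) looks roughly like
the following Java sketch.  The "key=value,key=value" wire format, the
percent-encoding of keys and values, and the class name KeyValueData are
all assumptions made for illustration; the actual serialized form of the
"data" field may well differ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the "key=value,key=value" format is an assumption.
final class KeyValueData {

  /** Parses a serialized "data" string into a map, preserving pair order. */
  static Map<String, String> parse(String data) {
    Map<String, String> map = new LinkedHashMap<>();
    if (data == null || data.isEmpty()) {
      return map;
    }
    for (String pair : data.split(",")) {
      int eq = pair.indexOf('=');
      if (eq < 0) {
        throw new IllegalArgumentException("Missing '=' in pair: " + pair);
      }
      map.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
    }
    return map;
  }

  /** Puts the map back into its serialized form. */
  static String serialize(Map<String, String> map) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : map.entrySet()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
    }
    return sb.toString();
  }

  // Percent-encoding keeps ',' and '=' inside a key or value from being
  // mistaken for separators.
  private static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```

Even this small version has to deal with separators appearing inside
values, which is part of why doing it by hand in every sensor is
error-prone.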

I've known about these inadequacies for a couple of years now, but have
left them on the back burner.

What I want to suggest is that your cure, while perhaps convenient for you
in the short term, is ultimately worse than the disease in the long term,
because the result is a representational mishmash that will be confusing
and irrational to new people trying to understand the data in the system.

In the best of all possible worlds, the way SDTs are supposed to work is as
follows:

- The defined fields are supposed to specify the "required" data that every
sensor for every tool should provide and generate in a comparable fashion.

- The "data" field is supposed to support "optional" data that is tool or
context specific.

The problem is that when a new SDT is under development, as with the Build
data, it is hard to always be exactly right at the beginning about what
should be required and what should be optional.  As Burt has been finding
out with the Issue SDT, it's painful to make a change: you essentially
have to delete all of the old data.

So, what I'd like to propose is as follows:

(1) Cedric backs out of this direction, reverting to the standard form for
representing sensor data.

(2) I will start work on enhancements to the SDT implementation to better
support evolution.  The basic idea will be to allow people to add and
delete fields in the SDT definition without invalidating data already on
the server. It will also include a new, implicitly defined field (like
tstamp and tool) called something like "keyvaluepairs" that will provide an
API for easy setting/getting of optional values.  This will eliminate the
need for every SDT to have a "data" field.

Let me know what you think about this.  Cedric, please let me know if there
are other issues that I should be considering in this discussion.

Cheers,
Philip
