Re: [HACKYSTAT-DEV-L] RFC: Evolutionary Sensor Data Types

Aaron Kagawa Wed, 02 Mar 2005 00:35:21 -0800

Hey Guys,

I'm taking a different approach in this email. The pessimistic approach. In
general I agree with everything in the design document. The following are
things that I think could be a problem.

- (Minor) As we all probably know, this is a hard problem. And this design
document did a good job of explaining the basic problems and fixes.
However, I think we will discover a lot more problems along the way.

- (Minor) The idea of Thread Safe Property Lists got me thinking about
different types of information for a single SDT. For instance, suppose that
for some reason we start collecting LaTeX FileMetric data. As stated in the
paper, FileMetric data from a LaTeX sensor will probably contain a different
Property List than FileMetric data from a LOCC sensor. Yet, when we process
the Sensor Data Entries we will still have to sort through all types of
FileMetric, because the SD entries are indexed by Day, SDT, and User. This
essentially will be an if statement like if
(propertyList.containsKey("numberOfParagraphs")) then do something with
LaTeX-generated FileMetric data. Or I suppose checking the Tool is a better
way of doing the same thing. Either way I think that is just a little
bogus. But then again, there's no way around that unless there are better
ways of indexing the data.
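
To make the two options concrete, here is a rough sketch of what that
branching might look like. The class and method names are made up for
illustration and are not taken from the Hackystat code base; only the
"numberOfParagraphs" key comes from the example above.

```java
import java.util.Map;

// Hypothetical sketch; FileMetricDispatch is not a real Hackystat class.
class FileMetricDispatch {

  /** Guesses the producing tool by probing for a tool-specific property key. */
  static String classifyByProperty(Map<String, String> propertyList) {
    if (propertyList.containsKey("numberOfParagraphs")) {
      return "LaTeX";
    }
    return "other";
  }

  /** Uses the tool name recorded with the entry instead of probing keys. */
  static String classifyByTool(String tool) {
    return "LaTeX".equalsIgnoreCase(tool) ? "LaTeX" : "other";
  }
}
```

Either way, every analysis over FileMetric ends up carrying a branch like
this, which is the part that feels bogus to me.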

- (Major) The examples provided are pretty simple. I'm not sure Schema
Evolution and Data Reorganization will work for harder changes. For
example, I don't believe it would work for the latest Build SDT
improvement. Cedric mentioned that it is almost impossible to change the
old Build Sensor Data Entries to conform to the new design, because it
lacked the required information to make the data useful. Another example:
imagine that all this time we didn't have a runtime stamp in the coverage
SDT. Evolving the SDT to add a runtime stamp would definitely not work. The
Issue SDT wouldn't have worked either, because we moved from an
event-triggered SDT to a snapshot SDT (again we added a runtime stamp).

I wouldn't make the claim that eSDT would solve _all_ the problems of the
evolution of SDTs. To my knowledge we have gone through, or will go
through, three SDT evolutions: (1) the Build SDT, (2) the Issue SDT, and
(3) the Review SDT (Takuya mentioned that he wants to add one attribute).
In my opinion the eSDT would not solve the Build SDT or the Issue SDT
upgrade. However, I'm fairly certain it would solve the Review SDT upgrade.

Again, it seems fairly obvious that simple changes will definitely work.
But I think that large, complex changes, where the new attributes are
critical to the validity and usefulness of the SDT, will not work.
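
Here is a small sketch of the distinction I have in mind. It assumes an
entry is just a map of attribute names to values; the EvolveSketch class,
the evolve method, and the idea of "optional defaults" versus "required"
attributes are my own illustration, not something from the design document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; not the eSDT design or any real Hackystat API.
class EvolveSketch {

  /**
   * Upgrades an old entry to the new attribute set. Optional attributes get
   * a default value. If a required attribute is missing from the old entry,
   * there is nothing sensible to fill in, so the result is empty.
   */
  static Optional<Map<String, String>> evolve(
      Map<String, String> oldEntry,
      Map<String, String> optionalDefaults,
      Set<String> required) {
    for (String key : required) {
      if (!oldEntry.containsKey(key)) {
        return Optional.empty();
      }
    }
    // Start from the defaults, then let the old entry's values win.
    Map<String, String> evolved = new HashMap<>(optionalDefaults);
    evolved.putAll(oldEntry);
    return Optional.of(evolved);
  }
}
```

A Review-style change (one extra attribute with a harmless default) goes
through; a Coverage- or Issue-style change (a required runtime stamp the old
data never recorded) comes back empty.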

- (Major) Client upgrade inconsistency. The design document states that "we
must be able to process data received using any prior version of the SDT
that became publically available and used". I assume that you mean that we
are going to dynamically evolve outdated SDT data when it comes into the
server. When I read that, a big red flag appeared in my brain. I'm not sure
we want to get into the whole, "What version of the sensor are you running,
what does the log file look like, and I wonder how it got converted on the
server side." Imagine writing the DocBook section for that. Okay, I agree
that it is possible to 'process' outdated SDTs, but to 'effectively and
usefully process' outdated SDTs is another story. Again, let's look at real
examples. If you believe my claim that it is impossible to evolve the old
Build SDT to the new version, then what would be the point of processing
old Build SDT data?

I personally think that this is a "You Ain't Going To Need It" thing. I
don't see a problem with releasing a new SDT, converting all old versions,
and then requiring the new SDT. That's a one-shot fix, versus dynamically
evolving prior SDTs for eternity. I would claim that we should only accept
the current SDT definitions on the server side and send all old SDT data to
Bad Data. In fact, I think Philip changed his mind.

Regarding the client update inconsistency under Implementation Issues,
should the system support an alert mechanism if old data is received
after the server side is updated? This would not be the kind of active
alert that notifies all users, but one that goes only to the users who
sent old data after the server was updated.


Good idea! This is a natural enhancement of our current "Bad Data Alert",
which incidentally was the very first alert implemented for Hackystat back
in 2001.  With this enhancement, though, we will tell users if they need to
upgrade their sensors due to structural evolution (as opposed to notifying
them that there is some bug in the implementation).
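
Putting the "only accept the current SDT" idea and the targeted alert
together, the server-side check might look roughly like the sketch below.
It assumes each incoming entry carries an integer SDT version, which is my
simplification; VersionGate and its methods are hypothetical names, not the
existing Bad Data Alert code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the actual Hackystat server or alert code.
class VersionGate {
  private final int currentVersion;
  private final List<String> accepted = new ArrayList<>();
  private final List<String> badData = new ArrayList<>();
  private final Set<String> usersToAlert = new LinkedHashSet<>();

  VersionGate(int currentVersion) {
    this.currentVersion = currentVersion;
  }

  /** Accepts current-version data; anything else goes to Bad Data. */
  boolean receive(String user, int sdtVersion, String payload) {
    if (sdtVersion == currentVersion) {
      accepted.add(payload);
      return true;
    }
    badData.add(payload);
    // Remember only the users who actually sent outdated data.
    usersToAlert.add(user);
    return false;
  }

  Set<String> usersToAlert() {
    return usersToAlert;
  }

  int badDataCount() {
    return badData.size();
  }
}
```

Users who have already upgraded never hear about it; only the senders of
outdated data get told to update their sensors.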

To conclude, I think that eSDT is a great step forward.

thanks, aaron
