[cc'd to Chandler-Dev because I've been meaning to post something about
information models anyway, so I might as well start with *something*]
At 01:11 PM 7/6/2006 -0700, Morgen Sagen wrote:
> On Jun 30, 2006, at 12:08 PM, Phillip J. Eby wrote:
>> At 02:02 PM 6/29/2006 -0700, Morgen Sagen wrote:
>>> Hey Phillip,
>>> Given your ideas about the information model and the needs of dump
>>> and reload, would Google's data API format suffice?
>>> See their 'Kinds' document:
>>> http://code.google.com/apis/gdata/common-elements.html
>>> and especially this section for an example:
>>> http://code.google.com/apis/gdata/common-elements.html#gdEventKind
>>> It's just an extension of Atom XML schema, and by starting with
>>> Google's schema we'd instantly get interoperability with a useful
>>> service.
>>
>> It's interesting, but it doesn't have a uniform or elementary
>> information model. Notice, for example, the embedded iCalendar
>> data in gd:recurrence. I agree that being able to share in this
>> format seems useful for interoperability purposes, but it doesn't
>> appear to solve our other issues. (Note also the idiosyncratic
>> overlap in semantics between gd:recurrence and
>> gd:recurrenceException.)
>> I would say, though, that if this is the kind of thing you want to
>> be able to do, it's probably a good idea for me to look at it and
>> see what could be done to reduce it to an elementary representation.
>
> What do you mean by "elementary"? I thought the way they store
> icalendar content in the gd:recurrence field was strange at first,
> but actually it would probably be convenient since Jeffrey's vobject
> lib groks icalendar.

I'm saying that the gd:recurrence stuff means there's no information model,
it's just a data format -- in fact it's *two* data formats. :)
The difference would be that in a uniform information model, all the facts
represented by the icalendar data would be represented in the same way as
all other facts represented by the model. Or conversely, all the other
facts would be represented in vobject form.
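
To make the "two data formats" point concrete, here is a rough sketch. The record layout, the field names, and the recurrence string are all made up for illustration; this is not Chandler's or Google's actual format. It takes a record that carries an opaque recurrence rule string alongside its plain fields and flattens everything into one uniform shape of (item, field, value) facts:

```python
# Hypothetical "mixed" record: plain fields plus an embedded rule string
# that needs a second parser to understand (illustrative values only).
mixed = {
    "title": "Staff meeting",
    "recurrence": "FREQ=WEEKLY;COUNT=10;BYDAY=TU",
}


def flatten(item_id, record):
    """Return a list of (item, field, value) facts in one uniform shape."""
    facts = []
    for field, value in record.items():
        if field == "recurrence":
            # Split the embedded rule so each part becomes an ordinary fact.
            for part in value.split(";"):
                name, _, setting = part.partition("=")
                facts.append((item_id, "recurrence." + name.lower(), setting))
        else:
            facts.append((item_id, field, value))
    return facts


facts = flatten("event-1", mixed)
for fact in facts:
    print(fact)
```

After flattening, a reader needs only one rule for interpreting the data: every fact is a triple, and the recurrence facts are no different from the title.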
Some examples of information models are the relational model, the
LDAP/X.500 directory model, and the XML document model. XML's information
model consists of hierarchies, text, elements, and attributes. LDAP's
model is a hierarchy of named objects, with multivalued text
attributes. The relational model is tables of atomic values.
Of course, this is a simplistic summary, because you can create many
information models *in* XML, by restricting expressiveness to provide
meaning. You can also express relational-like models in XML or vice
versa. Any of these information models is sufficient to express ideas from
the others.
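
As a small illustration of expressing one model in another, the sketch below writes a relational-style "table" out as XML and reads it back. The table name, column names, and the choice of one element per row with one attribute per column are assumptions for the example, not a proposed format:

```python
import xml.etree.ElementTree as ET

# Relational view: a "table" of rows holding atomic (string) values.
rows = [
    {"id": "1", "title": "Staff meeting", "location": "Room 4"},
    {"id": "2", "title": "Lunch", "location": "Cafe"},
]


def rows_to_xml(table_name, rows):
    """Express the table in XML: one <row> element per row,
    one attribute per column."""
    root = ET.Element(table_name)
    for row in rows:
        ET.SubElement(root, "row", attrib=row)
    return root


def xml_to_rows(root):
    """Recover the relational view from the XML form."""
    return [dict(element.attrib) for element in root.findall("row")]


text = ET.tostring(rows_to_xml("events", rows), encoding="unicode")
round_tripped = xml_to_rows(ET.fromstring(text))
print(round_tripped == rows)
```

The restriction is what gives the XML its meaning here: only flat rows with attribute values are allowed, even though XML itself could express far more.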
The Chandler information model can be described as the meta-schema that
defines what schemas we can express in Chandler. The Schema API
documentation could be viewed as a summary of this information model or
meta-schema.
However, the information model that we use in Chandler is way too rich for
an interchange format, which should be simpler and more, well,
*elementary*, if it's to be robust in the face of schema changes. So
although we could define an information model that matches the one we use
for Chandler itself, this would just be moving the schema evolution
problems around and not solving them.
I would suggest we define a model based on a restricted subset of the
relational model, but possibly *expressed* using XML namespaces to
represent the "fields" of the different "tables". The reason I say
"relational" rather than just saying "XML" is that the relational model
contains some key ideas that XML does not. For example, the relational
model insists that individual data values be elementary, atomic, and
normalized, without nesting or hierarchy. I think that these are important
qualities for data interchange and upgradeability, because the hierarchy
you write out with may not be the ideal hierarchy to read in, when a schema
changes or across implementation boundaries (e.g. Chandler vs. Cosmo/Scooby).
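
Here is a minimal sketch of why flat, atomic rows help with that. The rows and column positions are invented for the example. Because no row nests inside another, a reader can build whichever hierarchy suits it, independent of how the writer happened to organize things:

```python
from collections import defaultdict

# Flat rows of atomic values: (item id, title, collection name).
rows = [
    ("event-1", "Staff meeting", "work"),
    ("event-2", "Lunch", "personal"),
    ("event-3", "Code review", "work"),
]


def group_by(rows, column):
    """Build a one-level hierarchy keyed on the given column index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


# One reader wants items nested under collections...
by_collection = group_by(rows, 2)
# ...another wants them keyed by item id. The same rows serve both.
by_id = group_by(rows, 0)

print(sorted(by_collection))
print(sorted(by_id))
```

Had the writer nested items under collections in the file itself, the second reader would have to undo that hierarchy before building its own.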
Another such important quality is discoverability - it should be possible
to map from e.g. namespace URLs to handlers, if namespace URLs are being
used to identify logical "tables".
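
A rough sketch of that kind of discoverability follows. The namespace URIs, the record layout, and the handler functions are all hypothetical. Handlers register themselves under a namespace URI, and the loader dispatches each record on the namespace of its tag, skipping "tables" it doesn't know about:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per logical "table".
EVENT_NS = "http://example.com/schema/event"
NOTE_NS = "http://example.com/schema/note"

HANDLERS = {}  # namespace URI -> handler function


def handles(namespace):
    """Decorator that registers a handler for one namespace."""
    def register(func):
        HANDLERS[namespace] = func
        return func
    return register


@handles(EVENT_NS)
def load_event(fields):
    return ("event", fields["title"], fields["start"])


@handles(NOTE_NS)
def load_note(fields):
    return ("note", fields["body"])


def load(xml_text):
    results = []
    for record in ET.fromstring(xml_text):
        # ElementTree spells a qualified tag as "{namespace}localname".
        namespace = record.tag[1:].partition("}")[0]
        handler = HANDLERS.get(namespace)
        if handler is None:
            continue  # unknown "table": skip it rather than fail
        fields = {child.tag.partition("}")[2]: child.text for child in record}
        results.append(handler(fields))
    return results


SAMPLE = """<items xmlns:e="http://example.com/schema/event"
       xmlns:n="http://example.com/schema/note"
       xmlns:x="http://example.com/schema/unknown">
  <e:record><e:title>Staff meeting</e:title><e:start>2006-07-10T10:00</e:start></e:record>
  <n:record><n:body>Buy milk</n:body></n:record>
  <x:record><x:whatever>ignored</x:whatever></x:record>
</items>"""

loaded = load(SAMPLE)
print(loaded)
```

Note that every field value is a plain string with no nesting, which keeps each record close to a relational row even though the physical format is XML.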
The key here is that the information model is
representation-independent. Whether we use binary, XML, YAML, or even
Python pickles to physically express the information model, there is still
a schema that defines the scope of what you can "say" in that model. Is
this making any more sense?
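
If an example helps, here is a minimal sketch of representation independence. The record fields and the "flat, string-only" rule are assumptions standing in for a real schema. The same records go through two different physical formats and come back identical, and the rule about what you can "say" is checked against the records rather than against either encoding:

```python
import json
import pickle

# Hypothetical records in the model: flat, with string-only values.
records = [
    {"table": "event", "id": "1", "title": "Staff meeting"},
    {"table": "note", "id": "2", "body": "Buy milk"},
]


def is_valid(record):
    """The model's rule (an assumption for this sketch): every value
    is an atomic string, so no nesting is allowed."""
    return all(isinstance(value, str) for value in record.values())


# Two physical representations of the same information.
via_json = json.loads(json.dumps(records))
via_pickle = pickle.loads(pickle.dumps(records))

print(via_json == via_pickle == records)
print(all(is_valid(record) for record in via_json))
```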
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev