[cc'd to Chandler-Dev because I've been meaning to post something about information models anyway, so I might as well start with *something*]

At 01:11 PM 7/6/2006 -0700, Morgen Sagen wrote:
> On Jun 30, 2006, at 12:08 PM, Phillip J. Eby wrote:
>> At 02:02 PM 6/29/2006 -0700, Morgen Sagen wrote:
>>> Hey Phillip,
>>>
>>> Given your ideas about the information model and the needs of dump
>>> and reload, would Google's data API format suffice?
>>>
>>> See their 'Kinds' document:
>>>
>>>    http://code.google.com/apis/gdata/common-elements.html
>>>
>>> and especially this section for an example:
>>>
>>>    http://code.google.com/apis/gdata/common-elements.html#gdEventKind
>>>
>>> It's just an extension of Atom XML schema, and by starting with
>>> Google's schema we'd instantly get interoperability with a useful
>>> service.
>>
>> It's interesting, but it doesn't have a uniform or elementary
>> information model.  Notice, for example, the embedded iCalendar
>> data in gd:recurrence.  I agree that being able to share in this
>> format seems useful for interoperability purposes, but it doesn't
>> appear to solve our other issues.  (Note also the idiosyncratic
>> overlap in semantics between gd:recurrence and
>> gd:recurrenceException.)
>>
>> I would say, though, that if this is the kind of thing you want to
>> be able to do, it's probably a good idea for me to look at it and
>> see what could be done to reduce it to an elementary representation.
>
> What do you mean by "elementary"?  I thought the way they store
> icalendar content in the gd:recurrence field was strange at first,
> but actually it would probably be convenient since Jeffrey's vobject
> lib groks icalendar.

I'm saying that the gd:recurrence stuff means there's no information model; it's just a data format -- in fact, it's *two* data formats. :)

The difference would be that in a uniform information model, all the facts represented by the iCalendar data would be represented in the same way as all the other facts represented by the model. Or conversely, all the other facts would be represented in vobject form.
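As a rough illustration of that contrast (the field names below are invented for the example and are not taken from GData, vobject, or Chandler), compare a record that embeds an iCalendar fragment with one where the recurrence is broken out into the same kind of field/value facts as everything else:

```python
# Hypothetical event records; all field names are made up for illustration.

# "Two formats in one": most facts are plain fields, but the recurrence
# is an opaque iCalendar fragment that needs its own parser to read.
embedded = {
    "title": "Staff meeting",
    "where": "Room 4",
    "recurrence": "DTSTART:20060710T100000Z\r\nRRULE:FREQ=WEEKLY;BYDAY=MO\r\n",
}

# Uniform: every fact, including the recurrence, is a field/value pair.
uniform = {
    "title": "Staff meeting",
    "where": "Room 4",
    "start": "2006-07-10T10:00:00Z",
    "recur_freq": "WEEKLY",
    "recur_byday": "MO",
}


def select(records, field, value):
    """Return the records whose `field` equals `value`."""
    return [r for r in records if r.get(field) == value]


# The same generic query works for any fact in the uniform record...
weekly_uniform = select([uniform], "recur_freq", "WEEKLY")
# ...but cannot see facts buried inside the embedded iCalendar text.
weekly_embedded = select([embedded], "recur_freq", "WEEKLY")
```

In the uniform record the recurrence can be queried like any other field; in the embedded one it is invisible until a second, format-specific parser is applied.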

Some examples of information models are the relational model, the LDAP/X.500 directory model, and the XML document model. XML's information model consists of hierarchies, text, elements, and attributes. LDAP's model is a hierarchy of named objects, with multivalued text attributes. The relational model is tables of atomic values.

Of course, this is a simplistic summary, because you can create many information models *in* XML, by restricting expressiveness to provide meaning. You can also express relational-like models in XML or vice versa. Any of these information models is sufficient to express ideas from the others.

The Chandler information model can be described as the meta-schema that defines which schemas we can express in Chandler. The Schema API documentation could be viewed as a summary of this information model or meta-schema.

However, the information model that we use in Chandler is way too rich for an interchange format, which should be simpler and more, well, *elementary*, if it's to be robust in the face of schema changes. So although we could define an information model that matches the one we use for Chandler itself, this would just be moving the schema evolution problems around and not solving them.

I would suggest we define a model based on a restricted subset of the relational model, but possibly *expressed* using XML namespaces to represent the "fields" of the different "tables". The reason I say "relational" rather than just saying "XML" is that the relational model contains some key ideas that XML does not. For example, the relational model insists that individual data values be elementary, atomic, and normalized, without nesting or hierarchy. I think that these are important qualities for data interchange and upgradeability, because the hierarchy you write out may not be the ideal hierarchy to read back in when a schema changes, or when crossing implementation boundaries (e.g. Chandler vs. Cosmo/Scooby).
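A minimal sketch of what such a format might look like, assuming one namespace URI per logical "table" and one flat child element per field. The URIs, element names, and function names here are hypothetical, chosen only for the example; this is not an actual Chandler or Cosmo format:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, one per logical "table".
EVENT_NS = "http://example.org/schema/event"
NOTE_NS = "http://example.org/schema/note"

# Only elementary values are allowed: no lists, dicts, or nested records.
ATOMIC = (str, int, float, bool)


def write_items(items):
    """Serialize {item_id: {namespace: {field: atomic_value}}} to XML text."""
    root = ET.Element("items")
    for item_id, tables in items.items():
        item = ET.SubElement(root, "item", {"id": item_id})
        for ns, row in tables.items():
            for field, value in row.items():
                if not isinstance(value, ATOMIC):
                    raise TypeError(f"{field!r} is not atomic: {value!r}")
                # One flat child per field; nothing nests below this level.
                ET.SubElement(item, f"{{{ns}}}{field}").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_items(xml_text):
    """Inverse of write_items; every value comes back as a string."""
    items = {}
    for item in ET.fromstring(xml_text):
        tables = items.setdefault(item.get("id"), {})
        for child in item:
            # ElementTree spells namespaced tags as "{uri}localname".
            ns, field = child.tag[1:].split("}", 1)
            tables.setdefault(ns, {})[field] = child.text
    return items


sample = {
    "item-1": {
        EVENT_NS: {"start": "2006-07-10T10:00:00Z", "duration_minutes": 60},
        NOTE_NS: {"body": "Bring slides"},
    }
}
round_tripped = read_items(write_items(sample))
```

Because each item is just a set of flat rows keyed by namespace, a reader is free to regroup the fields into whatever hierarchy suits its own schema.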

Another such important quality is discoverability - it should be possible to map from e.g. namespace URLs to handlers, if namespace URLs are being used to identify logical "tables".
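One way to picture that mapping is a simple registry keyed by namespace URL. The sketch below is an assumption about how it could work, not an existing Chandler API; the names `HANDLERS`, `handles`, and `dispatch`, and the example URLs, are invented:

```python
# Hypothetical registry: namespace URL -> function that loads one row.
HANDLERS = {}


def handles(namespace):
    """Decorator that registers a loader for one logical "table"."""
    def register(func):
        HANDLERS[namespace] = func
        return func
    return register


@handles("http://example.org/schema/event")
def load_event(item_id, row):
    return ("event", item_id, row["start"])


def dispatch(item_id, tables):
    """Route each namespace's row to its handler; collect unknown ones."""
    results, unknown = [], []
    for ns, row in tables.items():
        handler = HANDLERS.get(ns)
        if handler is None:
            # No handler installed: report the namespace instead of failing.
            unknown.append(ns)
        else:
            results.append(handler(item_id, row))
    return results, unknown


results, unknown = dispatch("item-1", {
    "http://example.org/schema/event": {"start": "2006-07-10T10:00:00Z"},
    "http://example.org/schema/unknown": {"x": "1"},
})
```

A reader that meets an unfamiliar namespace can then skip it, preserve it untouched, or go looking for a handler, rather than failing on the whole item.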

The key here is that the information model is representation-independent. Whether we use binary, XML, YAML, or even Python pickles to physically express the information model, there is still a schema that defines the scope of what you can "say" in that model. Is this making any more sense?
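To illustrate the representation-independence point with a small sketch: assume the model is nothing more than a set of (item, field, value) facts, with made-up field names. The same facts then survive a round trip through two unrelated physical encodings:

```python
import json
import pickle

# Hypothetical facts; in this sketch the "information model" is just
# a set of (item, field, value) triples with string values.
facts = {
    ("item-1", "event:start", "2006-07-10T10:00:00Z"),
    ("item-1", "event:recur_freq", "WEEKLY"),
    ("item-1", "note:body", "Bring slides"),
}


def to_json(facts):
    """Encode the facts as a JSON array of three-element arrays."""
    return json.dumps(sorted(facts))


def from_json(text):
    """Decode JSON back into a set of triples."""
    return {tuple(triple) for triple in json.loads(text)}


def to_pickle(facts):
    """Encode the same facts as a Python pickle."""
    return pickle.dumps(sorted(facts))


def from_pickle(data):
    """Decode a pickle back into a set of triples."""
    return set(pickle.loads(data))


via_json = from_json(to_json(facts))
via_pickle = from_pickle(to_pickle(facts))
```

What can be "said" is fixed by the triple structure, not by whichever encoding happens to carry it.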

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
