[Chandler-dev] Re: sharing format / dump and reload question

Phillip J. Eby Thu, 06 Jul 2006 14:48:30 -0700

[cc'd to Chandler-Dev because I've been meaning to post something about information models anyway, so I might as well start with *something*]


At 01:11 PM 7/6/2006 -0700, Morgen Sagen wrote:

On Jun 30, 2006, at 12:08 PM, Phillip J. Eby wrote:

At 02:02 PM 6/29/2006 -0700, Morgen Sagen wrote:

Hey Phillip,


Given your ideas about the information model and the needs of dump
and reload, would Google's data API format suffice?

See their 'Kinds' document:

   http://code.google.com/apis/gdata/common-elements.html

and especially this section for an example:

   http://code.google.com/apis/gdata/common-elements.html#gdEventKind

It's just an extension of Atom XML schema, and by starting with
Google's schema we'd instantly get interoperability with a useful
service.


It's interesting, but it doesn't have a uniform or elementary
information model.  Notice, for example, the embedded iCalendar
data in gd:recurrence.  I agree that being able to share in this
format seems useful for interoperability purposes, but it doesn't
appear to solve our other issues.  (Note also the idiosyncratic
overlap in semantics between gd:recurrence and
gd:recurrenceException.)

I would say, though, that if this is the kind of thing you want to
be able to do, it's probably a good idea for me to look at it and
see what could be done to reduce it to an elementary representation.


What do you mean by "elementary"?  I thought the way they store
icalendar content in the gd:recurrence field was strange at first,
but actually it would probably be convenient since Jeffrey's vobject
lib groks icalendar.

I'm saying that the gd:recurrence stuff means there's no information model, it's just a data format -- in fact it's *two* data formats. :)

The difference would be that in a uniform information model, all the facts represented by the icalendar data would be represented in the same way as all other facts represented by the model. Or conversely, all the other facts would be represented in vobject form.
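As a rough illustration of "represented in the same way", here is a minimal sketch that breaks a recurrence rule into the same flat (subject, field, value) facts used for everything else. The function name, the "recurrence." prefix, and the fact layout are all made up for the example; none of it comes from GData, vobject, or Chandler, and it only handles simple NAME=VALUE parts.

```python
# Hypothetical sketch: field names and the triple layout are invented here.
def rrule_to_facts(event_id, rrule):
    """Split an iCalendar-style RRULE value into flat (subject, field, value) facts."""
    facts = []
    for part in rrule.split(";"):
        name, _, value = part.partition("=")
        # A comma-separated value becomes one fact per item, so no fact nests a list.
        for item in value.split(","):
            facts.append((event_id, "recurrence." + name.lower(), item))
    return facts

# The title and the recurrence data now share one representation.
event_facts = [("event-1", "title", "Staff meeting")]
event_facts += rrule_to_facts("event-1", "FREQ=WEEKLY;BYDAY=MO,WE")
```

With this shape, a reader that knows nothing about iCalendar syntax can still enumerate, filter, or ignore the recurrence facts exactly as it would any others.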

Some examples of information models are the relational model, the LDAP/X.500 directory model, and the XML document model. XML's information model consists of hierarchies, text, elements, and attributes. LDAP's model is a hierarchy of named objects, with multivalued text attributes. The relational model is tables of atomic values.

Of course, this is a simplistic summary, because you can create many information models *in* XML, by restricting expressiveness to provide meaning. You can also express relational-like models in XML or vice versa. Any of these information models is sufficient to express ideas from the others.

The Chandler information model can be described as the meta-schema that defines what schema we can express in Chandler. The Schema API documentation could be viewed as a summary of this information model or meta-schema.

However, the information model that we use in Chandler is way too rich for an interchange format, which should be simpler and more, well, *elementary*, if it's to be robust in the face of schema changes. So although we could define an information model that matches the one we use for Chandler itself, this would just be moving the schema evolution problems around and not solving them.

I would suggest we define a model based on a restricted subset of the relational model, but possibly *expressed* using XML namespaces to represent the "fields" of the different "tables". The reason I say "relational" rather than just saying "XML" is that the relational model contains some key ideas that XML does not. For example, the relational model insists that individual data values be elementary, atomic, and normalized, without nesting or hierarchy. I think that these are important qualities for data interchange and upgradeability, because the hierarchy you write out with may not be the ideal hierarchy to read in, when a schema changes or across implementation boundaries (i.e. Chandler vs. Cosmo/Scooby).
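To make "atomic, normalized, without nesting" a bit more concrete, here is a minimal sketch of flat rows keyed by a namespace URL. Everything in it is an assumption for illustration: the example.org URLs, the field names, and the make_record helper are hypothetical and not part of any Chandler or Cosmo schema.

```python
# Hypothetical sketch: the namespace URLs and field names are invented.
ATOMIC_TYPES = (str, int, float, bool, type(None))

EVENT_NS = "http://example.org/ns/event"        # stands in for an "event" table
ATTENDEE_NS = "http://example.org/ns/attendee"  # stands in for an "attendee" table

def make_record(table_ns, **fields):
    """Build one flat row; refuse lists, dicts, or other nested values."""
    for name, value in fields.items():
        if not isinstance(value, ATOMIC_TYPES):
            raise TypeError("field %r is not atomic: %r" % (name, value))
    return {"table": table_ns, "fields": dict(fields)}

# Attendees are separate rows that point back at the event,
# instead of a list nested inside the event row.
records = [
    make_record(EVENT_NS, id="event-1", title="Staff meeting"),
    make_record(ATTENDEE_NS, event="event-1", name="Morgen"),
    make_record(ATTENDEE_NS, event="event-1", name="Phillip"),
]
```

Because no row contains another row, a reader is free to regroup the same rows into whatever hierarchy suits its own schema.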

Another such important quality is discoverability - it should be possible to map from e.g. namespace URLs to handlers, if namespace URLs are being used to identify logical "tables".
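One way such a mapping could look is a plain registry from namespace URL to handler function. This is only a sketch under assumptions: the handles decorator, the dispatch function, and the example.org URL are hypothetical names, and skipping unknown namespaces is one possible policy rather than a decided one.

```python
# Hypothetical sketch: registry, decorator, and URL are invented for illustration.
HANDLERS = {}

def handles(namespace):
    """Register the decorated function as the handler for one namespace URL."""
    def register(func):
        HANDLERS[namespace] = func
        return func
    return register

@handles("http://example.org/ns/event")
def load_event(row):
    return ("event", row["title"])

def dispatch(namespace, row):
    """Look up the handler for a namespace; skip rows nobody knows how to read."""
    handler = HANDLERS.get(namespace)
    if handler is None:
        return None
    return handler(row)

known = dispatch("http://example.org/ns/event", {"title": "Staff meeting"})
unknown = dispatch("http://example.org/ns/not-installed", {"x": "1"})
```

Here a reader that lacks a handler for some "table" can still load the rest of the data, which matters when the writer has a newer schema than the reader.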

The key here is that the information model is representation-independent. Whether we use binary, XML, YAML, or even Python pickles to physically express the information model, there is still a schema that defines the scope of what you can "say" in that model. Is this making any more sense?
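A small sketch of what representation independence means in practice: the same flat facts written out as JSON and as a Python pickle read back as identical data. JSON is used here only as a convenient second encoding from the standard library, and the fact layout and names are made up for the example.

```python
import json
import pickle

# Hypothetical facts in a flat [subject, field, value] layout.
model_facts = [
    ["event-1", "title", "Staff meeting"],
    ["event-1", "recurrence.freq", "WEEKLY"],
]

# Two physical encodings of the same logical content.
as_json = json.dumps(model_facts)
as_pickle = pickle.dumps(model_facts)

# Reading either one back yields the same facts.
from_json = json.loads(as_json)
from_pickle = pickle.loads(as_pickle)
same_content = from_json == from_pickle == model_facts
```

The encodings differ byte for byte, but what can be "said" is fixed by the shape of the facts, not by the file format.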


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
