At 09:55 AM 7/7/2006 -0700, Morgen Sagen wrote:
>On Jul 6, 2006, at 2:47 PM, Phillip J. Eby wrote:
>>The key here is that the information model is representation-
>>independent. Whether we use binary, XML, YAML, or even Python
>>pickles to physically express the information model, there is still
>>a schema that defines the scope of what you can "say" in that
>>model. Is this making any more sense?
>
>I think so. For example, RDF* is an information model (more
>specifically a graph data model) which can be represented in various
>ways, be it RDF-XML or N-triples syntax.
Right. Part of the idea here is that an explicitly-specified
information model lets other systems guarantee what they will support
storing, querying, and retrieving, in a way that still allows us to change
representation formats when the need arises.
The stack, then, is:
* Application/domain model
* Information model
* Representation format
Parcel developers have to define a mapping from the application model to
and from the information model, and the sharing system in turn maps that to
the representation format. The information model needs to be super-stable:
for all practical purposes it can evolve only by adding features, and
features can never be removed. The application model needs to be able to
evolve over time, and multiple representation formats need to be possible.
Actually, I guess I sort of left out a layer from the stack; really, it's:
* Application/domain model
* Versioned mapping to information model
* Information model
* Representation format
So the application developer must create the top two things, and create
additional mappings (or modify them in a backward-compatible way) when the
application schema changes.
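To make the "versioned mapping" layer concrete, here's a toy sketch in
Python. None of these names exist in Chandler; the point is only that each
schema version registers its own mapping to information-model tuples, and
old mappings stay around untouched so previously shared data can still be
interpreted:

```python
# Hypothetical registry of versioned mappings (illustrative only).
MAPPINGS = {}

def mapping(version):
    """Register a mapping function under a unique version identifier."""
    def register(fn):
        MAPPINGS[version] = fn
        return fn
    return register

@mapping("event-v1")
def event_v1(event):
    # v1 of the application schema maps an event to a 2-tuple.
    return (event["title"], event["start"])

@mapping("event-v2")  # v2 adds a field; v1 is kept, never modified
def event_v2(event):
    return (event["title"], event["start"], event.get("location", ""))

evt = {"title": "Standup", "start": "2006-07-07T09:00"}
assert MAPPINGS["event-v1"](evt) == ("Standup", "2006-07-07T09:00")
assert MAPPINGS["event-v2"](evt) == ("Standup", "2006-07-07T09:00", "")
```

The backward-compatibility rule then falls out naturally: you add
"event-v3" rather than editing "event-v1" in place.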
The reason that the information model should be as elementary as possible
is that it's not the job of the information model to implement
application-level features, *and* it is almost impossible to remove
something from the information model later. You can't go back and say,
"er, RDF isn't triples any more, it's just singles". :) So, the
information model is basically the part you're nailing down as being
essentially fixed, so that the other parts can vary, to the extent that
they can still be mapped to or from the information model.
So, a hypothetical information model might be something like, "you can
store tuples of elementary types (int, string, float, datetime, unicode,
etc.), and each tuple is associated with a universally unique identifier of
some kind". This is a very simple model, but you can of course build
anything you like on top of it. It is also very easy to represent in an
SQL datastore, XML, flat files, pickles, you name it.
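A minimal sketch of that hypothetical model in Python (assuming the set of
elementary types shown; the class and method names are made up for
illustration, not a proposed API):

```python
import uuid
from datetime import datetime

# The elementary types this hypothetical information model allows.
ELEMENTARY_TYPES = (int, float, str, bytes, datetime)

class TupleStore:
    """Toy store: each tuple of elementary values gets a unique identifier."""

    def __init__(self):
        self._tuples = {}

    def store(self, values):
        # Enforce the information model: only elementary types allowed.
        for v in values:
            if not isinstance(v, ELEMENTARY_TYPES):
                raise TypeError(f"{type(v).__name__} is not an elementary type")
        uid = uuid.uuid4()
        self._tuples[uid] = tuple(values)
        return uid

    def fetch(self, uid):
        return self._tuples[uid]

store = TupleStore()
uid = store.store((42, "hello", 3.14))
assert store.fetch(uid) == (42, "hello", 3.14)
```

Everything richer (relationships, collections, and so on) would be built on
top of this by convention, not by extending the model itself.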
Indeed, you have many choices for representation within e.g. XML. For
example, assuming you have key relationships between these virtual "tables"
of tuples, you can use XML namespaces to reference the unique identifier,
and then glom all the related data from several "tables" into one XML
element's attributes. Different representations would have different
performance characteristics, of course, but the API would still be in terms
of the information model, rather than the concrete representation.
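As a sketch of that flexibility, here are two XML renderings of the same
information-model tuple, one using child elements and one gloming fields
into attributes (element and field names are invented for illustration):

```python
import xml.etree.ElementTree as ET

uid = "a1b2c3"
row = {"title": "Standup", "start": "2006-07-07T09:00"}

# Representation 1: one child element per field.
e1 = ET.Element("tuple", id=uid)
for name, value in row.items():
    ET.SubElement(e1, name).text = value

# Representation 2: all fields as attributes on a single element.
e2 = ET.Element("tuple", id=uid, **row)

print(ET.tostring(e1).decode())
print(ET.tostring(e2).decode())
```

Both carry identical information-model content; an API written in terms of
tuples and identifiers would be oblivious to which one is on the wire.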
This is just an example of a possible information model, specified very
imprecisely. To be a real spec, we would need to define what we mean by
"int" and "string" and so on, and especially what the "etc." is. :) This
is not a big deal to nail down, but it's "standards"-type work and needs to
involve stakeholders from all the projects that want to use it.
The more interesting part is defining an API around the information model
(and to a certain extent, vice versa). Most of my thinking on this to date
has been about "dump and reload", meaning a mass dumping and reloading of
most objects in the repository. But if I understand correctly, the sharing
system is much more about incremental modification to items, so the API
demands are different.
An example: sharing needs to know if data has "changed", but whether it has
"changed" may depend on its meaning. The information model in essence
defines what a "change" looks like; it's not really whether the application
representation has changed. If you upgrade Chandler and the schema
changes, how do you know if an object has changed? By whether its
information-model representation (based on a particular,
uniquely-identified mapping) has changed. This isn't really important for
dump and reload, but it might be for sharing.
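One way to picture that: fingerprint the information-model representation
produced by a uniquely-identified mapping, and compare fingerprints to
decide whether an item has "changed". This is a sketch under assumed names,
not a proposed design:

```python
import hashlib

def info_model_digest(mapping_id, values):
    """Fingerprint an item's information-model representation.

    An item counts as "changed" only if this digest differs; internal
    application-schema changes that still map to the same tuples are
    invisible to sharing.
    """
    payload = repr((mapping_id, tuple(values))).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

before = info_model_digest("event-v1", ("Standup", "2006-07-07T09:00"))
after = info_model_digest("event-v1", ("Standup", "2006-07-07T09:30"))
assert before != after  # the start time changed, so the item changed
assert before == info_model_digest("event-v1", ("Standup", "2006-07-07T09:00"))
```

Note that the mapping identifier is part of the digest, which is why the
mapping itself has to be versioned and uniquely identified.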
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev