At 09:55 AM 7/7/2006 -0700, Morgen Sagen wrote:
>On Jul 6, 2006, at 2:47 PM, Phillip J. Eby wrote:
>>The key here is that the information model is representation-
>>independent. Whether we use binary, XML, YAML, or even Python
>>pickles to physically express the information model, there is still
>>a schema that defines the scope of what you can "say" in that
>>model. Is this making any more sense?
>
>I think so. For example, RDF* is an information model (more
>specifically a graph data model) which can be represented in various
>ways, be it RDF-XML or N-triples syntax.
Right. Part of the idea here is that an explicitly-specified
information model lets other systems guarantee what they will support
storing, querying, and retrieving, in a way that still allows us to change
representation formats when the need arises.
The stack, then, is:
* Application/domain model
* Information model
* Representation format
Parcel developers have to define a mapping from the application model to
and from the information model, and the sharing system in turn maps that to
the representation format. The information model needs to be super-stable:
for all practical purposes it can evolve only by adding features, and
features can never be removed. The application model needs to be able to
evolve over time, and multiple representation formats need to be possible.
Actually, I guess I sort of left out a layer from the stack; really, it's:
* Application/domain model
* Versioned mapping to information model
* Information model
* Representation format
So the application developer must create the top two things, and create
additional mappings (or modify them in a backward-compatible way) when the
application schema changes.
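To make the "versioned mapping" layer concrete, here's a toy sketch in
Python. None of these names exist in Chandler; the point is only that each
schema version registers its own mapping to information-model tuples, and
old mappings stay around untouched so previously shared data can still be
interpreted:

```python
# Hypothetical registry of versioned mappings (illustrative only).
MAPPINGS = {}

def mapping(version):
    """Register a mapping function under a unique version identifier."""
    def register(fn):
        MAPPINGS[version] = fn
        return fn
    return register

@mapping("event-v1")
def event_v1(event):
    # v1 of the application schema maps an event to a 2-tuple.
    return (event["title"], event["start"])

@mapping("event-v2")  # v2 adds a field; v1 is kept, never modified
def event_v2(event):
    return (event["title"], event["start"], event.get("location", ""))

evt = {"title": "Standup", "start": "2006-07-07T09:00"}
assert MAPPINGS["event-v1"](evt) == ("Standup", "2006-07-07T09:00")
assert MAPPINGS["event-v2"](evt) == ("Standup", "2006-07-07T09:00", "")
```

The backward-compatibility rule then falls out naturally: you add
"event-v3" rather than editing "event-v1" in place.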
The reason that the information model should be as elementary as possible
is that it's not the job of the information model to implement
application-level features, *and* it is almost impossible to remove
something from the information model later. You can't go back and say,
"er, RDF isn't triples any more, it's just singles". :) So, the
information model is basically the part you're nailing down as being
essentially fixed, so that the other parts can vary, to the extent that
they can still be mapped to or from the information model.
So, a hypothetical information model might be something like, "you can
store tuples of elementary types (int, string, float, datetime, unicode,
etc.), and each tuple is associated with a universally unique identifier of
some kind". This is a very simple model, but you can of course build
anything you like on top of it. It is also very easy to represent in an
SQL datastore, XML, flat files, pickles, you name it.
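A minimal sketch of that hypothetical model in Python (assuming the set of
elementary types shown; the class and method names are made up for
illustration, not a proposed API):

```python
import uuid
from datetime import datetime

# The elementary types this hypothetical information model allows.
ELEMENTARY_TYPES = (int, float, str, bytes, datetime)

class TupleStore:
    """Toy store: each tuple of elementary values gets a unique identifier."""

    def __init__(self):
        self._tuples = {}

    def store(self, values):
        # Enforce the information model: only elementary types allowed.
        for v in values:
            if not isinstance(v, ELEMENTARY_TYPES):
                raise TypeError(f"{type(v).__name__} is not an elementary type")
        uid = uuid.uuid4()
        self._tuples[uid] = tuple(values)
        return uid

    def fetch(self, uid):
        return self._tuples[uid]

store = TupleStore()
uid = store.store((42, "hello", 3.14))
assert store.fetch(uid) == (42, "hello", 3.14)
```

Everything richer (relationships, collections, and so on) would be built on
top of this by convention, not by extending the model itself.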
Indeed, you have many choices for representation within e.g. XML. For
example, assuming you have key relationships between these virtual "tables"
of tuples, you can use XML namespaces to reference the unique identifier,
and then glom all the related data from several "tables" into one XML
element's attributes. Different representations would have different
performance characteristics, of course, but the API would still be in terms
of the information model, rather than the concrete representation.
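As a sketch of that flexibility, here are two XML renderings of the same
information-model tuple, one using child elements and one gloming fields
into attributes (element and field names are invented for illustration):

```python
import xml.etree.ElementTree as ET

uid = "a1b2c3"
row = {"title": "Standup", "start": "2006-07-07T09:00"}

# Representation 1: one child element per field.
e1 = ET.Element("tuple", id=uid)
for name, value in row.items():
    ET.SubElement(e1, name).text = value

# Representation 2: all fields as attributes on a single element.
e2 = ET.Element("tuple", id=uid, **row)

print(ET.tostring(e1).decode())
print(ET.tostring(e2).decode())
```

Both carry identical information-model content; an API written in terms of
tuples and identifiers would be oblivious to which one is on the wire.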
This is just an example of a possible information model, specified very
imprecisely. To be a real spec, we would need to define what we mean by
"int" and "string" and so on, and especially what the "etc." is. :) This
is not a big deal to nail down, but it's "standards"-type work and needs to
involve stakeholders from all the projects that want to use it.
The more interesting part is defining an API around the information model
(and to a certain extent, vice versa). Most of my thinking on this to date
has been about "dump and reload", meaning a mass dumping and reloading of
most objects in the repository. But if I understand correctly, the sharing
system is much more about incremental modification to items, so the API
demands are different.
An example: sharing needs to know if data has "changed", but whether it has
"changed" may depend on its meaning. The information model in essence
defines what a "change" looks like; it's not really whether the application
representation has changed. If you upgrade Chandler and the schema
changes, how do you know if an object has changed? By whether its
information-model representation (based on a particular,
uniquely-identified mapping) has changed. This isn't really important for
dump and reload, but it might be for sharing.
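One way to picture that: fingerprint the information-model representation
produced by a uniquely-identified mapping, and compare fingerprints to
decide whether an item has "changed". This is a sketch under assumed names,
not a proposed design:

```python
import hashlib

def info_model_digest(mapping_id, values):
    """Fingerprint an item's information-model representation.

    An item counts as "changed" only if this digest differs; internal
    application-schema changes that still map to the same tuples are
    invisible to sharing.
    """
    payload = repr((mapping_id, tuple(values))).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

before = info_model_digest("event-v1", ("Standup", "2006-07-07T09:00"))
after = info_model_digest("event-v1", ("Standup", "2006-07-07T09:30"))
assert before != after  # the start time changed, so the item changed
assert before == info_model_digest("event-v1", ("Standup", "2006-07-07T09:00"))
```

Note that the mapping identifier is part of the digest, which is why the
mapping itself has to be versioned and uniquely identified.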
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev