-----------------------------------------------------------------
Proposal: A Logical Format API for the Chandler Sharing Framework
-----------------------------------------------------------------

(A nicely formatted HTML version can be found at http://peak.telecommunity.com/DevCenter/ChandlerSharingModel)


Overview
========

This document describes a proposed API for the Chandler sharing framework to allow individual parcels to support backward and forward-compatible sharing, even when their domain model changes between parcel versions and the clients doing the sharing do not all have the same version of the parcel installed.

The proposed API does this by allowing parcel developers to specify "sharing schemas" for their items. A sharing schema is a kind of logical transmission format that breaks items down into simple records containing elementary data types that are easy to store or transmit for use by other programs.

Sharing schemas defined using this API will also be used to implement "dump and reload" including schema evolution during upgrades or downgrades. As a parcel's item schema changes, its sharing schema(s) must be modified so that data produced by previous versions of the parcel can still be imported. A parcel can also optionally provide support for exporting data in such a way that it can be read by older versions.

Typically, a parcel will provide its own sharing schema for the Kinds and Annotations it contains. However, it's also possible for a parcel to define one or more sharing schemas for other parcels that it depends on.

Parcel developers define a sharing schema by defining one or more record types (using the ``@sharing.recordtype`` decorator), and one or more ``sharing.Schema`` subclasses. The record types define the format of the data to be shared, and the ``sharing.Schema`` classes provide code that converts items to records and vice versa. The ``sharing.Schema`` base class will provide many utility methods to automatically handle common mapping patterns, so that most schemas will include relatively little code.


Records and Record Types
========================

The API treats all data as "records", similar to the rows of a table in a relational database. Each record is of some "record type", and contains a fixed number of fields. As in a relational database, each field can hold at most one value of one elementary data type, such as a number, string, date/time value, etc. A field may also hold a value of ``None``, which is conceptually similar to the "null" value in a relational database. There is also a second kind of "null" value, called ``sharing.NoChange``, that can be used to create "diff" or "delta" records that indicate only certain parts of the record are changed.

To define a record type, a parcel developer will write a short function using the ``@sharing.recordtype`` decorator. For example::

    @sharing.recordtype("http://schemas.osafoundation.org/pim/contentitem";)
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
        """Record type for content items; note lastModifiedBy is a UUID"""

The above defines a record type with 6 fields, named by the arguments to the function. The string passed to ``recordtype()`` must be a unique URI, and will be used to allow other programs (such as Cosmo) to identify whether a particular record type is known to or understood by them.

(Note that any unique URI is acceptable, including URIs of the form "uuid:...". That is, you need not have control of a domain name in order to create your own unique URI, as you can use a UUID to create one.)
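
For illustration, here is a hypothetical sketch of how a record of this type might be constructed and read back, assuming (as the exporter and importer examples later in this document do) that the decorator turns ``itemrecord`` into a record constructor whose fields are readable by name. All of the values below are made-up placeholders::

    from datetime import datetime

    rec = itemrecord(
        some_item_uuid,                   # UUID of the item being shared
        u"Status report",                 # title
        u"Draft of the weekly report",    # body
        datetime(2006, 11, 1),            # createdOn
        u"Weekly status report",          # description
        some_user_uuid,                   # lastModifiedBy, as a UUID
    )
    assert rec.title == u"Status report"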


Type Checking and Conversion
----------------------------

In the simplest case, a recordtype function need not contain any code or return any value. In such a case, the argument names -- and default values, if any -- are sufficient to describe how the resulting record type should behave. However, if you wish to provide type checking or conversion of arguments, you will need to write a bit more code in a record type. For example, here's a new version of the example above, that does a bit more work to ensure it is used correctly::

    @sharing.recordtype("http://schemas.osafoundation.org/pim/contentitem";)
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
        """Record type for content items"""
        if isinstance(lastModifiedBy, schema.Item):
             lastModifiedBy = lastModifiedBy.itsUUID
        if not isinstance(lastModifiedBy, UUID):
            raise TypeError("lastModifiedBy must be an item or a UUID")
        return itsUUID, title, body, createdOn, description, lastModifiedBy

Note, however, that although a recordtype function can accept items as input, it *cannot* return items as output. They must be converted to UUIDs, strings, numbers, or other elementary values. The EIM API is record-oriented, not object-oriented.
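
To illustrate the simplest case mentioned at the start of this section, here is a hypothetical record type (the URI and field names are made up for this example) that relies on argument names and a default value alone, with no body and no return statement::

    @sharing.recordtype("http://schemas.osafoundation.org/example/note")
    def note(itsUUID, title, body=u""):
        """A three-field record; 'body' defaults to the empty string if omitted"""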


Inter-Record Dependencies
-------------------------

Just as in a relational database, records may contain references to other records. For example, let's suppose that we want to have a record type to record "tags" associated with a content item. And, we want tags to be a kind of content item themselves. Here's what we would do::

    @sharing.recordtype("http://schemas.osafoundation.org/pim/tag";,
        itsUUID = contentitem.itsUUID
    )
    def tag(itsUUID, tagname):
        """Record type for a tag; "inherits" from itemrecord"""

    @sharing.recordtype("http://schemas.osafoundation.org/pim/contentitem/tags";,
        item = itemrecord.itsUUID, tag = tag.itsUUID
    )
    def tagging(item, tag):
        """Record type linking tags to items"""
        if isinstance(item, schema.Item):
            item = item.itsUUID
        if isinstance(tag, schema.Item):
            tag = tag.itsUUID
        if not isinstance(item, UUID):
            raise TypeError("must be an item or a UUID", item)
        if not isinstance(tag, UUID):
            raise TypeError("must be an item or a UUID", tag)
        return item, tag

Keyword arguments passed to the ``recordtype()`` decorator allow you to define relationships between the fields in the record type being defined, and the fields of existing record types. As you can see above, we use ``itemrecord.itsUUID`` and ``tag.itsUUID`` to refer to the ``itsUUID`` fields of the ``itemrecord`` and ``tag`` record types. This creates a dependency between the record types, and affects the order in which records will be imported or exported.

In the examples above, the order of record processing will always begin with ``itemrecord``, followed by ``tag`` and ``tagging`` records. More specifically, before a ``tagging`` record is processed, any ``tag`` and ``itemrecord`` records that have matching ``itsUUID`` fields will be processed. And before a ``tag`` record is processed, any ``itemrecord`` with the same ``itsUUID`` will be processed first.
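
As an illustration (all UUIDs and field values below are made-up placeholders), consider a batch of records built from the record types above; the comments describe the order in which the framework would process them::

    item_rec = itemrecord(item_uuid, u"Some note", u"Body text",
                          created, u"", author_uuid)
    tag_item = itemrecord(tag_uuid, u"urgent", u"", created, u"", author_uuid)
    tag_rec  = tag(tag_uuid, u"urgent")
    link_rec = tagging(item_uuid, tag_uuid)

    # Whatever order these arrive in: item_rec and tag_item carry no
    # dependencies and are processed first; tag_rec waits for the itemrecord
    # that shares its UUID (tag_item); and link_rec waits for records matching
    # both item_uuid and tag_uuid, so it is processed last.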


Recordtype Evolution
--------------------

As an application's schema changes, it may be necessary to add new fields to existing record types. This can be done, as long as:

1. New fields must be added at the end, after all existing fields in the record type function.

2. New fields must have a default value defined, and import code for the record type must be able to handle a value of ``sharing.NoChange``. (This allows two Chandlers with different versions of a parcel to interoperate, even if one supports fields that the other does not.)

3. The record type's URI must not change, and all existing fields' names must remain the same and in the same order.

In other words, if you want to change the name, meaning, or position of an existing field (or remove fields), you *must* create a new recordtype with a new URI to replace the old one. Such replacement also means that you must create a new ``sharing.Schema`` in order to retain backward compatibility with older sharing clients. (This topic will be covered in more detail below, since we haven't talked about ``Schema`` classes yet.)
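
For example, here is a hypothetical evolution of the ``itemrecord`` type defined earlier, adding a made-up ``triageStatus`` field at the end. The URI and all existing fields are unchanged; the default is chosen here to be ``sharing.NoChange``, though any default permitted by the rules above would do::

    @sharing.recordtype("http://schemas.osafoundation.org/pim/contentitem")
    def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy,
                   triageStatus=sharing.NoChange):
        """Record type for content items, plus a (hypothetical) new field"""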


Defining a Sharing Schema
=========================

By themselves, record types only define a *format* for sharing and import/export. To complete a parcel's sharing definition, it must also define how to convert between items and records, by creating a ``sharing.Schema`` subclass. At minimum, such a subclass must include a unique URI, a version number, and a user-visible description::

    class ContentSchema(sharing.Schema):
        uri = "http://schemas.osafoundation.org/pim";
        version = 1
        description = _("Core content items")

The sharing system will use these attributes to determine what formats it "understands", and to allow users to select what version of a particular format should be used for a particular "share", if applicable. (This is so that users can choose an older version in order to collaborate with users who don't have the latest version.)

It's important to note that unlike the Chandler application schema, not every change to a parcel's schema will require a change in its schema version number. A ``sharing.Schema`` version number *only* needs to change when a record type is to be replaced. That is, as long as you are only *adding* new record types, or adding new fields to existing record types (as described in the previous section), there is no need for the version number to change. That's because older code will still be able to read the records and fields that it understands, and ignore the new record types and fields that it does not.

When a schema gets a new version number, you will often want to create a second ``sharing.Schema`` subclass, to keep backward compatibility. For example, we might have::

    class OldContentSchema(sharing.Schema):
        uri = "http://schemas.osafoundation.org/pim";
        version = 1
        description = _("Core content items")
        #
        # code to read/write old format here
        # ...

    class NewContentSchema(OldContentSchema):
        version = 2
        #
        # code to read/write new format here
        # ...

This allows the parcel to support sharing (or import/export and dump/reload) of older formats. Any aspects of the old schema that are retained by the new one can potentially be inherited, eliminating the need for duplicate code. (Notice that in the above example we're also inheriting the ``uri`` and ``description`` attributes.)


Export Methods
--------------

In order to function, a ``sharing.Schema`` subclass must define "exporter" and "importer" methods. Continuing our simple item/tags example, let's look at some exporters::

    class ContentSchema(sharing.Schema):
        uri = "http://schemas.osafoundation.org/pim";
        version = 1
        description = _("Core content items")

        @sharing.exporter(pim.ContentItem)
        def export_contentitem(self, item):
            yield itemrecord(
                item.itsUUID, item.title, item.body, item.createdOn,
                item.description, item.lastModifiedBy
            )
            for t in item.tags:
                yield tagging(item, t)

        @sharing.exporter(pim.Tag)
        def export_tag(self, item):
            yield tag(item.itsUUID, item.tagname)

An exporter method is declared using ``@sharing.exporter(cls, ...)``, to indicate what class or classes of items are handled by that method. Methods may be generators that yield records, or they can just return a list or other iterable object that yields records.

More than one exporter can be called for the same item. In the example above, assuming that ``pim.Tag`` is a subclass of ``pim.ContentItem``, then the ``export_contentitem()`` method will be called before ``export_tag()`` for each ``pim.Tag`` item being exported. The same principle applies for export methods that apply to annotation classes; the export method for each applicable annotation class will be called. All of the records supplied by the various export methods are then output.

Notice that this means that export methods must be written in such a way that they do not produce duplicate records. Each export method should therefore confine itself to writing records specific to the class(es) it is registered for, allowing the base class export methods to handle the base classes' data.

If you subclass your ``sharing.Schema``, the subclass inherits all of the export methods defined by the base class. If you wish to redefine the export handling for some particular item or annotation class, you must do so by explicitly using a new ``@sharing.exporter()`` decoration; it is *not* sufficient to just override a method with the same name. (This is because, for performance reasons, the lookup mechanism is not based on method names.)
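
For instance, a subclass that wants to change how tags are exported might look something like this (a sketch, assuming the ``ContentSchema`` class shown above; the altered export behavior is made up for the example)::

    class MyContentSchema(ContentSchema):

        # The new decoration is required; merely defining a method named
        # export_tag() in the subclass would not replace the inherited one.
        @sharing.exporter(pim.Tag)
        def export_tag(self, item):
            yield tag(item.itsUUID, item.tagname.lower())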

Finally, you can declare more than one exporter for the same type in the same ``sharing.Schema`` class; all of them will be called for items they apply to.


Importer Methods
----------------

Each ``sharing.Schema`` must declare "importer" methods to handle each record type that it outputs. Here are some importers for the record types we defined previously::

    class ContentSchema(sharing.Schema):

        # ...

        @itemrecord.importer
        def import_contentitem(self, record):
            self.loadItemByUUID(
                record.itsUUID, pim.ContentItem,
                title = record.title,
                body = record.body,
                createdOn = record.createdOn,
                description = record.description,
                lastModifiedBy = self.loadItemByUUID(record.lastModifiedBy)
            )

        @tag.importer
        def import_tag(self, record):
            self.loadItemByUUID(record.itsUUID, pim.Tag, tagname=record.tagname)

        @tagging.importer
        def import_tagging(self, record):
            the_item = self.loadItemByUUID(record.item)
            the_tag = self.loadItemByUUID(record.tag)
            the_item.tags.add(the_tag)

Notice that importer methods do not need to return a value; their sole purpose is to do whatever processing is required for the received records.

Only one importer can be registered for a given record type in a particular ``Schema`` subclass. Importers registered by base classes are inherited in subclasses, unless overridden using the appropriate decorator in the subclass. If you don't want to inherit *or* override support for a particular record type, the record type can be listed in the ``do_not_import`` attribute of the class, e.g.::

        do_not_import = sometype, othertype, ...


Utility Methods
---------------

The ``loadItemByUUID()`` method shown in the importer examples above is a utility method provided by the ``sharing.Schema`` base class. It takes a UUID, an optional item or annotation class, and keyword arguments for attributes to set. The return value is an item of the specified class, or a plain ``schema.Item`` if no class was specified and the item didn't already exist.

If an item with the given UUID already exists, it's returned. If a class was specified, the item's kind is upgraded if necessary. For example, the importer for the ``tag`` recordtype above invokes it like this::

    self.loadItemByUUID(record.itsUUID, pim.Tag, tagname=record.tagname)

If a ``pim.ContentItem`` of the right UUID exists, its kind is upgraded to ``pim.Tag``. If it does not exist, it is created as a ``pim.Tag``. If an item exists, and it has a kind that is a subclass of ``pim.Tag``, its kind will not be changed. This algorithm allows items' types to be upgraded "just in time" as information becomes available.

If any of the attribute values supplied to ``loadItemByUUID()`` are ``sharing.NoChange``, no change is made to the attribute. Similarly, if the UUID supplied to ``loadItemByUUID()`` is ``sharing.NoChange``, ``sharing.NoChange`` is returned instead of an item.
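
For example, a hypothetical "diff" record in which only the title has changed could be run through the same ``import_contentitem()`` importer shown earlier without any special handling (``some_uuid`` is a placeholder)::

    partial = itemrecord(
        some_uuid, u"Renamed item", sharing.NoChange, sharing.NoChange,
        sharing.NoChange, sharing.NoChange
    )
    # When this record is imported, loadItemByUUID() updates only 'title';
    # body, createdOn, description, and lastModifiedBy are left untouched,
    # and loadItemByUUID(sharing.NoChange) simply returns sharing.NoChange.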

Over time, there will be additional utility methods added to ``sharing.Schema`` as common usage patterns are identified, to help reduce the amount of boilerplate code that needs to be written.


The Sharing Interface
---------------------

For each import or export operation to be performed, the sharing framework will create instances of the appropriate ``sharing.Schema`` subclasses, passing in a repository view. So in our running example, the sharing framework would invoke ``ContentSchema(rv)`` to get a ``ContentSchema`` instance with an ``itsView`` of ``rv``.

Then, depending on the operation(s) to be performed, the sharing framework will call some of the following methods, which all have reasonable default implementations provided by ``sharing.Schema``:

startExport()
    Called before an export process begins, to allow the ``Schema`` instance to do any pre-export setup operations. The default implementation does nothing, but can be overridden to initialize any data structures that might be needed during the export operation.

exportItem(`item`)
    Called to export an individual item, it should return a sequence or be a generator yielding the relevant records for the supplied `item`. The default implementation automatically looks up the registered export methods and calls them, combining their results for the return value. This method can be overridden if you have a sufficiently complex special case to need it, or if you want to create a different way of registering exporters. Note also that it's okay for this method to return an empty sequence.

    (Note: the sharing framework must not make any assumptions about a relationship between the records returned, and the item passed in, since some of the records may be for *related* items. Also, a schema can choose not to export records for individual items, but instead just track which items are to be exported and then provide all of the records when ``finishExport()`` is called.)

finishExport()
    Called after an export operation is completed, this method should return a sequence or be a generator yielding records. These records will be exported along with any that were yielded by calls to ``exportItem()``. The default implementation of this method just returns an empty list, but can be overridden to return or yield records, and perhaps to tear down any temporary data structures created by ``startExport()`` or ``exportItem()``.

startImport()
    Called before an import operation begins. The default implementation does nothing, but can be overridden to initialize any data structures that might be needed during the import operation.

importRecord(`record`)
    Called for each record to be imported, in an order determined automatically by the declared inter-recordtype dependencies. (That is, this method will not be passed a record until all the records it depends on have been imported first.) The default implementation of this method simply looks up and calls the relevant importer method.

finishImport()
    Called after an import operation is completed. The default implementation does nothing, but can be overridden to do any necessary cleanup or finish-out of the import process.

Notice that for both ``importRecord()`` and ``exportItem()``, there is no requirement that all processing for the given item or record take place immediately. Some complex schema changes (or complex schemas) may need or want to simply keep track of what items are being exported or what records are being imported, and then do the actual importing or exporting in ``finishImport()`` or ``finishExport()``.

Thus, the sharing framework must not assume that it has seen all records until all ``finishExport()`` methods (for each schema being exported) have been called. Similarly, it cannot assume that items in the repository are in their finished state until all of the active schemas' ``finishImport()`` methods have been called.
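
As a sketch of the deferred style described above (assuming the ``ContentSchema`` class and record types from the running example; the subclass name is made up), a schema might simply collect items during ``exportItem()`` and emit all of its records at the end::

    class DeferredContentSchema(ContentSchema):

        def startExport(self):
            self.pending = []          # items seen during this export

        def exportItem(self, item):
            self.pending.append(item)
            return []                  # emit nothing per-item

        def finishExport(self):
            # Produce the records for every collected item, using the
            # default per-item lookup inherited from the base class.
            for item in self.pending:
                for record in ContentSchema.exportItem(self, item):
                    yield record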


Implementation Details and Open Issues
======================================


Processing "Diffs"
------------------

Most of the API and examples above are written in terms that assume a more-or-less "complete" and "additive" transfer of records, rather than being difference-oriented.

It is assumed that ``sharing.NoChange`` will be used in record fields to indicate that the field's value has not changed, and that the sharing framework will be responsible for replacing records appropriately. Record objects will probably support subtraction to produce diffs, e.g. ``diffRecord = newRecord - oldRecord``. It's possible that the sharing API will do this by exporting both old and new versions of the same collection, and then differencing the records that are in common, and perhaps creating some kind of "deletion" record for records found in the old, but not the new.
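
If record subtraction is supported as suggested, the intended behavior might look roughly like this (a hypothetical sketch; the UUIDs and field values are made up)::

    old = itemrecord(uuid, u"Status report", u"Old body", created,
                     u"Weekly report", author_uuid)
    new = itemrecord(uuid, u"Status report", u"New body", created,
                     u"Weekly report", author_uuid)

    diff = new - old
    # Conceptually, 'diff' would keep the identifying field and the changed
    # field, and hold sharing.NoChange everywhere else, e.g.:
    #   itemrecord(uuid, NoChange, u"New body", NoChange, NoChange, NoChange)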

At present, however, the API as designed has no support for deletion as such. For well-defined collections (such as the ``.tags`` attribute in the examples), this could be handled by clearing the collection when the first record is received, at the cost of re-transmitting all members of the collection. The alternative possibility is to never delete items from collections, only add. (Which is what the above examples do; i.e., tags are always added, and items are always created or updated, but nothing is ever deleted.)


Key Management
--------------

The proposed API doesn't have a way to specify what fields of a record are "keys" or are expected to be unique, except indirectly. Inter-record dependencies define some keys by implication, in that the depended-on field must be unique in order for a dependency to have meaning.

However, producing diffs for a record requires that the record know of one or more fields that produce a "primary key" in database terminology, because a difference record must always contain enough information for the receiver to identify what the difference is to be applied to!

At this point, it's not clear to me if we will need some special way to designate a primary key. One obvious way to do it would be to assume that the first field is always the primary key, except that this doesn't work for records like the ``tagging`` example, which effectively have *all* their fields as part of the primary key.


Type Information
----------------

Currently, there is no way to define or look up what types are used in what fields, nor is there any formal definition of what types are acceptable. This is a big gaping hole in the current proposal that must be remedied before we can expect any sort of dependable interoperability (e.g. w/Cosmo). For now, we are punting on this until we get a better idea of what's actually needed.

This gap in the proposal also means that we aren't in a position to e.g. define a bunch of record types to describe other record types. This kind of meta-description is important for being able to define an extensible/discoverable sharing format between Chandler and Cosmo.


Multiple Inheritance
--------------------

There are a few quirks regarding multiple inheritance. First, I think that we're going to have to prohibit a ``sharing.Schema`` class from inheriting from more than one other ``sharing.Schema`` class, in order to avoid possible ambiguities as to what inherited importers or exporters should be invoked when both base classes have different ones defined, and the subclass doesn't override them.

Second, there is a peculiar corner case that can arise when sharing data between two machines, when multiple parcels and multiple inheritance are involved. Suppose that there are two parcels "a" and "b" containing classes "A" and "B" respectively, both of which are subclasses of ``pim.Item``. And then there is a parcel "c", containing class "C", which inherits from both "A" and "B".

Let us further say that machine 1 has all three parcels installed, but machine 2 has only parcels "a" and "b". As long as these two machines are only sharing instances of "A" and "B", everything will be fine, but if machine 1 transmits a "C" instance to machine 2 there will be a problem.

When machine 2 tries to process the records related to ``pim.Item`` or to "A" instances, everything will work correctly. However, the "C" instance will have created both "A" and "B" records, making it impossible for ``loadItemByUUID()`` to find a suitable kind. Morgen and I discussed the possibility of having it simply synthesize one, but this could produce some problems of its own, in that the Chandler UI might not know how to correctly display this peculiar A/B hybrid, without additional information that can only be found in parcel "c" -- which machine 2 does not have.

For the first version, we will probably have to have some kind of kludge to detect this situation and handle it -- but precisely *how* we will handle it is still open to investigation. We may have to create the problem first in order to get a better handle on it.


Schema Registry and Selection
-----------------------------

``sharing.Schema`` classes and ``@sharing.recordtype`` objects will have to be part of a parcel's persistent data, stored in the repository at the same time that the parcel's kinds and annotations and so forth are initialized. The sharing parcel will probably have some kind of persistent object(s) stored in the repository that reference schemas and index them by their supported record types and kinds, so that the sharing framework can look them up.

The exact nature of these data structures is currently undefined. The data structures needed depend on how schemas will be selected by the sharing framework, so it's likely that a first-cut implementation of the API won't actually create any, and will instead rely on the sharing framework to explicitly select what schema(s) to use for a particular share.

The selection strategy is further complicated by the possibility that more than one schema might be offering to produce or consume records of the same record type.

And last, but not least, due to the persistent nature of schema classes and recordtype objects, it's likely that the Chandler application will need to either set aside another parcel to contain the core types' sharing schema, or else define that schema within the sharing parcel itself. (Otherwise, we would be introducing circular parcel dependencies between the core types and the sharing parcel.)