[Chandler-dev] Type Definitions for sharing format API

Phillip J. Eby Fri, 22 Sep 2006 10:53:43 -0700

Last week, Brian Moseley, Ted, Morgen, and I met to work out a basis for defining interoperable data types between Cosmo and Chandler. I'm in the process of incorporating this new information into the API proposal, and should have an updated version of it soon. In the meantime, here's a recap of what we settled on at the meeting, and how I see it being incorporated into the API proposal.


Primitive Types
---------------

We ended up deciding on five "primitive" data types:

* Bytes[length], where the maximum length must be specified and it must be 1024 or less

* Text[length], where the maximum length (in bytes of UTF-8 encoding) must be specified and it must be 1024 or less

* Lob, a blob of arbitrary-length data. Unlike Chandler repository lobs, this type does *NOT* include encoding or mime-type information; these must be specified as separate fields if needed.


* Integer, an unsigned 32-bit integer

* Datetime, a date and time value with a timezone name *and a UTC offset*. There will be a timezone name reserved for "local" time. (The UTC offset ensures that the time's meaning is unambiguous, in the event that two systems have a different definition for the same timezone name, due to e.g. changes in the timezone database.)
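
As an illustration only, the Text size limit and the Datetime pairing of zone name plus offset could be checked along the following lines. The names `MAX_SIZE`, `check_text`, and `EimDatetime` are hypothetical and not part of the proposal, and storing the offset in minutes is an assumption made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SIZE = 1024  # upper bound on the declared length of Bytes and Text


def check_text(value: str, length: int) -> bytes:
    """Encode value as UTF-8 and enforce the declared maximum byte length."""
    if length > MAX_SIZE:
        raise ValueError("declared length %d exceeds %d" % (length, MAX_SIZE))
    encoded = value.encode("utf-8")
    if len(encoded) > length:
        raise ValueError("text needs %d bytes, limit is %d" % (len(encoded), length))
    return encoded


@dataclass(frozen=True)
class EimDatetime:
    """A date/time carrying both a timezone name and an explicit UTC offset."""
    naive: datetime          # wall-clock time in the named zone
    tzname: str              # zone name; the spelling used here is illustrative
    utc_offset_minutes: int  # offset in effect when the value was written (assumed unit)

    def to_utc(self) -> datetime:
        # The offset alone recovers an unambiguous instant, even if the
        # receiver's timezone database disagrees about what tzname means.
        tz = timezone(timedelta(minutes=self.utc_offset_minutes))
        return self.naive.replace(tzinfo=tz).astimezone(timezone.utc)
```

Note that the Text limit counts encoded bytes, not characters, so a string of 1024 non-ASCII characters would not fit in Text[1024].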



Type Aliasing or "Typedefs"
---------------------------

The system will allow for extension of these primitive types via "typedefs". That is, you could define a "UUID" type as having a representation of Bytes[16] or Text[36]. So the metadata describing a schema will include a URI to define the "meaning" of the data that is represented. Borrowing from my previous example, if we have a record type defined thus in Chandler:


@sharing.recordtype("URI for 'itemrecord'")
def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
    ...  # details omitted

We might represent the type information as a set of EIM records like this:

("URI for 'itemrecord'", "itsUUID",        "URI for 'UUID'", "Bytes",   16)
("URI for 'itemrecord'", "title",          "",               "Text",   256)
("URI for 'itemrecord'", "body",           "",               "Lob",      0)
("URI for 'itemrecord'", "createdOn",      "",               "Datetime", 0)
("URI for 'itemrecord'", "description",    "",               "Text",  1024)
("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes",   16)

The various "URI for" bits would be replaced with appropriate URIs. The idea here is that types with no special semantics beyond those of the primitive representation don't need a URI.

This idea of separating a type's *meaning* from its *representation* means that EIM-based applications can trade data without *needing* to understand it, while still being able to provide better support for types that they do understand.
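
The mapping from field declarations to metadata records can be sketched as follows. `FieldType` and `describe` are hypothetical helpers invented for this illustration; only the five-column record layout comes from the example above.

```python
from typing import NamedTuple


class FieldType(NamedTuple):
    primitive: str   # representation: "Bytes", "Text", "Lob", ...
    size: int = 0    # maximum length; 0 where no size applies
    uri: str = ""    # meaning; empty when the primitive says it all


def describe(record_uri, fields):
    """Return one (record URI, field name, type URI, primitive, size) tuple per field."""
    return [
        (record_uri, name, ftype.uri, ftype.primitive, ftype.size)
        for name, ftype in fields
    ]


# A typedef: the representation is Bytes[16], the meaning is carried by the URI.
UUID = FieldType("Bytes", 16, "URI for 'UUID'")

rows = describe("URI for 'itemrecord'", [
    ("itsUUID", UUID),
    ("title", FieldType("Text", 256)),
    ("body", FieldType("Lob")),
])
```

A receiver that does not recognize the UUID URI can still store and forward the field as 16 opaque bytes, since the representation columns are always present.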



API Changes/Additions
---------------------

Here are my current ideas for incorporating this type information into the API.

First, I would move type and dependency information to the default values of the record type declaration, so that to do the above, we might do something like this:


    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(
        itsUUID        = schema.UUID,
        title          = sharing.TextType(256),
        body           = sharing.LobType,
        createdOn      = sharing.DateType,
        description    = sharing.TextType(1024),
        lastModifiedBy = schema.UUID,
    ):
        ...

You'll notice there's a mix of schema.* and sharing.* API calls here; the idea is that sharing would provide type constructors for the primitive types, and there would be standard representations registered for schema types that can be unambiguously defined. For example, schema.UUID can have a representation defined as sharing.BytesType(16, "...some URI..."). There would be a registration system to allow mapping schema types to sharing types, e.g.:


    sharing.typedef(schema.UUID, sharing.BytesType(16, "...some URI..."))

So, from then on, using 'schema.UUID' to define a field type would "do the right thing".
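
One way such a registration system might work is a simple lookup table, sketched below. This is not the proposed implementation: `resolve` and the module-level `_typedefs` dictionary are hypothetical, the `BytesType` class is a bare stand-in, and the standard library's `uuid.UUID` stands in for `schema.UUID`.

```python
import uuid

_typedefs = {}  # native type -> registered sharing type (hypothetical registry)


class BytesType:
    """Bare stand-in for the proposed sharing.BytesType constructor."""

    def __init__(self, size, uri=""):
        self.size = size
        self.uri = uri


def typedef(native_type, sharing_type):
    """Register sharing_type as the standard representation of native_type."""
    _typedefs[native_type] = sharing_type


def resolve(declared):
    """Return the sharing type to use for a declared field type."""
    # Sharing types pass through unchanged; native types are looked up.
    return _typedefs.get(declared, declared)


UUID_TYPE = BytesType(16, "...some URI...")
typedef(uuid.UUID, UUID_TYPE)  # uuid.UUID plays the role of schema.UUID here
```

With this in place, a record type declaration could call `resolve` on each field default, so that fields declared with the native type and fields declared with an explicit sharing type end up described the same way.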

The type constructors (BytesType, DateType, LobType, TextType, and IntType) would all accept arguments to set the type's URI, size, and converters to translate the native type (e.g. UUIDs) to and from the primitive representation (e.g. bytes). So, for example, one might actually do the above type registration as:


    sharing.typedef(
        schema.UUID,
        sharing.BytesType(
            size=16, uri="...some URI...",
            repr=uuid_to_bytes, eval=bytes_to_uuid
        )
    )

Where uuid_to_bytes and bytes_to_uuid are appropriate conversion functions. This then allows the EIM API to serialize and deserialize records using a parcel's preferred datatypes, and helps minimize the amount of coding that someone has to do to represent common data types in their sharing schema.
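
For a sense of how small such converters can be, here is a sketch of the pair using the Python standard library's `uuid` module. That module is an assumption made for the example; Chandler's own UUID type may expose a different interface.

```python
import uuid


def uuid_to_bytes(value: uuid.UUID) -> bytes:
    """Native -> primitive: the 16-byte form of the UUID."""
    return value.bytes


def bytes_to_uuid(data: bytes) -> uuid.UUID:
    """Primitive -> native: rebuild the UUID (ValueError unless 16 bytes)."""
    return uuid.UUID(bytes=data)


original = uuid.UUID("12345678-1234-5678-1234-567812345678")
wire = uuid_to_bytes(original)

# The round trip must be lossless for the typedef to be usable.
assert len(wire) == 16
assert bytes_to_uuid(wire) == original
```

The same pattern would apply to a Text[36] representation, with `str(value)` and `uuid.UUID(text)` as the converter pair.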

In addition to being able to register type aliases like this, there should also be support for specifying types by referring to fields provided by other record types.



Open Issues
-----------

* The record type I've been using as an example above should probably actually define the "lastModifiedBy" field's type as being a reference to the "itsUUID" field: a self-referential dependency. I don't currently have a way to express this.

* The metadata format example doesn't include field-to-field references either for the same or different record types, but it needs to.

* There's still no way to express what field(s) represent a record's primary key. In the examples we've played with so far, this tends to be either a UUID or the entire record is its primary key, but I'm not sure that other combinations can't arise. (Primary key definition is needed in order to implement a "diff" or "delta" mechanism for transmitting incremental updates.)

* A new potential issue is that of type representation changes. If you change the definition of a type or its representation between schema versions, you could alter the schema in an incompatible way. I'm not sure that this is really a *new* issue, just that the type aliasing machinery might make it easier to make this mistake. I need to give this some more thought; suggestions are welcome.
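
To illustrate why the primary key matters for the "diff" mechanism mentioned in the open issues, here is a minimal sketch of a key-based comparison. The `diff` function, the use of dicts as records, and the `key_fields` parameter are all hypothetical; the proposal does not yet define how keys are declared.

```python
def diff(old_records, new_records, key_fields):
    """Compare two record sets, matching records by their primary key.

    Records are dicts; key_fields names the fields that form the key.
    Returns (added, changed, removed) lists of records.
    """
    def key(record):
        return tuple(record[f] for f in key_fields)

    old = {key(r): r for r in old_records}
    new = {key(r): r for r in new_records}

    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = [new[k] for k in new if k in old and new[k] != old[k]]
    return added, changed, removed


before = [{"itsUUID": "a", "title": "Draft"}, {"itsUUID": "b", "title": "Notes"}]
after = [{"itsUUID": "a", "title": "Final"}, {"itsUUID": "c", "title": "Plan"}]

added, changed, removed = diff(before, after, ["itsUUID"])
```

When the entire record is its own primary key, `key_fields` lists every field and nothing is ever reported as changed: a modification shows up as one removal plus one addition.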

In general, actually, any feedback or thoughts on the open issues (or the current state of the API proposal in general) would be useful. Thanks.


_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
