Last week, Brian Moseley, Ted, Morgen, and I met to work out a basis for defining interoperable data types between Cosmo and Chandler. I'm in the process of incorporating this new information into the API proposal, and should have an updated version of it soon. In the meantime, here's a recap of what we settled on at the meeting, and how I see it being incorporated into the API proposal.

Primitive Types
---------------

We ended up deciding on five "primitive" data types:

* Bytes[length], where the maximum length must be specified and it must be 1024 or less

* Text[length], where the maximum length (in bytes of UTF-8 encoding) must be specified and it must be 1024 or less

* Lob, a blob of arbitrary-length data. Unlike Chandler repository lobs, this type does *NOT* include encoding or mime-type information; these must be specified as separate fields if needed.

* Integer, an unsigned 32-bit integer

* Datetime, a date and time value with a timezone name *and a UTC offset*. There will be a timezone name reserved for "local" time. (The UTC offset ensures that the time's meaning is unambiguous, in the event that two systems have a different definition for the same timezone name, due to e.g. changes in the timezone database.)
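
To make the constraints above concrete, here's a rough sketch of how the size and range limits might be checked in Python. The names used (check_bytes, check_text, check_integer, EimDatetime) are made up for illustration and aren't part of any agreed API, and the lower bound of 1 on declared lengths is an assumption. The sketch doesn't address the reserved "local" timezone name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_SIZE = 1024           # upper bound on a declared Bytes/Text length
MAX_UINT32 = 2**32 - 1    # largest unsigned 32-bit value


def _check_declared_length(length: int) -> None:
    # Assumption: a declared length of zero or less is not meaningful.
    if not 0 < length <= MAX_SIZE:
        raise ValueError("declared length must be between 1 and 1024")


def check_bytes(value: bytes, length: int) -> bytes:
    """Validate a value against Bytes[length]."""
    _check_declared_length(length)
    if len(value) > length:
        raise ValueError("value is longer than the declared length")
    return value


def check_text(value: str, length: int) -> str:
    """Validate a value against Text[length]; length counts UTF-8 bytes."""
    _check_declared_length(length)
    if len(value.encode("utf-8")) > length:
        raise ValueError("encoded text is longer than the declared length")
    return value


def check_integer(value: int) -> int:
    """Validate a value against the unsigned 32-bit Integer type."""
    if not 0 <= value <= MAX_UINT32:
        raise ValueError("value does not fit in an unsigned 32-bit integer")
    return value


@dataclass(frozen=True)
class EimDatetime:
    """A wall-clock time plus a timezone name and an explicit UTC offset."""
    wall_clock: datetime      # naive (no tzinfo) local date and time
    tz_name: str              # e.g. "US/Pacific"
    utc_offset: timedelta     # offset from UTC that was in effect

    def to_utc(self) -> datetime:
        # Only the offset is used, so two systems that disagree about the
        # rules behind tz_name still agree on the instant.
        return (self.wall_clock - self.utc_offset).replace(tzinfo=timezone.utc)
```

Note that Text limits are measured after encoding, so a 1024-byte field holds only 512 two-byte characters such as "é".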


Type Aliasing or "Typedefs"
---------------------------

The system will allow for extension of these primitive types via "type defs". That is, you could define a "UUID" type as having a representation of Bytes[16] or Text[36]. So the metadata describing a schema will include a URI to define the "meaning" of the data that is represented. Borrowing from my previous example, if we have a record type defined thus in Chandler:

    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
        ...  # details omitted

We might represent the type information as a set of EIM records like this:

    ("URI for 'itemrecord'", "itsUUID",        "URI for 'UUID'", "Bytes",   16)
    ("URI for 'itemrecord'", "title",          "",               "Text",   256)
    ("URI for 'itemrecord'", "body",           "",               "Lob",      0)
    ("URI for 'itemrecord'", "createdOn",      "",               "Datetime", 0)
    ("URI for 'itemrecord'", "description",    "",               "Text",  1024)
    ("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes",   16)

Here, the various "URI for" placeholders would be replaced with appropriate URIs. The idea is that types with no special semantics beyond those of the primitive representation don't need a URI.

This idea of separating a type's *meaning* from its *representation* means that EIM-based applications can trade data without *needing* to understand it, while still being able to provide better support for types that they do understand.
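
As a rough illustration of that separation, a receiving application might keep a table of the type URIs it understands and fall back to the raw primitive value for everything else. This is only a sketch: the URIs and the decode_field helper are hypothetical, since the actual URIs haven't been chosen.

```python
import uuid

# Hypothetical URI; the real ones are not defined in this message.
UUID_URI = "urn:example:uuid"

# Type URIs this application understands, mapped to decoders that turn
# the primitive representation into a richer native value.
KNOWN_TYPES = {
    UUID_URI: lambda raw: uuid.UUID(bytes=raw),
}


def decode_field(type_uri, raw):
    """Decode a field if its meaning is known, else pass it through as-is."""
    decoder = KNOWN_TYPES.get(type_uri)
    if decoder is None:
        # Unknown or empty URI: the primitive value can still be stored
        # and forwarded without being understood.
        return raw
    return decoder(raw)
```

An application that doesn't recognize a URI still round-trips the bytes intact; one that does can offer the native type.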


API Changes/Additions
---------------------

Here are my current ideas for incorporating this type information into the API.

First, I would move type and dependency information to the default values of the record type declaration, so that the record type above might be declared something like this:

    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(
        itsUUID        = schema.UUID,
        title          = sharing.TextType(256),
        body           = sharing.LobType,
        createdOn      = sharing.DateType,
        description    = sharing.TextType(1024),
        lastModifiedBy = schema.UUID,
    ):
        ...

You'll notice there's a mix of schema.* and sharing.* API calls here; the idea is that sharing would provide type constructors for the primitive types, and there would be standard representations registered for schema types that can be unambiguously defined. For example, schema.UUID can have a representation defined as sharing.BytesType(16, "...some URI..."). There would be a registration system to allow mapping schema types to sharing types, e.g.:

    sharing.typedef(schema.UUID, sharing.BytesType(16, "...some URI..."))

So, from then on, using 'schema.UUID' to define a field type would "do the right thing".
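
Here's one way the pieces could fit together, as a sketch only. FieldType, typedef, recordtype, and describe are stand-ins for whatever the sharing module ends up providing, uuid.UUID stands in for schema.UUID, and the "urn:example:..." URIs are placeholders. The decorator reads each parameter's default value, resolves it through the typedef registry, and can then emit metadata tuples in the shape shown earlier.

```python
import inspect
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldType:
    """Stand-in for a primitive type constructor (hypothetical)."""
    primitive: str   # "Bytes", "Text", "Lob", "Integer" or "Datetime"
    size: int = 0
    uri: str = ""


_typedefs = {}  # native type -> FieldType


def typedef(native_type, field_type):
    """Register the sharing representation of a native (schema) type."""
    _typedefs[native_type] = field_type


def resolve(declared):
    """Return the FieldType for a declared default, via the registry if needed."""
    if isinstance(declared, FieldType):
        return declared
    return _typedefs[declared]


def recordtype(uri):
    """Decorator that collects field types from parameter defaults."""
    def decorate(func):
        params = inspect.signature(func).parameters
        func.uri = uri
        # Resolution happens at decoration time, so typedefs must already
        # be registered when the record type is declared.
        func.fields = [(name, resolve(p.default)) for name, p in params.items()]
        return func
    return decorate


def describe(rtype):
    """Emit (record URI, field name, type URI, primitive, size) tuples."""
    return [(rtype.uri, name, ft.uri, ft.primitive, ft.size)
            for name, ft in rtype.fields]


typedef(uuid.UUID, FieldType("Bytes", 16, "urn:example:uuid"))


@recordtype("urn:example:itemrecord")
def itemrecord(
    itsUUID=uuid.UUID,
    title=FieldType("Text", 256),
    body=FieldType("Lob"),
):
    ...
```

Calling describe(itemrecord) then yields one tuple per field, with an empty type URI for the fields that carry no meaning beyond their primitive representation.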

The type constructors (BytesType, DateType, LobType, TextType, and IntType) would all accept arguments to set the type's URI, size, and converters to translate the native type (e.g. UUIDs) to and from the primitive representation (e.g. bytes). So, for example, one might actually do the above type registration as:

    sharing.typedef(
        schema.UUID,
        sharing.BytesType(
            size=16, uri="...some URI...",
            repr=uuid_to_bytes, eval=bytes_to_uuid
        )
    )

Where uuid_to_bytes and bytes_to_uuid are appropriate conversion functions. This then allows the EIM API to serialize and deserialize records using a parcel's preferred datatypes, and helps minimize the amount of coding that someone has to do to represent common data types in their sharing schema.
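
For UUIDs specifically, the two converters could be as simple as the following sketch, which uses Python's standard uuid module. The BytesType class here is a minimal stand-in written for this example, not the real constructor, and its dump/load method names and the placeholder URI are assumptions.

```python
import uuid


def uuid_to_bytes(value: uuid.UUID) -> bytes:
    """Native UUID -> 16-byte primitive representation."""
    return value.bytes


def bytes_to_uuid(raw: bytes) -> uuid.UUID:
    """16-byte primitive representation -> native UUID."""
    return uuid.UUID(bytes=raw)


class BytesType:
    """Minimal stand-in for the proposed constructor (hypothetical)."""

    # The keyword names repr/eval follow the proposal; they shadow the
    # builtins only inside __init__.
    def __init__(self, size, uri="", repr=None, eval=None):
        if not 0 < size <= 1024:
            raise ValueError("size must be between 1 and 1024")
        self.size = size
        self.uri = uri
        self.to_primitive = repr or (lambda value: value)
        self.from_primitive = eval or (lambda raw: raw)

    def dump(self, value) -> bytes:
        """Convert a native value to bytes and enforce the size limit."""
        raw = self.to_primitive(value)
        if len(raw) > self.size:
            raise ValueError("value is longer than the declared size")
        return raw

    def load(self, raw: bytes):
        """Convert received bytes back into the native value."""
        return self.from_primitive(raw)


uuid_type = BytesType(
    size=16, uri="urn:example:uuid",
    repr=uuid_to_bytes, eval=bytes_to_uuid,
)
original = uuid.UUID("12345678-1234-5678-1234-567812345678")
wire = uuid_type.dump(original)      # what would be serialized
restored = uuid_type.load(wire)      # what the receiver would see
```

The round trip leaves the parcel working with uuid.UUID objects on both ends while only 16 raw bytes cross the wire.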

In addition to being able to register type aliases like this, there should also be support for specifying types by referring to fields provided by other record types.


Open Issues
-----------

* The record type I've been using as an example above should probably actually define the "lastModifiedBy" field's type as being a reference to the "itsUUID" field: a self-referential dependency. I don't currently have a way to express this.

* The metadata format example doesn't include field-to-field references, for either the same or different record types, but it needs to.

* There's still no way to express what field(s) represent a record's primary key. In the examples we've played with so far, this tends to be either a UUID or the entire record is its primary key, but I'm not sure that other combinations can't arise. (Primary key definition is needed in order to implement a "diff" or "delta" mechanism for transmitting incremental updates.)

* A new potential issue is that of type representation changes. If you change the definition of a type or its representation between schema versions, you could alter the schema in an incompatible way. I'm not sure that this is really a *new* issue, just that the type aliasing machinery might make it easier to make this mistake. I need to give this some more thought; suggestions are welcome.
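
To illustrate why the primary key issue above matters for a diff mechanism, here's a rough sketch of a keyed diff. The diff function and its key argument are hypothetical; records are plain tuples and the key is whatever function extracts the primary key from a record.

```python
def diff(old, new, key):
    """Compare two record sets, matching records by their primary key.

    Returns (added, removed, changed), where changed holds the new
    versions of records whose key exists on both sides.
    """
    old_by_key = {key(r): r for r in old}
    new_by_key = {key(r): r for r in new}
    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    removed = [r for k, r in old_by_key.items() if k not in new_by_key]
    changed = [r for k, r in new_by_key.items()
               if k in old_by_key and old_by_key[k] != r]
    return added, removed, changed


old = [("a", "Title 1"), ("b", "Title 2")]
new = [("a", "Title 1 edited"), ("c", "Title 3")]

# Keyed on the first field (e.g. a UUID): an edit shows up as a change.
by_uuid = diff(old, new, key=lambda r: r[0])

# Keyed on the entire record: the same edit becomes a remove plus an add.
by_record = diff(old, new, key=lambda r: r)
```

Without knowing which fields form the key, there's no way to tell an edited record from a deleted one and a new one.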

In general, actually, any feedback or thoughts on the open issues (or the current state of the API proposal in general) would be useful. Thanks.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev
