Last week, Brian Moseley, Ted, Morgen, and I met to work out a basis for
defining interoperable data types between Cosmo and Chandler. I'm in the
process of incorporating this new information into the API proposal, and
should have an updated version of it soon. In the meantime, here's a recap
of what we settled on at the meeting, and how I see it being incorporated
into the API proposal.
Primitive Types
---------------
We ended up deciding on five "primitive" data types:
* Bytes[length], where the maximum length must be specified and it must be
1024 or less
* Text[length], where the maximum length (in bytes of UTF-8 encoding) must
be specified and it must be 1024 or less
* Lob, a blob of arbitrary-length data. Unlike Chandler repository lobs,
this type does *NOT* include encoding or mime-type information; these must
be specified as separate fields if needed.
* Integer, an unsigned 32-bit integer
* Datetime, a date and time value with a timezone name *and a UTC
offset*. There will be a timezone name reserved for "local" time. (The
UTC offset ensures that the time's meaning is unambiguous, in the event
that two systems have a different definition for the same timezone name,
due to e.g. changes in the timezone database.)
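As a rough illustration of the constraints listed above, here is a minimal sketch in modern Python. The helper names (check_bytes, check_text, check_integer, EimDatetime) are hypothetical and not part of the proposal; they only show how the size limits and the timezone-name-plus-offset pairing might be enforced.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LENGTH = 1024        # upper bound for Bytes[length] and Text[length]
MAX_UINT32 = 2**32 - 1   # largest unsigned 32-bit integer


def check_bytes(value: bytes, length: int) -> bytes:
    """Validate a value against a Bytes[length] declaration."""
    if length > MAX_LENGTH:
        raise ValueError("declared length must be 1024 or less")
    if len(value) > length:
        raise ValueError("value exceeds declared maximum length")
    return value


def check_text(value: str, length: int) -> str:
    """Validate a value against Text[length]; the limit counts UTF-8 bytes."""
    if length > MAX_LENGTH:
        raise ValueError("declared length must be 1024 or less")
    if len(value.encode("utf-8")) > length:
        raise ValueError("value exceeds declared maximum length")
    return value


def check_integer(value: int) -> int:
    """Validate an unsigned 32-bit Integer."""
    if not 0 <= value <= MAX_UINT32:
        raise ValueError("value out of range for unsigned 32-bit integer")
    return value


@dataclass(frozen=True)
class EimDatetime:
    """Hypothetical holder for a Datetime: name and offset travel together."""
    naive: datetime        # wall-clock time, no tzinfo attached
    tzname: str            # timezone name, or the reserved name for "local"
    utc_offset: timedelta  # offset from UTC in effect for this value

    def to_utc(self) -> datetime:
        # The stored offset, not the timezone name, fixes the instant.
        return (self.naive - self.utc_offset).replace(tzinfo=timezone.utc)
```

Because to_utc() relies only on the stored offset, two systems that disagree about what a timezone name means would still agree on the instant being described.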
Type Aliasing or "Typedefs"
---------------------------
The system will allow for extension of these primitive types via "type
defs". That is, you could define a "UUID" type as having a representation
of Bytes[16] or Text[36]. So the metadata describing a schema will include
a URI to define the "meaning" of the data that is represented. Borrowing
from my previous example, if we have a record type defined thus in Chandler:
    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(itsUUID, title, body, createdOn, description, lastModifiedBy):
        # details omitted
We might represent the type information as a set of EIM records like this:
    ("URI for 'itemrecord'", "itsUUID", "URI for 'UUID'", "Bytes", 16)
    ("URI for 'itemrecord'", "title", "", "Text", 256)
    ("URI for 'itemrecord'", "body", "", "Lob", 0)
    ("URI for 'itemrecord'", "createdOn", "", "Datetime", 0)
    ("URI for 'itemrecord'", "description", "", "Text", 1024)
    ("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes", 16)
The various "URI for" bits would be replaced with appropriate URIs. The
idea here is that types with no special semantics beyond those of the
primitive representation don't need a URI.
This idea of separating a type's *meaning* from its *representation* means
that EIM-based applications can trade data without *needing* to understand
it, while still being able to provide better support for the types they do
understand.
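To make the metadata layout in this section concrete, here is a small sketch of how such records might be produced from field declarations. FieldType and describe are hypothetical names used only for illustration; the tuple layout (record URI, field name, type URI, primitive, size) is the one from the example records.

```python
from typing import NamedTuple


class FieldType(NamedTuple):
    """Hypothetical description of one field's type."""
    primitive: str   # "Bytes", "Text", "Lob", "Integer", or "Datetime"
    size: int = 0    # maximum length; 0 where the primitive has none
    uri: str = ""    # empty when the type means nothing beyond its primitive


def describe(record_uri, fields):
    """Return one metadata tuple per (name, FieldType) pair."""
    return [
        (record_uri, name, ftype.uri, ftype.primitive, ftype.size)
        for name, ftype in fields
    ]


# A typedef: same representation as Bytes[16], plus a URI giving its meaning.
UUID = FieldType("Bytes", 16, "URI for 'UUID'")

rows = describe("URI for 'itemrecord'", [
    ("itsUUID", UUID),
    ("title", FieldType("Text", 256)),
    ("body", FieldType("Lob")),
])
```

A receiver that doesn't recognize the UUID URI can still store and forward the field as 16 opaque bytes, which is the point of keeping the representation separate from the meaning.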
API Changes/Additions
---------------------
Here are my current ideas for incorporating this type information into the API.
First, I would move type and dependency information to the default values
of the record type declaration, so that to do the above, we might do
something like this:
    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(
        itsUUID = schema.UUID,
        title = sharing.TextType(256),
        body = sharing.LobType,
        createdOn = sharing.DateType,
        description = sharing.TextType(1024),
        lastModifiedBy = schema.UUID,
    ):
        ...
You'll notice there's a mix of schema.* and sharing.* API calls here; the
idea is that sharing would provide type constructors for the primitive
types, and there would be standard representations registered for schema
types that can be unambiguously defined. For example, schema.UUID can have
a representation defined as sharing.BytesType(16, "...some URI..."). There
would be a registration system to allow mapping schema types to sharing
types, e.g.:
    sharing.typedef(schema.UUID, sharing.BytesType(16, "...some URI..."))
So, from then on, using 'schema.UUID' to define a field type would "do the
right thing".
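One way the registration system could work is a simple lookup table from native types to sharing types. The sketch below is only an assumption about the mechanics: BytesType here is a stand-in class, resolve is a hypothetical helper, and Python's standard uuid.UUID stands in for schema.UUID.

```python
import uuid


class BytesType:
    """Stand-in for the proposed sharing.BytesType constructor."""

    def __init__(self, size, uri=""):
        self.size = size
        self.uri = uri


_typedefs = {}  # native type -> registered sharing type


def typedef(native_type, sharing_type):
    """Register sharing_type as the representation of native_type."""
    _typedefs[native_type] = sharing_type


def resolve(field_type):
    """Return the sharing type to use for a field declaration."""
    if isinstance(field_type, BytesType):
        return field_type  # already a sharing type; use it directly
    try:
        return _typedefs[field_type]
    except KeyError:
        raise TypeError("no sharing type registered for %r" % (field_type,))


# uuid.UUID is used here in place of schema.UUID (an assumption).
typedef(uuid.UUID, BytesType(16, "...some URI..."))
```

With that registration in place, resolve(uuid.UUID) returns the 16-byte type, so a record type declaration can name the native type and leave the representation to the registry.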
The type constructors (BytesType, DateType, LobType, TextType, and IntType)
would all accept arguments to set the type's URI, size, and converters to
translate the native type (e.g. UUIDs) to and from the primitive
representation (e.g. bytes). So, for example, one might actually do the
above type registration as:
    sharing.typedef(
        schema.UUID,
        sharing.BytesType(
            size=16, uri="...some URI...",
            repr=uuid_to_bytes, eval=bytes_to_uuid
        )
    )
Where uuid_to_bytes and bytes_to_uuid are appropriate conversion
functions. This then allows the EIM API to serialize and deserialize
records using a parcel's preferred datatypes, and helps minimize the amount
of coding that someone has to do to represent common data types in their
sharing schema.
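For illustration, the two conversion functions might look like the following when the native type is Python's standard uuid.UUID (an assumption; Chandler's own UUID class may differ). The BytesType class and its serialize/deserialize methods are hypothetical stand-ins showing where the converters would be called.

```python
import uuid


def uuid_to_bytes(value):
    """Native UUID -> 16-byte primitive representation."""
    return value.bytes


def bytes_to_uuid(raw):
    """16-byte primitive representation -> native UUID."""
    return uuid.UUID(bytes=raw)


class BytesType:
    """Stand-in for the proposed constructor, with optional converters."""

    def __init__(self, size, uri="", repr=None, eval=None):
        self.size = size
        self.uri = uri
        self.to_primitive = repr   # native value -> bytes
        self.from_primitive = eval  # bytes -> native value

    def serialize(self, value):
        raw = self.to_primitive(value) if self.to_primitive else value
        if len(raw) > self.size:
            raise ValueError("value exceeds declared maximum length")
        return raw

    def deserialize(self, raw):
        return self.from_primitive(raw) if self.from_primitive else raw


uuid_type = BytesType(
    size=16, uri="...some URI...",
    repr=uuid_to_bytes, eval=bytes_to_uuid,
)

original = uuid.UUID("12345678-1234-5678-1234-567812345678")
wire = uuid_type.serialize(original)       # 16 raw bytes
restored = uuid_type.deserialize(wire)     # back to a uuid.UUID
```

The round trip leaves the parcel working with UUID objects on both ends, while only plain bytes cross the wire.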
In addition to being able to register type aliases like this, there should
also be support for specifying types by referring to fields provided by
other record types.
Open Issues
-----------
* The record type I've been using as an example above should probably
actually define the "lastModifiedBy" field's type as being a reference to
the "itsUUID" field: a self-referential dependency. I don't currently have
a way to express this.
* The metadata format example doesn't include field-to-field references,
either within the same record type or across different record types, but
it needs to.
* There's still no way to express what field(s) represent a record's
primary key. In the examples we've played with so far, this tends to be
either a UUID or the entire record is its primary key, but I'm not sure
that other combinations can't arise. (Primary key definition is needed in
order to implement a "diff" or "delta" mechanism for transmitting
incremental updates.)
* A new potential issue is that of type representation changes. If you
change the definition of a type or its representation between schema
versions, you could alter the schema in an incompatible way. I'm not sure
that this is really a *new* issue, just that the type aliasing machinery
might make it easier to make this mistake. I need to give this some more
thought; suggestions are welcome.
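On the primary key issue above, here is a sketch of why a key definition matters for a "diff" mechanism. The diff function and its key argument are hypothetical; records are modeled as plain tuples, and the key function is whatever extracts the primary key field(s) from one.

```python
def diff(old_records, new_records, key):
    """Compare two record collections by primary key.

    Returns (added, removed, changed) lists of records.
    """
    old = {key(r): r for r in old_records}
    new = {key(r): r for r in new_records}
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = [new[k] for k in new if k in old and new[k] != old[k]]
    return added, removed, changed


old = [("u1", "Lunch"), ("u2", "Call Ted")]
new = [("u1", "Lunch with Morgen"), ("u3", "Review")]

# Case 1: the first field (a UUID, say) is the primary key.
added, removed, changed = diff(old, new, key=lambda r: r[0])

# Case 2: the entire record is its own primary key.
added_all, removed_all, changed_all = diff(old, new, key=lambda r: r)
```

In the first case the edit to "u1" is reported as a change; in the second, where the whole record is the key, the same edit can only show up as one removal plus one addition. Without knowing the key, a receiver can't tell which interpretation is intended.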
In general, actually, any feedback or thoughts on the open issues (or the
current state of the API proposal in general) would be useful. Thanks.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev