Hi, Phillip
This all makes sense ... I imagine the API would provide typedefs for
common non-primitive types, such as signed integers or floating-point numbers.
Out of curiosity: why the restrictions on length of Bytes and Text
(and the size of Integer)?
--Grant
On 22 Sep, 2006, at 10:52, Phillip J. Eby wrote:
Last week, Brian Moseley, Ted, Morgen, and I met to work out a
basis for defining interoperable data types between Cosmo and
Chandler. I'm in the process of incorporating this new information
into the API proposal, and should have an updated version of it
soon. In the meantime, here's a recap of what we settled on at the
meeting, and how I see it being incorporated into the API proposal.
Primitive Types
---------------
We ended up deciding on five "primitive" data types:
* Bytes[length], where the maximum length must be specified and it
must be 1024 or less
* Text[length], where the maximum length (in bytes of UTF-8
encoding) must be specified and it must be 1024 or less
* Lob, a blob of arbitrary-length data. Unlike Chandler repository
lobs, this type does *NOT* include encoding or mime-type
information; these must be specified as separate fields if needed.
* Integer, an unsigned 32-bit integer
* Datetime, a date and time value with a timezone name *and a UTC
offset*. There will be a timezone name reserved for "local" time.
(The UTC offset ensures that the time's meaning is unambiguous, in
the event that two systems have a different definition for the same
timezone name, due to e.g. changes in the timezone database.)
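To make the constraints above concrete, here is a minimal validation sketch, assuming a plain-Python representation of each primitive. All function and class names are hypothetical, not part of the proposed API. The requirement that a declared length be at least 1 is also an assumption; the list above only fixes the upper bound. Note that Text lengths are checked against the UTF-8 encoding, not the character count, and that a Datetime converts to UTC using only its stored offset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LENGTH = 1024        # upper bound on declared Bytes/Text lengths
MAX_INTEGER = 2**32 - 1  # Integer is an unsigned 32-bit value


def _check_declared(length):
    # Assumption: declared lengths must be positive as well as <= 1024.
    if not 1 <= length <= MAX_LENGTH:
        raise ValueError("declared length must be between 1 and %d" % MAX_LENGTH)


def check_bytes(value: bytes, length: int) -> bytes:
    """Validate a value against Bytes[length]."""
    _check_declared(length)
    if len(value) > length:
        raise ValueError("value longer than Bytes[%d]" % length)
    return value


def check_text(value: str, length: int) -> str:
    """Validate against Text[length]; length counts UTF-8 bytes, not characters."""
    _check_declared(length)
    if len(value.encode("utf-8")) > length:
        raise ValueError("UTF-8 encoding longer than Text[%d]" % length)
    return value


def check_integer(value: int) -> int:
    """Validate an unsigned 32-bit Integer."""
    if not 0 <= value <= MAX_INTEGER:
        raise ValueError("Integer out of unsigned 32-bit range")
    return value


@dataclass(frozen=True)
class DatetimeValue:
    """A Datetime: wall-clock time plus a timezone name *and* a UTC offset."""
    wall_time: datetime    # naive local wall-clock time
    tzname: str            # timezone name, e.g. "America/Los_Angeles"
    utc_offset: timedelta  # offset from UTC in effect at wall_time

    def to_utc(self) -> datetime:
        # Uses only the stored offset, so the result does not depend on
        # the receiving system's timezone database.
        return (self.wall_time - self.utc_offset).replace(tzinfo=timezone.utc)
```

For example, "é" * 512 fits in Text[1024] (512 characters, 1024 UTF-8 bytes), but "é" * 513 does not.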
Type Aliasing or "Typedefs"
---------------------------
The system will allow for extension of these primitive types via
"type defs". That is, you could define a "UUID" type as having a
representation of Bytes[16] or Text[36]. So the metadata
describing a schema will include a URI to define the "meaning" of
the data that is represented. Borrowing from my previous example,
if we have a record type defined thus in Chandler:
    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(itsUUID, title, body, createdOn, description,
                   lastModifiedBy):
        ...  # details omitted
We might represent the type information as a set of EIM records
like this:
    ("URI for 'itemrecord'", "itsUUID",        "URI for 'UUID'", "Bytes",    16)
    ("URI for 'itemrecord'", "title",          "",               "Text",     256)
    ("URI for 'itemrecord'", "body",           "",               "Lob",      0)
    ("URI for 'itemrecord'", "createdOn",      "",               "Datetime", 0)
    ("URI for 'itemrecord'", "description",    "",               "Text",     1024)
    ("URI for 'itemrecord'", "lastModifiedBy", "URI for 'UUID'", "Bytes",    16)
In practice, the various "URI for" placeholders would be replaced with
real URIs. The idea here is that types with no special semantics
beyond those of their primitive representation don't need a URI.
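The metadata records above can be generated mechanically from a per-field description. Below is a minimal sketch, assuming a hypothetical FieldType class (not part of the proposal) that holds the primitive name, size, and an optional meaning URI. Fields with no special semantics simply get an empty URI, and the size is 0 where it does not apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldType:
    """Representation of one field: primitive name, size and optional meaning URI."""
    primitive: str   # "Bytes", "Text", "Lob", "Integer" or "Datetime"
    size: int = 0    # maximum length for Bytes/Text; 0 where it does not apply
    uri: str = ""    # semantic URI; empty when the primitive says it all


def metadata_records(record_uri, fields):
    """Return one (record URI, field, type URI, primitive, size) tuple per field."""
    return [
        (record_uri, name, ftype.uri, ftype.primitive, ftype.size)
        for name, ftype in fields.items()   # dicts keep declaration order
    ]


UUID = FieldType("Bytes", 16, "URI for 'UUID'")
records = metadata_records("URI for 'itemrecord'", {
    "itsUUID": UUID,
    "title": FieldType("Text", 256),
    "body": FieldType("Lob"),
    "createdOn": FieldType("Datetime"),
    "description": FieldType("Text", 1024),
    "lastModifiedBy": UUID,
})
```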
This idea of separating a type's *meaning* from its
*representation* means that EIM-based applications can trade data
without *needing* to understand it, while still being able to provide
better support for the types they do understand.
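One way a receiving application could act on this split is sketched below, assuming it keeps a hypothetical table from the URIs it understands to converters. Values whose URI it recognizes are converted to native objects. Everything else is kept in its primitive form, so it can still be stored and re-sent unchanged.

```python
import uuid

# Hypothetical table of the semantic URIs this particular application
# understands, each mapped to a converter from the primitive value.
KNOWN_TYPES = {
    "URI for 'UUID'": lambda raw: uuid.UUID(bytes=raw),
}


def decode_field(type_uri, raw_value):
    """Convert a field value whose meaning is understood; pass others through."""
    convert = KNOWN_TYPES.get(type_uri)
    return convert(raw_value) if convert is not None else raw_value
```

For instance, decode_field("URI for 'UUID'", bytes(16)) yields the all-zero UUID, while a field tagged with an unfamiliar URI comes back as the original bytes.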
API Changes/Additions
---------------------
Here are my current ideas for incorporating this type information
into the API.
First, I would move type and dependency information to the default
values of the record type declaration, so that to do the above, we
might do something like this:
    @sharing.recordtype("URI for 'itemrecord'")
    def itemrecord(
        itsUUID = schema.UUID,
        title = sharing.TextType(256),
        body = sharing.LobType,
        createdOn = sharing.DateType,
        description = sharing.TextType(1024),
        lastModifiedBy = schema.UUID,
    ):
        ...
You'll notice there's a mix of schema.* and sharing.* API calls
here; the idea is that sharing would provide type constructors for
the primitive types, and there would be standard representations
registered for schema types that can be unambiguously defined. For
example, schema.UUID can have a representation defined as
sharing.BytesType(16, "...some URI..."). There would be a
registration system to allow mapping schema types to sharing types,
e.g.:
    sharing.typedef(schema.UUID, sharing.BytesType(16, "...some URI..."))
So, from then on, using 'schema.UUID' to define a field type would
"do the right thing".
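Below is a minimal sketch of how the registry and the default-value declarations might fit together. Every name in it is a hypothetical stand-in: uuid.UUID plays the role of schema.UUID, and recordtype, typedef, BytesType and TextType are simplified versions of the proposed sharing calls. The decorator reads each field's type from its default value with inspect.signature and resolves it through the typedef registry.

```python
import inspect
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class BytesType:
    """Hypothetical stand-in for sharing.BytesType (size plus semantic URI)."""
    size: int
    uri: str = ""


@dataclass(frozen=True)
class TextType:
    """Hypothetical stand-in for sharing.TextType."""
    size: int
    uri: str = ""


_typedefs = {}  # native schema type -> sharing type


def typedef(native, sharing_type):
    """Register the sharing type to use wherever `native` is given as a field type."""
    _typedefs[native] = sharing_type


def resolve(field_type):
    """Return the registered sharing type, or `field_type` itself if none is registered."""
    return _typedefs.get(field_type, field_type)


def recordtype(uri):
    """Hypothetical stand-in for sharing.recordtype: read field types from defaults."""
    def decorate(func):
        params = inspect.signature(func).parameters
        untyped = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
        if untyped:
            raise TypeError("fields without a type: " + ", ".join(untyped))
        func.uri = uri
        func.fields = {n: resolve(p.default) for n, p in params.items()}
        return func
    return decorate


# uuid.UUID stands in for schema.UUID in this sketch.
typedef(uuid.UUID, BytesType(16, "URI for 'UUID'"))


@recordtype("URI for 'itemrecord'")
def itemrecord(itsUUID=uuid.UUID, title=TextType(256), lastModifiedBy=uuid.UUID):
    ...
```

In this sketch, types are resolved when the decorator runs, so typedef() registrations must happen before the record types that use them are defined. A real implementation might defer resolution instead.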
The type constructors (BytesType, DateType, LobType, TextType, and
IntType) would all accept arguments to set the type's URI, size,
and converters to translate the native type (e.g. UUIDs) to and
from the primitive representation (e.g. bytes). So, for example,
one might actually do the above type registration as:
    sharing.typedef(
        schema.UUID,
        sharing.BytesType(
            size=16, uri="...some URI...",
            repr=uuid_to_bytes, eval=bytes_to_uuid
        )
    )
Where uuid_to_bytes and bytes_to_uuid are appropriate conversion
functions. This then allows the EIM API to serialize and
deserialize records using a parcel's preferred datatypes, and helps
minimize the amount of coding that someone has to do to represent
common data types in their sharing schema.
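Below is a minimal sketch of what uuid_to_bytes and bytes_to_uuid could look like using Python's standard uuid module, along with a hypothetical BytesType that applies them when serializing and deserializing. The serialize/deserialize method names and the size check are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Callable


def uuid_to_bytes(value: uuid.UUID) -> bytes:
    """Native -> primitive: the UUID's 16-byte big-endian form."""
    return value.bytes


def bytes_to_uuid(data: bytes) -> uuid.UUID:
    """Primitive -> native: rebuild the UUID from its 16 bytes."""
    return uuid.UUID(bytes=data)


@dataclass(frozen=True)
class BytesType:
    """Hypothetical stand-in for sharing.BytesType with repr/eval converters."""
    size: int
    uri: str = ""
    repr: Callable[[Any], bytes] = bytes   # native value -> bytes
    eval: Callable[[bytes], Any] = bytes   # bytes -> native value

    def serialize(self, value: Any) -> bytes:
        data = self.repr(value)
        if len(data) > self.size:
            raise ValueError("representation exceeds Bytes[%d]" % self.size)
        return data

    def deserialize(self, data: bytes) -> Any:
        return self.eval(data)
```

A round trip then looks like this: BytesType(size=16, repr=uuid_to_bytes, eval=bytes_to_uuid).serialize(u) produces 16 bytes, and deserialize() turns them back into an equal UUID.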
In addition to being able to register type aliases like this, there
should also be support for specifying types by referring to fields
provided by other record types.
Open Issues
-----------
* The record type I've been using as an example above should
probably actually define the "lastModifiedBy" field's type as being
a reference to the "itsUUID" field: a self-referential dependency.
I don't currently have a way to express this.
* The metadata format example doesn't include field-to-field
references either for the same or different record types, but it
needs to.
* There's still no way to express what field(s) represent a
record's primary key. In the examples we've played with so far,
this tends to be either a UUID or the entire record is its primary
key, but I'm not sure that other combinations can't arise.
(Primary key definition is needed in order to implement a "diff" or
"delta" mechanism for transmitting incremental updates.)
* A new potential issue is that of type representation changes. If
you change the definition of a type or its representation between
schema versions, you could alter the schema in an incompatible
way. I'm not sure that this is really a *new* issue, just that the
type aliasing machinery might make it easier to make this mistake.
I need to give this some more thought; suggestions are welcome.
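To show why the primary-key question matters for a "diff" or "delta" mechanism, here is a minimal sketch of a delta computation that is parameterized by a key function. The diff function and the sample records are hypothetical. If the key is the UUID, an edited record appears as a single change. If the whole record is the key, the same edit shows up as a removal plus an addition.

```python
def diff(old, new, key):
    """Split the change from `old` to `new` into added, changed and removed records.

    `key` extracts a record's primary key; records with equal keys are
    treated as versions of the same record.
    """
    old_by_key = {key(r): r for r in old}
    new_by_key = {key(r): r for r in new}
    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    changed = [r for k, r in new_by_key.items()
               if k in old_by_key and old_by_key[k] != r]
    removed = [r for k, r in old_by_key.items() if k not in new_by_key]
    return added, changed, removed


old = [("u1", "Lunch"), ("u2", "Dinner")]
new = [("u1", "Brunch"), ("u3", "Tea")]
```

With key=lambda r: r[0], renaming u1 is reported as one changed record. With key=lambda r: r, both u1 versions appear, one as removed and one as added.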
In general, actually, any feedback or thoughts on the open issues
(or the current state of the API proposal in general) would be
useful. Thanks.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "chandler-dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/chandler-dev