I've probably written as much parcel XML as anyone, and I don't
particularly like it. However, the vast majority of the parcel XML I've
written creates instances, not schema, so I'm much more concerned about
how you intend to improve the creation of instances that populate the
repository during parcel loading, when Chandler isn't running. I almost
never create new items programmatically after the repository is
populated; instead I just make copies of existing items in the
repository, so I also don't care too much about the syntax of Item
construction when Chandler is running.
John
Phillip J. Eby wrote:
-------------------------------------
Defining Chandler Schemas with Python
-------------------------------------
Introduction
============
As many of you may know, I've for some time now been promoting the
idea of replacing parcel XML with Python code for defining item
schemas, and I created a proof-of-concept for this in the "Spike"
project, found under 'internals' in the Chandler CVS.
Since the PyCon sprints, it's my understanding that there's now a
broad and actionable consensus at OSAF that it is indeed desirable to
move to using Python syntax in place of XML for parcels' schema
definition. So, after working with Andi and Grant to get the
necessary infrastructure in place within Chandler, I'd like to present
my proposal for what the Python schema definitions will look like, how
migration might take place, and what new possibilities for Chandler
development these changes will enable.
If you haven't had a chance to look at Spike yet, you may find it
helpful to read at least the "Introduction" section of this document:
http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/schema.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
which presents a simple Python syntax for defining schemas. The
actual syntax used in Chandler will be different, but the above
document gives a good introduction to the concept, with lots of
working examples. (In fact, the document is designed for use with
Python's "doctest" module and is literally a part of Spike's unit
tests. As much as is practical, I'll be using this approach for the
changes to Chandler, so that the API will be documented and tested at
the same time as it's developed.)
You'll notice, by the way, that the documentation doesn't talk much
about Kinds, or names, paths, repository views, and parents. That's
because in Spike's API, you don't need any of these things in order to
create an Item. You just create the item, and until you take some
action to store it, it's simply an ordinary Python object.
How it will Work
================
Here's a snippet of XML from the parcel.xml of the osaf.contentmodel
package::
    <Kind itsName="ContentItem">
      <superKinds itemref="Item"/>
      <classes key="python">osaf.contentmodel.ContentModel.ContentItem</classes>
      <description>Content Item is the abstract super-kind for
      things like Contacts, Calendar Events, Tasks, Mail Messages, and
      Notes. Content Items are user-level items, which a user might file,
      categorize, share, and delete.</description>
      <Attribute itsName="body">
        <displayName>Body</displayName>
        <type itemref="Lob"/>
        <description>All Content Items may have a body to contain
        notes. It's not decided yet whether this body would instead contain
        the payload for resource items such as presentations or spreadsheets
        -- resource items haven't been nailed down yet -- but the payload may
        be different from the notes because payload needs to know MIME type,
        etc.</description>
      </Attribute>
Here's the corresponding code in the proposed schema API::
    from application import schema   # not sure if this is where it will go
    from repository.schema import Types

    class ContentItem(schema.Item):
        """Base class for content items

        A content item (such as a contact, note, photo, etc.)  Content
        objects are user-level items that a user might file, categorize,
        share, and delete.
        """

        body = schema.One(Types.Lob,
            displayName = "Body",
            doc = """\
            All Content Items may have a body to contain notes.  It's not
            decided yet whether this body would instead contain the payload
            for resource items such as presentations or spreadsheets --
            resource items haven't been nailed down yet -- but the payload
            may be different from the notes because payload needs to know
            MIME type, etc."""
        )
The fundamental idea here is that Python class definitions replace
Kind elements, and Python property definitions replace Attribute
elements. Superkinds are defined by inheritance. Parcels are Python
packages. Standard Python "import" statements replace XML namespace
definitions.
This has several useful consequences. First, it makes item classes
independent of parcel loading, which means they're easy to unit test.
You can simply create instances of items in order to run tests on
them. Second, it means that content classes don't need getKind()
methods and other chicanery to get access to a Kind object, just to be
able to create instances. Indeed, in all the ways that matter, items
will just be normal Python objects until/unless you link them with
items that are already stored in the repository (at which time they
will become persistent).
This means routines that create new items will no longer need to know
what repository view the item is intended for. Instead, such routines
can simply create an instance of the appropriate class and return it
without further ado. As soon as the caller links the new item to a
persisted item (e.g. by setting an attribute), the new item will be
persisted as well. (This functionality will be made possible by the
"null view" and "view migration" features that Andi has added to the
repository.)
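To make the "persist on link" behavior concrete, here is a toy model of the idea. It is only a sketch: ``View`` and ``Item`` below are made-up stand-ins, not the repository's classes, and ``None`` plays the role of the "null view".

```python
# Toy model only: View and Item are hypothetical stand-ins, not Chandler's
# repository classes.  None plays the role of the "null view".

class View(object):
    """Stand-in for a repository view; it just tracks the items stored in it."""

    def __init__(self, name):
        self.name = name
        self.items = set()

    def add(self, item):
        item.view = self
        self.items.add(item)


class Item(object):
    """An item that belongs to no view until it is linked to a stored item."""

    def __init__(self):
        self.view = None

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # Linking two items pulls the unstored one into the stored one's view.
        # (A fuller version would also follow references the new item already
        # holds; this sketch migrates only the item being linked.)
        if isinstance(value, Item):
            if self.view is not None and value.view is None:
                self.view.add(value)
            elif value.view is not None and self.view is None:
                value.view.add(self)


main_view = View("main")
stored = Item()
main_view.add(stored)

note = Item()              # an ordinary object; no view was needed to create it
assert note.view is None

stored.note = note         # linking it to a stored item migrates it
assert note.view is main_view
```

The routine that created ``note`` never had to know about ``main_view``; only the caller that linked it did.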
Code vs. Data
-------------
Sometimes when I describe the preceding, people wonder if this use of
Python means that we are giving up on being "data driven", or if we
will still be able to allow users to create kinds and attributes. No,
we are not giving up on being data-driven, and Chandler will be just as
dynamic as before.
If you're not familiar with Python's ultra-dynamic nature, it may
seem at first that writing code must be less flexible or less dynamic
than writing XML, but this is not at all the case. The Python code
for a schema definition is just a script that creates data objects.
These data objects are no different than the data objects you would
create by reading XML. The only technical difference is that the
Python code doesn't have to parse the XML first! (Of course, there
are aesthetic differences, too.)
Note also that just because some schema is defined by writing Python
classes, it doesn't stop Chandler from allowing users to create
attributes or kinds. Again, if you're used to more static languages
like Java or C++, it's natural to think of a class as something
fixed. But Python allows you to trivially create new classes on the
fly. For example::
    def create_a_class(docstring, base_class=object):
        class aNewClass(base_class):
            __doc__ = docstring
        return aNewClass
This function returns a new, distinct class object each time it's
called. Each returned class will have the name "aNewClass", but it
will be a distinct class object. (And you could change its name by
setting its ``__name__`` attribute, if you wanted to.)
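A short session shows this behavior. The names ``Note`` and ``Task`` are invented for the illustration:

```python
def create_a_class(docstring, base_class=object):
    class aNewClass(base_class):
        __doc__ = docstring
    return aNewClass

# Two calls produce two independent class objects.
Note = create_a_class("A note the user jots down")
Task = create_a_class("Something the user needs to do")

print(Note is Task)                    # False
print(Note.__name__, Task.__name__)    # aNewClass aNewClass
print(Note.__doc__)                    # A note the user jots down

# Renaming one class leaves the other untouched.
Task.__name__ = "Task"
print(Task.__name__)                   # Task
print(Note.__name__)                   # aNewClass
print(isinstance(Task(), Note))        # False
```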
If methods were defined in this "nested class" statement, they would
have access to any parameters that were passed to ``create_a_class``,
which would allow the methods to be customized for each new class
created. In effect, Python is its own macro language at this level.
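As a sketch of that customization, the factory below defines methods that read the factory's own arguments. The names ``create_kind_class``, ``kind_name`` and ``default_title`` are hypothetical and chosen only for this example:

```python
def create_kind_class(kind_name, default_title):
    """Return a new class whose methods are customized by the arguments."""

    class GeneratedItem(object):
        def __init__(self, title=None):
            # default_title comes from the enclosing function call
            self.title = title if title is not None else default_title

        def describe(self):
            # kind_name is likewise captured from the enclosing call
            return "%s: %s" % (kind_name, self.title)

    GeneratedItem.__name__ = kind_name
    return GeneratedItem


Note = create_kind_class("Note", "Untitled note")
Task = create_kind_class("Task", "Untitled task")

print(Note().describe())             # Note: Untitled note
print(Task("Buy milk").describe())   # Task: Buy milk
```

Each generated class keeps its own captured values, so ``Note`` and ``Task`` behave differently even though they share one class statement.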
Also note that there's no speed disadvantage here; the statements are
compiled only once (when the module is compiled), no matter how many
times you call the function and create new classes. They are not
compiled on the fly; the statements are just the same as any other
Python statements, and there is absolutely no observable distinction
between the dynamically created classes and "normal" classes, because
*all* Python classes are dynamically generated in exactly the same way!
So as you can see, Python is an extremely *fluid* language, and the
assumption that "code" is harder to change than data doesn't really
carry over from other languages. "Hard coding" *isn't*, in other
words. So, it's trivial to define fresh classes and descriptors to
represent user-defined kinds and attributes, and in fact the
repository already does this kind of class generation today to support
multiple inheritance of kinds.
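As a rough illustration of how a user-defined kind could be turned into a class at run time, the sketch below builds a class and its descriptors from plain data using the built-in ``type()``. ``Attribute`` and ``define_kind`` are made-up names for this sketch; they are not the repository's descriptors or part of the proposed schema API.

```python
class Attribute(object):
    """Hypothetical data descriptor standing in for a schema attribute."""

    def __init__(self, name, display_name):
        self.name = name
        self.display_name = display_name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                      # accessed on the class itself
        return obj.__dict__.get(self.name)   # None until a value is set

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


def define_kind(kind_name, attributes, base=object):
    """Build a class from data: a name and {attribute name: display name}."""
    namespace = dict(
        (name, Attribute(name, label)) for name, label in attributes.items()
    )
    return type(kind_name, (base,), namespace)


# The "schema" here is ordinary data, e.g. something a user typed into a dialog.
Recipe = define_kind("Recipe", {"servings": "Servings", "cuisine": "Cuisine"})

recipe = Recipe()
recipe.servings = 4
print(Recipe.__name__)                  # Recipe
print(recipe.servings)                  # 4
print(recipe.cuisine)                   # None
print(Recipe.servings.display_name)     # Servings
```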
What do we gain from this? Well, it won't be necessary to keep track
of or look up Kinds in order to create items: just create an instance
of the class. And if there's a class for every Kind that needs to be
referenced "statically" in code, then you won't need to also keep
track of repository paths in order to get access to a kind; just
import the class and ask for its kind.
Parcel Loading
--------------
There are no plans to change the current parcel loading arrangements;
parcel.xml will remain a valid way to define schemas and instances.
The only change likely to be made to parcel loading is to ensure that
a parcel's Python modules are imported before trying to process
instances defined in the parcel.xml. This is to ensure that the kinds
are present in the repository before the instances are created. Apart
from this change, however, the parcel.xml format should not be impacted.
Existing parcels will be changed to use the new schema definition
mechanism on an "inside out" basis. That is, superkinds will be
changed before subkinds. This is because kinds defined in a
parcel.xml can refer to kinds defined in a Python module, but not the
other way around. So, likely the contentmodel parcel will be changed
first.
There is, however, a new step that will have to be done when new kinds
or attribute definitions are added to a parcel defined using Python.
Each kind or attribute needs a permanent UUID assigned to it, as this
UUID will be used to synchronize the Python module with the
repository, and in the future it may be used to help support schema
evolution. Spike has a tool that will automatically assign UUIDs for
you, so that you don't have to do it by hand::
http://cvs.osafoundation.org/viewcvs.cgi/internal/Spike/src/spike/uuidgen.txt?rev=HEAD&content-type=text/vnd.viewcvs-markup
(Of course, it will have to be ported to work with the new Chandler
schema API, because Spike doesn't currently integrate with the
repository.)
If you forget to run the tool over a module whose schema has changed,
and you didn't set up the UUIDs by hand, an exception will be raised
when you try to create instances of the new or changed classes. There
should be a reminder in the error message telling you to run the UUID
generation tool to resolve the error.
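The check could look something like the following sketch. How UUIDs will actually be recorded on classes is not settled here, so the ``schema_uuid`` attribute, the ``MissingUUIDError`` exception, and the wording of the message are all assumptions made for illustration.

```python
import uuid


class MissingUUIDError(Exception):
    """Raised when a schema class has no permanent UUID of its own."""


class Item(object):
    # Hypothetical convention: each schema class stores its UUID here.
    schema_uuid = uuid.UUID("00000000-0000-0000-0000-000000000001")

    def __init__(self):
        cls = type(self)
        # Look only in the class's own namespace, so that a UUID inherited
        # from a base class does not count as assigned.
        if cls.__dict__.get("schema_uuid") is None:
            raise MissingUUIDError(
                "%s.%s has no UUID assigned; run the UUID generation tool "
                "on its module" % (cls.__module__, cls.__name__)
            )


class Contact(Item):
    schema_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")


class Photo(Item):      # newly added class; the tool has not been run yet
    pass


Contact()               # fine: Contact has its own UUID

try:
    Photo()
except MissingUUIDError as exc:
    print(exc)          # the message reminds you to run the tool
```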
API "Quick Reference"
---------------------
It is currently an open issue where the API will live. But it's going
to be a module called ``schema``, such that you'll do ``from somewhere
import schema``; it's just not clear yet what ``somewhere`` will be.
Here are the main features of interest:
``schema.Item``
    The base class for persistent items; inherit from it or a
    subclass. Note that your Python inheritance relationship will
    determine the superkind hierarchy of your newly defined kinds, so you
    will want to be sure that you subclass the appropriate base kind
    class, rather than subclassing everything directly from ``schema.Item``.
``schema.One``
    Define an attribute of "single" cardinality, optionally specifying
    any attribute aspects like its type and display name.
``schema.Many``
    Define an attribute of "set" cardinality (once this is available
    in the repository), optionally specifying any attribute aspects like
    its type and display name.
``schema.Sequence``
    Define an attribute of "list" cardinality, optionally specifying
    any attribute aspects like its type and display name.
``schema.Mapping``
    Define an attribute of "dict" cardinality, optionally specifying
    any attribute aspects like its type and display name.
``schema.Cloud``
    Define a cloud attribute. (This isn't entirely worked out yet;
    Spike was using a different approach to the cloud concept, so I may
    need some assistance from someone wise in the ways of clouds before
    getting a concrete API defined for this.)
In order to reference types (as opposed to kinds), you'll import them
from ``repository.schema.Types``. For example, ``Types.String`` to
define a string attribute. For attributes that reference other kinds,
you'll just import the corresponding class directly from the
appropriate module.
Attribute aspects will mostly be keyword arguments to the attribute
definitions. Inverse attributes for bidirectional relationships will
be specified with an ``inverse`` keyword, and as in Spike they will
refer to an attribute of the other class. For example::
    class ContentItem(schema.Item):
        ...
        creator = schema.One(
            displayName = "Created By",
            doc = "Link to the contact who created the item",
        )

    class Contact(ContentItem):
        itemsCreated = schema.Many(
            ContentItem,    # sequence of ContentItem
            inverse = ContentItem.creator,
            ...
        )
Notice that the inverse need only be specified on *one* side of the
bidirectional relationship -- whichever side is defined last.
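To show why one declaration is enough, here is a self-contained toy version of the idea. The ``One`` and ``Many`` classes below are simplified stand-ins written for this sketch (they rely on ``__set_name__``, so Python 3.6 or later is assumed); they are not the proposed ``schema.One`` and ``schema.Many``, and they do not unlink old values when an attribute is reassigned.

```python
class One(object):
    """Toy single-valued link; setting it also updates the inverse side."""

    def __init__(self, inverse=None):
        self.inverse = inverse
        if inverse is not None:
            inverse.inverse = self    # one declaration wires up both directions

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return self if obj is None else obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value
        if self.inverse is not None and value is not None:
            self.inverse.link(value, obj)

    def link(self, obj, other):
        """Record the link on this side only, without calling back."""
        obj.__dict__[self.name] = other


class Many(One):
    """Toy set-valued link."""

    def __get__(self, obj, objtype=None):
        return self if obj is None else obj.__dict__.setdefault(self.name, set())

    def __set__(self, obj, values):
        for value in values:
            self.link(obj, value)
            if self.inverse is not None:
                self.inverse.link(value, obj)

    def link(self, obj, other):
        obj.__dict__.setdefault(self.name, set()).add(other)


class ContentItem(object):
    creator = One()


class Contact(ContentItem):
    # The inverse is named only here, on the side that is defined last.
    itemsCreated = Many(inverse=ContentItem.creator)


alice = Contact()
note = ContentItem()
note.creator = alice                 # set one side...
print(note in alice.itemsCreated)    # True  (...and the other side follows)

task = ContentItem()
alice.itemsCreated = [task]          # works from the other direction too
print(task.creator is alice)         # True
```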
Implementation Tasks
====================
1. Update Spike's code generator tests to use the repository's new
"null view" instead of a memory repository. (DONE; this yielded a 40%
speed improvement for the tests, dropping pack load time from roughly
1.3 seconds to about 0.8 seconds.)
2. Add Spike tests to prototype programmatic creation of repository
Kinds and Attributes, and setting their UUIDs at construction time.
3. Test subclassing the repository's new C-based descriptor types and
adding Spike-style metadata to them.
4. Implement the actual schema API and doctests in the main Chandler
codebase for Kinds and Attributes. (This is pending a decision about
where the API should live in the Chandler package namespace; maybe
that decision can be wrapped up next week while I'm in SFO.)
5. Define and implement a cloud-definition API. (This probably needs
some input from persons Wise in the Ways of Clouds.)
6. Port Spike's UUID generation tool (and docs) to work with modules
using the Chandler schema API.
7. Attempt a port of the ``contentmodel`` parcel using the API,
possibly with participation by others. (Note: Andi would need to have
completed the repository auto-import feature before this would
actually be usable in the Chandler application.)
8. Modify the parcel loading facilities to ensure that modules
defining kinds are imported before loading parcel.xml files that
define instances of those kinds. (This might need to be done by
someone other than me; it might also require some minor changes to
existing parcels or to the rules for how parcel loading is sequenced.)
9. Investigate possible synergy between the descriptor-level aspect
caching that Andi wants to do for performance reasons, and the aspect
setting that the schema API needs to do for schema definition
reasons. (This will probably actually happen while I'm in SFO next
week; it's only at the bottom of this list because it's optional in
the general scheme of things.)
10. Investigate the feasibility of implementing Spike's
``schema.Relationship`` concept for Chandler, to allow creation of
global attributes that don't appear in a class' static API, allowing
parcels to expand/extend existing parcels.
In Conclusion
=============
* Python class definitions offer a compact and convenient way to
specify Chandler schemas that will be easier and less error-prone to
use than parcel.xml, without losing any of Chandler's current or
planned flexibility.
* parcel.xml isn't going away, and during the transition any schema
components defined in parcel.xml should be able to co-exist with those
defined using Python (barring any inter-dependency issues and assuming
no other issues arise).
* Using Python-defined schema means that content items can be unit
tested in isolation, without parcel loading overhead, making fast unit
tests possible, enabling a test-driven approach to development of the
non-UI portions of Chandler. It also reduces coupling between
routines that currently have to ferry repository views or items around
in order to be able to find kinds and set parents on newly created items.
I hope that this was informative and helpful. I will be in OSAF's San
Francisco offices next Monday through Thursday (April 18th-21st), so
if you'd like to spend some time talking about any aspect of this
proposal during those days, please let me know. Thanks!
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev