On Monday, I met with Katie, Ted, Andi, Grant, and Morgen to review the Schema API proposal and its impact on parcel loading, etc. We identified a number of issues, some of which we resolved during the meeting, and others that I've been working on since then and now have resolutions for.


Mapping Python modules to Parcels
---------------------------------

First, the proposal didn't address the mapping of Python modules to parcel objects, or how Parcel subclasses would be defined/used. I propose to address this by defining an API that will be used to create null-view parcels for importing: ``schema.parcel_for_module(module_name)``. So, e.g.::

    aParcel = schema.parcel_for_module('osaf.contentmodel')

would return a null-view parcel object for ``//parcels/osaf/contentmodel``. This API will work by checking whether the named module has a ``__parcel__`` variable defined; if not, it will create one using the module's ``__parcel_class__`` variable, if defined. If there is neither a ``__parcel__`` nor a ``__parcel_class__``, it will just create a stock Parcel for the module. If a new parcel object is created, it will be saved in the ``__parcel__`` attribute of the module so that subsequent invocations return the same parcel object. There will have to be some locking support to make this API thread-safe, using ``threading.RLock``, because the API is recursive: in order to create a parcel for a module, it first asks for the parent module's parcel, in order to know what parent to set on the child parcel. So the locking has to support re-entrancy. Finally, the API will need to be able to return a meaningful value when a null module (i.e. the empty string ``""``) is requested, so that the recursion has a place to "bottom out".


Mapping XML Namespaces to Modules
---------------------------------

Second, during the meeting Morgen pointed out that the XML namespaces used in ``parcel.xml`` today do not directly correspond to the modules where contentmodel classes live. That is, parcels correspond to Python "packages", but not to modules. So, in order to allow gradual transition, when we port packages to use the schema API, we'll need to "flatten" them so that all the package's classes can be imported directly from the package. (E.g. by moving the code directly into the package ``__init__.py``.)

Note that if the flattened package ends up as just an ``__init__.py`` with no ``parcel.xml``, it can then also be changed to be just a module instead of a package. For example, if we were porting the ``osaf.contentmodel.contacts`` parcel to use the schema API, we could just take its ``Contacts.py``, rename it to ``contacts.py`` and move it into ``osaf.contentmodel``, thus moving the ``osaf.contentmodel.contacts.Contacts.Contact`` class to just ``osaf.contentmodel.contacts.Contact``. Then, the location of the content classes will match the XML namespaces used in current ``parcel.xml`` files, and the corresponding repository paths.

Of course, parcels that have instance data in ``parcel.xml`` cannot be converted from packages to modules, because they still need a separate directory for the ``parcel.xml`` to live in. Such parcels can still be flattened by moving the schema classes into the ``__init__.py``, however.

(Note: this is a slightly different resolution than the one(s) we discussed at the meeting on Monday. This modified approach has less likelihood of error during porting, and also achieves the side benefit of helping to reduce the current deep package nesting of our parcels.)


Parcel Synch and Update
-----------------------

Morgen's questions at the meeting also exposed a couple of issues where the sequence of parcel loading and imports could make a difference to the resulting repository contents. The schema API is intended to support lazy loading on a couple of different levels, but parcel loading is a more synchronous process. Schema classes can't load themselves into the repository right away for three reasons: 1) they don't know what repository "the repository" is, 2) they may have dependencies that aren't yet imported, and 3) we don't want to have to import all possible modules at startup.

So, in order to ensure that namespaces referenced by a ``parcel.xml`` file have been initialized in the repository, there will be an API along the lines of ``schema.synchronize_parcel(repository_view,path)``. The parcel loader will invoke this API when setting up an XML namespace, to ensure that the dependent parcel(s) have been imported, and the relevant schema(s), if any, are added to the repository.

It also became clear during the meeting that changes to schema modules can't practically be detected at present by any parcel loading mechanism, and some expressed the opinion that when one changes a parcel's schema, one generally needs to recreate the repository anyway. So, Andi offered to add a checksum facility to the repository: when Kinds are imported, they will be checked against the existing Kind in the repository, and an error will occur if they differ in any substantial way, thereby alerting you to the need to recreate your repository once you use a changed schema.
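The checksum idea could look something like the following sketch: hash a Kind's name together with its attribute descriptions in a stable order, and refuse to load when the stored value differs. All of the names and the triple format here are hypothetical; the real facility would live in the repository::

```python
import hashlib

def kind_checksum(kind_name, attributes):
    """Checksum over (attribute name, type, cardinality) triples,
    sorted so that declaration order doesn't matter."""
    h = hashlib.sha1(kind_name.encode('utf-8'))
    for name, typ, cardinality in sorted(attributes):
        h.update(('%s:%s:%s' % (name, typ, cardinality)).encode('utf-8'))
    return h.hexdigest()

def check_kind(kind_name, attributes, stored_checksum):
    """Raise if the imported Kind differs substantially from the one
    already in the repository; otherwise return the current checksum."""
    current = kind_checksum(kind_name, attributes)
    if stored_checksum is not None and current != stored_checksum:
        raise ValueError(
            '%s: schema changed; please recreate your repository' % kind_name)
    return current
```

Sorting the triples means cosmetic reorderings of attribute declarations don't trigger a false mismatch, while any real change to names, types, or cardinality does.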

Alternatively, we could attempt to support simple schema evolution. In a hallway conversation yesterday, John asked if we could include a way to reload parcels in such a way as to incorporate changes to both code and schema at runtime, to afford a faster development feedback loop. This is not on the feature list for 0.6, but I'll be watching for opportunities to move us in this direction, perhaps by adding some sort of "upgrade hook" to parcel modules or to kind classes, and maybe a way to specify a schema's version. This would naturally have some overlap with repository schema evolution and would be something Andi and I would need to talk about more before coming up with an actionable plan.


Clouds and Endpoints
--------------------

I previously mentioned that I needed someone "Wise in the Way of Clouds" to knock some sense into my head about how they work, and there were at least two such people there on Monday, so now my head hurts, but I'm closer to knowing what would work. :) It's likely that it will look something like this::

    class ContentItem(schema.Item):
        # ... other stuff here

        __clouds__ = dict(
            sharing = schema.Cloud(
                byRef = [displayName, body, issues, createdOn]
            )
        )

The idea here is that ``__clouds__`` is a dictionary mapping cloud aliases (like ``sharing``) to ``schema.Cloud()`` objects, which are the same as regular clouds except that you'll specify attributes by referencing the descriptors rather than strings representing the attribute names. And you'll group the names by policy rather than specifying a policy for each name. This is still very vague, and feedback to help steer this in the right direction would be welcome, especially if I've made a stupid mistake like assuming that the order of endpoints in a cloud is inconsequential, when in fact it makes a difference. (And yes, I'm assuming that, so if that's wrong, somebody please apply the appropriate clue-by-four to my head. Thanks!)


Making the Transition
---------------------

At Monday's meeting, Katie asked for input on how we might proceed with the actual transition, in terms of who, what, how, and when. My initial proposal stated that porting of parcels needed to take place from the "inside out", such that a parcel containing a base class needed to be ported prior to a parcel containing a subclass of that base class, because ``parcel.xml`` files can refer to schema items defined in modules, but not the other way around. Andi commented that this need not be the case, because I was going to have to implement such linkage in order to connect to the core schema (e.g. ParcelManager et al).

After thinking about this some more, though, I realized that although Andi is correct, that only works if the parcels are loaded into the null repository view, which means we'd have to load all parcels into the null view every time, and that's not really what we want. So, at this point I think we should stick with the strategy of working from the "inside out", beginning with core schema elements and working our way out.

And, since the implementation design allows us to port parcels incrementally, this will let us control the scope of porting in 0.6, because we can "stop any time we want". So, I don't have any specific recommendations regarding the who/what/when parts. Probably what will happen is that once I have enough of the schema API implemented, I will try to port a parcel's schema (probably the osaf.contentmodel.* parcels) and see what happens. Either it will then serve as an example of how to do other parcels, or I'll learn what doesn't work well in the schema API. We'll also know then how long it might take to port a parcel, and can do more planning at that time.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev
