[Dev] Re: DRAFT: Python Schema API proposal

Phillip J. Eby Fri, 15 Apr 2005 15:59:25 -0700

At 02:10 PM 4/15/05 -0700, Andi Vajda wrote:

A dirty little detail to also hide, in addition to 'kind', is 'parent', that
is, where in the repository the item lives. While this brings little
semantic value (except for Schema) it is useful for debugging and for
repository maintenance tasks that cannot rely on schema for inferring
structure. In your Schema API you could add the notion of 'defaultParent' to
the class declarations. That default parent would then be used when an item
is instantiated. Let's talk about this next week.


Sure.

What do we gain from this? Well, it won't be necessary to keep track of or look up Kinds in order to create items: just create an instance of the class. And if there's a class for every Kind that needs to be referenced "statically" in code, then you won't need to also keep track of repository paths in order to get access to a kind; just import the class and ask for its kind.
Tadaaah, I think that last paragraph sums it up very well.
I do believe that the educational intro before is also very welcome.
Did you bite your pen not to say 'python is not java' again ? :)

:)

Parcel Loading
--------------
There are no plans to change the current parcel loading arrangements; parcel.xml will remain a valid way to define schemas and instances. The only change likely to be made to parcel loading is to ensure that a parcel's Python modules are imported before trying to process instances defined in the parcel.xml. This is to ensure that the kinds are present in the repository before the instances are created. Apart from this change, however, the parcel.xml format will not be impacted.
Well, parcel loading supports circular dependencies, something Python is not
too good at. So, changes in the parcel structure will have to be made. Note
that that is a *good* development, not a bad one. I'm very worried about all
the circularity we currently have in Chandler.

I'm not so sure that the circularity needs any special workarounds; if I understand correctly the primary circularity is due to bidirectional references, and this proposal sidesteps that issue by only requiring one end of the link to reference the other.

Existing parcels will be changed to use the new schema definition mechanism on an "inside out" basis. That is, superkinds will be changed before subkinds. This is because kinds defined in a parcel.xml can refer to kinds defined in a Python module, but not the other way around. So, likely the contentmodel parcel will be changed first.
Why could python-defined kinds not depend on repository kinds ?

Well, given the other information you've laid out in your reply, I see we probably *could* make that happen with some more work. You'd just need to be able to pull a generated class out of the null view for the kind(s) in question. However, I'd just as soon not bother unless developer feedback says we'd rather port parcels in an "outermost in" direction instead of "innermost out".

You do need the core schema, no ?

The only such dependency I'm aware of is for 'Item'; i.e. 'schema.Item' needs to know that it corresponds to 'Item' in the core schema. Apart from that bit of bootstrapping, I believe everything else is handled by loading the core schema pack. But since I haven't actually written that bit yet, I could be wrong.

There is, however, a new step that will have to be done when new kinds or attribute definitions are added to a parcel defined using Python. Each kind or attribute needs a permanent UUID assigned to it, as this UUID will be used to synchronize the Python module with the repository, and in the future it may be used to help support schema evolution. Spike has a tool that will automatically assign UUIDs for you, so that you don't have to do it by hand::
Actually, no, I've again leaned against using the same UUIDs. You can certainly do that if you want, but it doesn't buy you much. I've added the _uuid argument to the Item() constructor as you had requested, but the importing of schema items from view to view (or from null view to view) does not require the UUIDs to be the same. Schema items are matched across views first using UUIDs, then parent/child paths. Items that need to be copied on export, rather than moved, are marked with a new flag, Citem.COPYEXPORT, and their 'export' cloud is used to gather all the related items that are to be copied along. I already defined the 'export' clouds for schema items.
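
The matching order described above (UUID first, then path) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `SchemaItem`, `find_match`, and the example paths are invented for illustration and are not the repository's actual import code.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class SchemaItem:
    """Hypothetical stand-in for a schema item living in some view."""
    path: str
    item_uuid: UUID = field(default_factory=uuid4)


def find_match(item, target_items):
    """Return the item in target_items matching `item`, or None.

    A UUID match wins; the parent/child path is only a fallback.
    """
    by_uuid = {t.item_uuid: t for t in target_items}
    if item.item_uuid in by_uuid:
        return by_uuid[item.item_uuid]
    by_path = {t.path: t for t in target_items}
    return by_path.get(item.path)


# An item arriving from another view (for example, the null view).
source = SchemaItem("//Schema/Core/Item")

# Same UUID, different path: matched by UUID.
moved = SchemaItem("//Schema/Renamed/Item", source.item_uuid)
# Different UUID, same path: matched by path.
same_path = SchemaItem("//Schema/Core/Item")

matched_by_uuid = find_match(source, [same_path, moved])
matched_by_path = find_match(source, [same_path])
unmatched = find_match(source, [SchemaItem("//Schema/Core/Kind")])
```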

Okay, so the schema API will set this flag, then. It will also have to generate paths based on module location, meaning that moving or renaming modules, classes, or attributes will change the schema. (With UUIDs, the schema could remain unaffected by such "superficial" changes.)

However, using names instead of UUIDs is less work for me so I have no objection. :)

On the other hand, it will probably require deciding a convention for translating a Python package/module/class/attribute name into a repository path, so maybe we can decide that next week.
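
One possible shape for such a convention, purely as a sketch: the function name `repository_path`, the `//parcels` root, and the example module, class, and attribute names are all assumptions made for illustration, not a decided convention.

```python
def repository_path(module_name, class_name=None, attr_name=None,
                    root="//parcels"):
    """Map a dotted module name, plus an optional class and attribute
    name, onto a slash-separated repository path (hypothetical scheme)."""
    parts = [root] + module_name.split(".")
    if class_name is not None:
        parts.append(class_name)
        if attr_name is not None:
            parts.append(attr_name)
    return "/".join(parts)


kind_path = repository_path("osaf.contentmodel.calendar", "CalendarEvent")
attr_path = repository_path("osaf.contentmodel.calendar", "CalendarEvent",
                            "startTime")

# Renaming the module changes every path derived from it, which is the
# "superficial changes alter the schema" cost of path-based identity.
renamed_path = repository_path("osaf.contentmodel.cal", "CalendarEvent")
```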

As for schema evolution, matching UUIDs could tell you that the schemas are
not the same but you still know nothing about which is more recent and which
was derived from what. Realizing that, I also moved against relying on UUIDs
to match schema since we need to introduce some more talkative identifier
for schema matching, including at least some version number. By the way,
while implementing item import I realized that that could very well be the
starting point for a schema evolution implementation hook as well. But
that's for later.

I'm actually assuming a scenario in which the UUID stays the same across all versions of a schema item, and that the repository simply syncs to whatever the schema says. This would work fairly easily for simple schema changes: adding, deleting, or renaming an attribute, or altering an attribute aspect.

But in any case I'm not going to argue for more work for myself so I'm certainly fine with sticking with paths for the time being. :)

``schema.Item`` The base class for persistent items; inherit from it or a subclass. Note that your Python inheritance relationship will determine the superkind hierarchy of your newly defined kinds, so you will want to be sure that you subclass the appropriate base kind class, rather than subclassing everything directly from ``schema.Item``
Is that 'Item' class the same as repository.item.Item.Item ?

Probably not, unless you want to allow the changes back into repository.Item. It will have a custom metaclass and some bootstrap code so that it can hardwire itself to the core schema Item kind instead of creating a fresh kind of its own.
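
A rough sketch of that idea, written in present-day Python syntax and under stated assumptions: `Kind`, `CORE_ITEM_KIND`, `ItemMeta`, and `getKind` are hypothetical stand-ins invented here, and a real implementation would bind to repository objects rather than to plain Python ones.

```python
class Kind:
    """Hypothetical stand-in for a repository Kind."""

    def __init__(self, name, superkinds=()):
        self.name = name
        self.superkinds = tuple(superkinds)


# Pretend this kind was loaded from the core schema pack.
CORE_ITEM_KIND = Kind("Item")


class ItemMeta(type):
    """Metaclass that gives every class its own kind."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        item_bases = [b for b in bases if isinstance(b, ItemMeta)]
        if not item_bases:
            # Bootstrap case: the root class hardwires itself to the
            # existing core kind instead of creating a fresh one.
            cls._kind = CORE_ITEM_KIND
        else:
            # Python inheritance determines the superkind hierarchy.
            cls._kind = Kind(name, [b._kind for b in item_bases])

    def getKind(cls):
        return cls._kind


class Item(metaclass=ItemMeta):
    """Base class for persistent items."""


class Note(Item):
    pass


class Task(Note):
    pass
```

With this arrangement, `Task.getKind()` yields a kind whose only superkind is `Note.getKind()`, and no repository path lookup is needed to reach either one.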

``schema.Cloud`` Define a cloud attribute. (This isn't entirely worked out yet; Spike was using a different approach to the cloud concept, so I may need some assistance from someone wise in the ways of clouds before getting a concrete API defined for this.)
There is no such thing currently, Clouds use Endpoints. An Endpoint is to a
Cloud what an Attribute is to a Kind.


And so you see why I'm going to want some help on this part of the API.  :)

Notice that the inverse need only be specified on *one* side of the bidirectional relationship -- whichever side is defined last.
Probably fine for this API but this is more restrictive than what the data
model expects. The data model only matches names in a bidirectional ref, not
attributes.

Actually, it's not more restrictive except insofar as it keeps you from making a mistake by referring to an attribute name that does not exist. It will use the same underlying implementation of the data model, so it too will really be based on names rather than attribute objects.

So, it might seem to be more restrictive but the only scenarios it forbids are ones that would not work now anyway. (I.e., ones where an attribute with the given name does not exist on the referenced kind.)
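
To illustrate how the link can still be name-based while the user only ever passes a descriptor, here is a minimal sketch. The `Ref` class and its `inverse` argument are hypothetical, and it relies on `__set_name__` (Python 3.6 or later), which is just one way a descriptor can learn its own name.

```python
class Ref:
    """Hypothetical descriptor for one end of a bidirectional reference."""

    def __init__(self, inverse=None):
        self.name = None          # filled in when the class is created
        self.inverse_name = None  # what the data model would store
        self._inverse = inverse

    def __set_name__(self, owner, name):
        self.name = name
        other = self._inverse
        if other is not None:
            # Both ends are linked by *name*; the names are simply read
            # from the descriptors instead of being typed twice.
            self.inverse_name = other.name
            other.inverse_name = name


class Project:
    tasks = Ref()


class Task:
    # Only the side defined last names its inverse.
    project = Ref(inverse=Project.tasks)
```

A misspelling such as `Ref(inverse=Project.taskz)` fails with an `AttributeError` when the class body runs, which is the only kind of scenario this style rules out.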

* Python class definitions offer a compact and convenient way to specify Chandler schemas that will be easier and less error-prone to use than parcel.xml, without losing any of Chandler's current or planned flexibility.
Almost true. You're introducing two major differences/restrictions in your
model. They are limited to your model, in other words, as long as they don't
bleed back into the data model, they're fine. A one-to-one kind-class
mapping and matching attributes instead of attribute names in bidirectional
refs are new constraints you introduced here. Again, if users are fine with
them and they don't bleed back, they're fine.

As per above, attribute matching still takes place by name, it'll just be getting the name from the descriptor instead of relying on the user typing the right pair of names in both places.

And, the one-to-one kind-class mapping isn't actually a restriction either: it merely requires that the repository be able to generate a class for any Kind that wasn't already defined by using a class statement -- something it can do now, and already does when mixins are involved.
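
As a sketch of what "generate a class for any Kind" could look like in plain Python: `Kind`, `class_for`, and the `_classes` registry below are hypothetical stand-ins, not the repository's actual mechanism.

```python
class Kind:
    """Hypothetical stand-in for a repository Kind."""

    def __init__(self, name, superkinds=()):
        self.name = name
        self.superkinds = tuple(superkinds)


class Item:
    """A class that was written out with an ordinary class statement."""


item_kind = Kind("Item")

# Registry mapping each kind to exactly one class.
_classes = {item_kind: Item}


def class_for(kind):
    """Return the class for `kind`, generating one on first request."""
    try:
        return _classes[kind]
    except KeyError:
        # Several superkinds yield several bases, much like a mixin.
        bases = tuple(class_for(k) for k in kind.superkinds) or (object,)
        cls = type(kind.name, bases, {"_generated": True})
        _classes[kind] = cls
        return cls


# A kind that exists only in the repository, with no class statement.
note_kind = Kind("Note", [item_kind])
Note = class_for(note_kind)
```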

* Using Python-defined schema means that content items can be unit tested in isolation, without parcel loading overhead, making fast unit tests possible, enabling a test-driven approach to development of the non-UI portions of Chandler.
Yes, you're removing parcel loading (more or less) but I should point out
that the null view I added two weeks ago should work like any other view
from the standpoint of single-view unit tests. Many of our unit tests could
be converted to using the null view, even before your schema API is
ready. Performance improvements should be noticeable since commit would be
completely shut out. Parcel loading costs would still be incurred of course.

And pack loading as well. However, my experiment with Spike yesterday shows that the null view was 40% faster loading the core schema packs compared to a "ramdb" repository view.

It might be interesting to change the current unit test base class to create fresh null views each time instead of fresh ramdb repositories, and see what kind of a boost that change alone produces.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Open Source Applications Foundation "Dev" mailing list
http://lists.osafoundation.org/mailman/listinfo/dev
