Hi, I mentioned in the DOC review thread that I'd like to start a separate discussion about the object registration for XML parsing in the DOC.
So, here it is... ;-) The attached HTML file outlines the possible solutions. I'm personally in favour of the one outlined in section 2.3 (combined with 2.2). I would greatly appreciate people's feedback, since this will be added to the next revision of the DOC Design document. Thanks, Darren.
Object Registration for XML Parsing
1. Introduction
The purpose of this document is to outline the way that XML parsing is done using the Data Object Cache.
1.1. XML to Object Conversion
The Data Object Cache uses the can_handle() and from_xml() methods of the DataObject class as a factory for generating a new object from a snippet of XML.
In general this is done using code like:
class DataObjectCache(DataObject):
    # ...

    @classmethod
    def find_class_to_handle(cls, node):  # <2>
        """ Find a class that handles a node in the known_classes list. """
        for klass in cls.known_classes:
            if klass.can_handle(node):
                return klass
        return None

    @classmethod
    def create_doc_from_xml(cls, parent, node):  # <1>
        # Use same parent, skip level by default
        new_parent = parent
        klass = cls.find_class_to_handle(node)
        if klass:
            obj = klass.from_xml(node)
            if obj:
                obj.parent = parent
                parent.children.append(obj)
                new_parent = obj
        # Iterate over the direct children (getchildren() is deprecated)
        for child in node:
            cls.create_doc_from_xml(new_parent, child)
The create_doc_from_xml() method (<1>) is called recursively to traverse the XML tree. It uses find_class_to_handle() (<2>) to find a class that can handle a given node.
For find_class_to_handle() to work, it needs to know in advance which classes exist in the system, so that it can ask each of them whether it can handle the XML node.
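To make the factory mechanism concrete, here is a minimal, self-contained sketch of the code above. It runs against xml.etree.ElementTree. The DataObject base class is a hypothetical stand-in, and so are the subclasses Target and Disk and the hard-coded known_classes list; none of them are the real DOC classes. The sketch shows that an unhandled node (here <unknown>) is skipped, and its children attach to the nearest handled ancestor.

```python
import xml.etree.ElementTree as ET


class DataObject(object):
    """Hypothetical minimal stand-in for the DOC's DataObject base class."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    @classmethod
    def can_handle(cls, node):
        """Return True if this class can build an object from `node`."""
        return False

    @classmethod
    def from_xml(cls, node):
        """Build a new instance from `node`; the base class builds nothing."""
        return None


class Target(DataObject):
    """Hypothetical subclass handling <target name="..."> elements."""

    @classmethod
    def can_handle(cls, node):
        return node.tag == "target"

    @classmethod
    def from_xml(cls, node):
        return cls(node.get("name"))


class Disk(DataObject):
    """Hypothetical subclass handling <disk name="..."> elements."""

    @classmethod
    def can_handle(cls, node):
        return node.tag == "disk"

    @classmethod
    def from_xml(cls, node):
        return cls(node.get("name"))


class DataObjectCache(DataObject):
    # Hard-coded here; section 2 is about how this list gets populated.
    known_classes = [Target, Disk]

    @classmethod
    def find_class_to_handle(cls, node):
        for klass in cls.known_classes:
            if klass.can_handle(node):
                return klass
        return None

    @classmethod
    def create_doc_from_xml(cls, parent, node):
        new_parent = parent  # unhandled node: its children attach to `parent`
        klass = cls.find_class_to_handle(node)
        if klass:
            obj = klass.from_xml(node)
            if obj:
                obj.parent = parent
                parent.children.append(obj)
                new_parent = obj
        for child in node:
            cls.create_doc_from_xml(new_parent, child)


manifest_xml = ('<manifest><target name="t1"><unknown>'
                '<disk name="c0d0"/></unknown></target></manifest>')
root = DataObjectCache("root")
DataObjectCache.create_doc_from_xml(root, ET.fromstring(manifest_xml))
print([c.name for c in root.children])              # ['t1']
print([c.name for c in root.children[0].children])  # ['c0d0']
```

Because find_class_to_handle() returns the first match, the order of known_classes matters if two classes claim the same node.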
2. Class Registration
There are various mechanisms for knowing which classes are available for instantiation. The options I can see are outlined below:
2.1. Single Editable Registry
This is probably the most basic mechanism, but it relies entirely on the list of possible classes being known at build time.
Any new subclass of DataObject must be registered by the developer by adding it to a single registry file, in a form like:
from cache.data_object_cache import DataObjectCache
from my_new_module import MyClass  # New Class Import

DataObjectCache.register_class(MyClass)  # Register Class
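The document does not show register_class() itself. The following is a minimal sketch, assuming it is a classmethod that appends to the class-level known_classes list that find_class_to_handle() iterates over. The type check and the duplicate suppression are assumptions, not confirmed DOC behaviour; they make it harmless for a class to be registered more than once.

```python
class DataObject(object):
    """Hypothetical minimal stand-in for the DOC's DataObject base class."""


class DataObjectCache(DataObject):
    # Class-level registry consulted by find_class_to_handle().
    known_classes = []

    @classmethod
    def register_class(cls, klass):
        """Register a DataObject subclass; repeated registrations are ignored."""
        if not (isinstance(klass, type) and issubclass(klass, DataObject)):
            raise TypeError("%r is not a DataObject subclass" % (klass,))
        if klass not in cls.known_classes:
            cls.known_classes.append(klass)  # keeps registration order


class MyClass(DataObject):
    pass


DataObjectCache.register_class(MyClass)
DataObjectCache.register_class(MyClass)  # second call is a no-op
print(DataObjectCache.known_classes)     # a single entry for MyClass
```

Appending, rather than using a set, preserves registration order. This matters because the first class whose can_handle() returns True wins.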
Advantages
- It's very simple to do.
Disadvantages
- Not extensible; it can only be updated reliably by developers with access to the source.
2.2. Register By Package
This one allows each new package to have its own registration, using code similar to the Single Editable Registry, but on a package-by-package basis.
Each package controls what it registers, rather than relying on a central point.
src
|-- cache/
| |-- __init__.py
| |-- data_object.py
| `-- data_object_cache.py
|-- package1/
| |-- __init__.py
| `-- module1.py
|-- package2/
| |-- __init__.py
| `-- module2.py
|-- package3/
| |-- __init__.py
| `-- module3.py
|-- package4/
| |-- __init__.py
| `-- module4.py
`-- package5/
|-- __init__.py
`-- module5.py
The __init__.py in a package would need to have code along the lines of the following to perform the registration:
# Static registry of known class types...
from cache.data_object_cache import DataObjectCache
from package1.module1 import Class1

DataObjectCache.register_class(Class1)
This performs the same type of registration that we saw in the Single Editable Registry example above.
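The key property of this approach is that registration happens purely as a side effect of importing the package. The sketch below demonstrates that with a throwaway package1 written to a temporary directory. doc_registry is a hypothetical, simplified stand-in for cache.data_object_cache: it uses a module-level register_class() instead of the classmethod. Note that nothing is registered until something imports package1, which is exactly the first disadvantage listed below.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for cache/data_object_cache.py.
REGISTRY_SRC = """
known_classes = []

def register_class(klass):
    if klass not in known_classes:
        known_classes.append(klass)
"""

# package1/module1.py defines the class ...
MODULE_SRC = """
class Class1(object):
    pass
"""

# ... and package1/__init__.py registers it when the package is imported.
INIT_SRC = """
from doc_registry import register_class
from package1.module1 import Class1
register_class(Class1)
"""

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "package1"))
for rel_path, src in [("doc_registry.py", REGISTRY_SRC),
                      ("package1/module1.py", MODULE_SRC),
                      ("package1/__init__.py", INIT_SRC)]:
    with open(os.path.join(tmp, *rel_path.split("/")), "w") as f:
        f.write(src)
sys.path.insert(0, tmp)
importlib.invalidate_caches()  # make sure the new files are seen

doc_registry = importlib.import_module("doc_registry")
print(doc_registry.known_classes)  # prints [] because nothing imported package1 yet

importlib.import_module("package1")  # importing runs __init__.py, which registers
print([k.__name__ for k in doc_registry.known_classes])  # ['Class1']
```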
Advantages
- Extensible.
- Each package can define what it wants to register.
- Not limited to packages provided by the Install team.
Disadvantages
- Requires someone (anyone) to import the package before ManifestParser is run.
  - For most applications this isn't really a problem, since it's likely that some code will import it.
  - If modules are loaded dynamically (as is thought best for Checkpoints), the import may not happen early enough. This can be mitigated by registering the class that pre-defines the loading parameters.
  - It can also be a problem when introducing new checkpoints (most likely used here as finalizers).
2.3. Extension To XML Manifest and Register By Package
This is essentially the same as Register By Package above, with one addition: a section at the start of the XML Manifest lists packages to load. ManifestParser would handle this section specifically, importing the listed Python packages (and so triggering their registration).
<load_packages>
  <module package_name="package1"/>
  <module package_name="package2"/>
  <module package_name="package3"/>
</load_packages>
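The following is a minimal sketch of how ManifestParser might process this section. It assumes <load_packages> is a direct child of the manifest root and that each <module> element's package_name attribute names an importable package. load_packages() is a hypothetical helper name, not part of the existing ManifestParser. The stdlib packages json and email stand in for package1, package2, etc. so that the sketch runs anywhere.

```python
import importlib
import xml.etree.ElementTree as ET


def load_packages(manifest_root):
    """Import each package listed under <load_packages>, in document order.

    Importing a package runs its __init__.py, which (as in Register By
    Package) registers its DataObject subclasses with the DOC.
    Returns the list of imported modules.
    """
    loaded = []
    section = manifest_root.find("load_packages")
    if section is None:
        return loaded  # assumption: the section is optional
    for elem in section.findall("module"):
        name = elem.get("package_name")
        if not name:
            raise ValueError("<module> element has no package_name attribute")
        loaded.append(importlib.import_module(name))
    return loaded


manifest = ET.fromstring("""
<manifest>
  <load_packages>
    <module package_name="json"/>
    <module package_name="email"/>
  </load_packages>
</manifest>
""".strip())
print([m.__name__ for m in load_packages(manifest)])  # ['json', 'email']
```

Only after this step returns would the rest of the manifest be handed to create_doc_from_xml(). Note that the manifest then decides which code gets imported, so it must come from a trusted source.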
Advantages
- Allows finalizers to be written in Python. The code that handles their special tags is pre-loaded, and therefore registered with the DOC, before the rest of the XML is imported into the DOC.
Disadvantages
- Requires an addition to the XML Manifest Schema.
2.4. Register By Searching Python Path
This one would require the DOC, on start-up, to recursively search the Python Path (sys.path) for packages that contain a specific signature file, which we would then execute/process to register objects with the DOC.
For example, the signature file could be a Python module with a specific name, e.g. __DataObjectCache__.py, which would be looked for and, if found, executed.
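The following is a rough sketch of the search, under a few assumptions. Every directory on sys.path is walked recursively with os.walk(), and each __DataObjectCache__.py found is executed with runpy.run_path(). Passing register_class into the file's globals via init_globals is an illustrative choice, as are the helper names find_signature_files(), register_from_path() and the demo package5; a real signature file would more likely import DataObjectCache itself. The demo restricts the search to a temporary directory, because walking the whole of sys.path is slow (the first disadvantage below).

```python
import os
import runpy
import sys
import tempfile

SIGNATURE = "__DataObjectCache__.py"
registered = []  # hypothetical stand-in for DataObjectCache.known_classes


def register_class(klass):
    if klass not in registered:
        registered.append(klass)


def find_signature_files(paths=None):
    """Recursively walk each path entry, yielding signature files found."""
    seen = set()
    for entry in (sys.path if paths is None else paths):
        top = os.path.abspath(entry or os.curdir)  # '' means current directory
        if not os.path.isdir(top):
            continue  # skip zip files and missing entries
        for dirpath, _dirnames, filenames in os.walk(top):
            if SIGNATURE in filenames:
                path = os.path.join(dirpath, SIGNATURE)
                if path not in seen:  # path entries may be nested
                    seen.add(path)
                    yield path


def register_from_path(paths=None):
    """Execute every signature file found; returns the files that were run."""
    ran = []
    for path in find_signature_files(paths):
        # WARNING: this executes arbitrary code from every directory searched.
        runpy.run_path(path, init_globals={"register_class": register_class})
        ran.append(path)
    return ran


tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "package5"))
with open(os.path.join(tmp, "package5", SIGNATURE), "w") as f:
    f.write("class Class5(object):\n    pass\n\nregister_class(Class5)\n")

ran = register_from_path([tmp])
print([k.__name__ for k in registered])  # ['Class5']
```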
Advantages
- Very extensible.
Disadvantages
- Possibly long start-up time while performing the search.
- If the Python Path contains insecure directories, there is a risk of malicious code being loaded automatically.
_______________________________________________ caiman-discuss mailing list [email protected] http://mail.opensolaris.org/mailman/listinfo/caiman-discuss

