[caiman-discuss] Object Registration for XML Parsing in the DOC

Darren Kenny Fri, 28 May 2010 09:10:56 -0700

Hi,

I mentioned in the DOC review thread that I'd like to start a separate
discussion about the object registration for XML parsing in the DOC.


So, here it is... ;-)

The attached HTML file outlines possible solutions here.

I'm personally in favour of the one outlined in section 2.3 (combined with 2.2).

Would greatly appreciate people's feedback since this will be added to the next
revision of the DOC Design document.

Thanks,

Darren.

Title: Object Registration for XML Parsing

1. Introduction

The purpose of this document is to outline the way that XML parsing is done using the Data Object Cache.

1.1. XML to Object Conversion

The Data Object Cache uses the can_handle() and from_xml() methods of the DataObject class as a factory for generating a new object from a snippet of XML.

In general this is done using code like:

Example Source for DataObjectCache methods

class DataObjectCache(DataObject):

    # ...

    @classmethod
    def find_class_to_handle( cls, node ):                      # <2>
        """
        Find a class that handles a node in the known_classes list.
        """
        for klass in cls.known_classes:
            if klass.can_handle( node ):
                return klass

        return None

    @classmethod
    def create_doc_from_xml( cls, parent, node ):               # <1>
        # By default keep the same parent, so an unhandled node is skipped
        # and its children attach to the nearest handled ancestor.
        new_parent = parent

        klass = cls.find_class_to_handle(node)
        if klass:
            obj = klass.from_xml(node)
            if obj:
                obj.parent = parent
                parent.children.append( obj )
                new_parent = obj

        # Iterating over an element yields its direct children.
        for child in node:
            cls.create_doc_from_xml( new_parent, child )

create_doc_from_xml() (<1>) calls itself recursively to traverse the XML tree, and uses find_class_to_handle() (<2>) to find a class that will handle each node.

For find_class_to_handle() to work, it needs to know in advance which classes exist in the system, so that it can ask each one whether it can handle a given XML node.
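
To make the factory behaviour concrete, here is a minimal, self-contained sketch of the traversal. It is not the real DOC code: the DataObject base class is reduced to a stand-in, the Disk class and the XML snippet are hypothetical, and the standard library's xml.etree.ElementTree is assumed as the XML library.

```python
import xml.etree.ElementTree as ET


class DataObject:
    """Hypothetical minimal stand-in for the real DataObject class."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    @classmethod
    def can_handle(cls, node):
        # Subclasses decide whether they recognise this XML element.
        return False

    @classmethod
    def from_xml(cls, node):
        # Subclasses build an instance from the XML element.
        return None


class Disk(DataObject):
    """Hypothetical subclass that claims <disk> elements."""

    @classmethod
    def can_handle(cls, node):
        return node.tag == "disk"

    @classmethod
    def from_xml(cls, node):
        return cls(node.get("name"))


# Stand-in for DataObjectCache.known_classes.
known_classes = [Disk]


def find_class_to_handle(node):
    for klass in known_classes:
        if klass.can_handle(node):
            return klass
    return None


def create_doc_from_xml(parent, node):
    new_parent = parent
    klass = find_class_to_handle(node)
    if klass is not None:
        obj = klass.from_xml(node)
        if obj is not None:
            obj.parent = parent
            parent.children.append(obj)
            new_parent = obj
    for child in node:
        create_doc_from_xml(new_parent, child)


root = DataObject("root")
xml = ET.fromstring(
    '<target><disk name="c0t0d0"><disk name="slice0"/></disk><other/></target>'
)
create_doc_from_xml(root, xml)
```

In this sketch the <target> and <other> elements have no handler, so they are skipped: the outer disk attaches directly to root, and the inner disk attaches to the outer one.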

2. Class Registration

There are various mechanisms for knowing which classes are available for instantiation. Here I will try to outline the options:

2.1. Single Editable Registry

This is probably the most basic mechanism, but relies totally upon the list of possible classes being known at build-time.

Any new sub-class of DataObject should be registered by the developer by adding it to a file in a form like:

Example - Single File Registration - registry.py

from cache.data_object_cache import DataObjectCache

from my_new_module import MyClass   # New Class Import

DataObjectCache.register_class( MyClass ) # Register Class
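
The body of register_class() is not shown in this document. A minimal sketch of what it might look like follows, assuming known_classes is a class-level list, that only DataObject subclasses are accepted, and that registering a class twice is harmless. These behaviours are assumptions for illustration, and DataObject here is a bare stand-in.

```python
class DataObject:
    """Hypothetical stand-in for the real DataObject base class."""


class DataObjectCache(DataObject):
    # Class-level list consulted by find_class_to_handle().
    known_classes = []

    @classmethod
    def register_class(cls, klass):
        # Assumption: reject anything that is not a DataObject subclass.
        if not (isinstance(klass, type) and issubclass(klass, DataObject)):
            raise TypeError("%r is not a DataObject subclass" % (klass,))
        # Assumption: duplicate registrations are silently ignored.
        if klass not in cls.known_classes:
            cls.known_classes.append(klass)


class MyClass(DataObject):
    pass


DataObjectCache.register_class(MyClass)
DataObjectCache.register_class(MyClass)  # second call changes nothing
```

Because classes are checked in list order, the order of registration would decide which class wins if two of them can handle the same node.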

Advantages

It’s very simple to do.

Disadvantages

Not extensible: it can only be updated reliably by developers with access to the source.

2.2. Register By Package

This one allows each new package to have its own registration, using code similar to the Single Editable Registry, but on a package-by-package basis.

This allows for each module to control what it registers, as opposed to a central point.

Example - Sample Directory Setup

    src
    |-- cache/
    |   |-- __init__.py
    |   |-- data_object.py
    |   `-- data_object_cache.py
    |-- package1/
    |   |-- __init__.py
    |   `-- module1.py
    |-- package2/
    |   |-- __init__.py
    |   `-- module2.py
    |-- package3/
    |   |-- __init__.py
    |   `-- module3.py
    |-- package4/
    |   |-- __init__.py
    |   `-- module4.py
    `-- package5/
        |-- __init__.py
        `-- module5.py

The __init__.py in a package would need to have code along the lines of the following to perform the registration:

Example - Package Initialisation

# Static registry of known class types...
from cache.data_object_cache import DataObjectCache
from package1.module1 import Class1

DataObjectCache.register_class(Class1)

This does the same type of registration that we saw in the Single Editable Registry example above.

Advantages

Extensible
- Each package can define what it wants to register
- Not limited to packages provided by the Install team.

Disadvantages

Requires someone (anyone) to import the package before ManifestParser is run
- For most applications this really isn’t a problem, since it’s likely that some code will import it
- If modules are loaded dynamically (as is thought best for Checkpoints), the import may not happen early enough, but this can be mitigated by making the class that pre-defines the loading parameters the one that is registered.
- It can also be a problem when new checkpoints are introduced (the most likely use here is as finalizers).
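
The import-order dependency can be demonstrated with a small, self-contained sketch. It writes a throwaway package to a temporary directory and shows that nothing is registered until the package is imported. All names here (demo_registry, demo_package1, Class1) are hypothetical, and a plain list stands in for the DataObjectCache registry.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway registry module and package on disk.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "demo_registry.py"), "w") as f:
    f.write("known_classes = []\n")

init_source = "\n".join([
    "import demo_registry",
    "",
    "class Class1(object):",
    "    pass",
    "",
    "# Registration happens as a side effect of importing the package.",
    "demo_registry.known_classes.append(Class1)",
    "",
])
os.mkdir(os.path.join(workdir, "demo_package1"))
with open(os.path.join(workdir, "demo_package1", "__init__.py"), "w") as f:
    f.write(init_source)

sys.path.insert(0, workdir)
importlib.invalidate_caches()
import demo_registry

before = list(demo_registry.known_classes)   # package not imported yet
importlib.import_module("demo_package1")     # runs __init__.py
after = [k.__name__ for k in demo_registry.known_classes]
```

If the parser ran at the point where before is captured, it would not know about Class1, which is the situation the first disadvantage describes.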

2.3. Extension To XML Manifest and Register By Package

This is pretty much the same as the Register By Package approach above, with the addition of a package-loading section at the start of the XML Manifest, which ManifestParser would handle specifically in order to trigger imports of Python packages.

Example - XML Manifest Python Package Loading

    <load_packages>
        <module package_name="package1"/>
        <module package_name="package2"/>
        <module package_name="package3"/>
    </load_packages>
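
How ManifestParser might act on such a section can be sketched as follows. This is an illustration under assumptions, not the ManifestParser implementation: the function name load_manifest_packages, the enclosing <manifest> root element, and the use of xml.etree.ElementTree are all hypothetical, and the standard library modules json and csv stand in for real packages so that the example runs anywhere.

```python
import importlib
import xml.etree.ElementTree as ET

MANIFEST = """
<manifest>
    <load_packages>
        <module package_name="json"/>
        <module package_name="csv"/>
    </load_packages>
</manifest>
"""


def load_manifest_packages(root, importer=importlib.import_module):
    """Import every package named under <load_packages>, in document order."""
    loaded = []
    for module in root.findall("./load_packages/module"):
        name = module.get("package_name")
        if not name:
            continue  # assumption: entries without a name are skipped
        # Importing the package runs its __init__.py, which is where
        # the package would register its classes with the DOC.
        loaded.append(importer(name))
    return loaded


root = ET.fromstring(MANIFEST)
loaded = load_manifest_packages(root)
names = [m.__name__ for m in loaded]
```

Running this step before the rest of the manifest is parsed is what would guarantee that the registered classes are known in time.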

Advantages

Allows finalizers to be written in Python, with the code that handles their special tags pre-loaded, and thus registered with the DOC, in time to complete the import of the rest of the XML into the DOC.

Disadvantages

Requires an addition to the XML Manifest Schema

2.4. Register By Searching Python Path

This one would require the DOC, on start-up, to recursively search the Python Path (sys.path) for packages that contain a specific signature file, which we would then execute/process to register objects with the DOC.

An example of such a signature file would be a Python module with a specific name, e.g. __DataObjectCache__.py, which would be looked for and, if found, executed.
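
A rough sketch of such a search is shown below, assuming os.walk for the recursive scan and runpy.run_path to execute each signature file. The function names and the convention of handing each file a registry list are hypothetical. The demo scans a temporary directory rather than the real sys.path.

```python
import os
import runpy
import tempfile

SIGNATURE = "__DataObjectCache__.py"


def find_signature_files(search_path, signature=SIGNATURE):
    """Yield every signature file found under the directories in search_path."""
    for entry in search_path:
        if not os.path.isdir(entry):
            continue  # sys.path may also hold zip files or missing dirs
        for dirpath, _dirnames, filenames in os.walk(entry):
            if signature in filenames:
                yield os.path.join(dirpath, signature)


def run_signature_files(search_path, registry):
    """Execute each signature file, handing it the registry list to fill in."""
    for path in find_signature_files(search_path):
        # This runs arbitrary code found on the path.
        runpy.run_path(path, init_globals={"registry": registry})


# Demo against a throwaway directory rather than the real sys.path.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "package1"))
with open(os.path.join(demo_root, "package1", SIGNATURE), "w") as f:
    f.write("registry.append('package1.Class1')\n")

registry = []
run_signature_files([demo_root], registry)
```

The sketch executes whatever file it finds with the expected name and walks every directory under each path entry, so both the cost and the trust problem grow with the size of the path.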

Advantages

Very Extensible

Disadvantages

Possibly long start-up time when performing search.
If Python Path contains insecure directories, then there is a risk of malicious code being auto-loaded.

_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
