Parsing foreign XML

Jeremias Maerki Wed, 15 Feb 2006 06:02:33 -0800

Among other things, I'm currently implementing support for XMP metadata
as part of my PDF/A-1 work [1]. XMP can consist of elements from a
number of namespaces which are not necessarily known beforehand,
especially since you can add arbitrary private extensions to XMP
metadata. With our current approach, elements from unregistered
namespaces produce warning messages such as the following:


WARNING: Unknown formatting object http://ns.adobe.com/xap/1.0/^CreateDate

I think there's a similar problem when parsing SVG if someone decides
to use foreignObject, or RDF inside the metadata element. As part of the
AreaTreeParser I introduced the concept of a ContentHandlerFactory (util
package). This could also be used when building the FO tree, instead of
dealing with XMLObj descendants, which are discarded after the DOM
element they generate has been added to the Document held by an ancestor.
I'm already using it there to parse SVG, MathML and Barcode XML. Doing
this would mean refactoring FOTreeBuilder a little. I imagine I can do
better than what I have done in AreaTreeParser.Handler so far, which still
has a number of "ifs" in its SAX event handlers. I would probably insert a
dispatching ContentHandler in front of the actual handler that builds
the FO tree. The impact on the normal FO tree building should be
minimal: it adds only one extra delegation per SAX event call. I
can't yet tell how this will look in detail.
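
A minimal sketch of the dispatching idea, using only plain JAXP/SAX.
ForeignHandlerFactory, DomBuildingFactory, DispatchingHandler and
RecordingHandler are hypothetical names for illustration; they are not
FOP's actual ContentHandlerFactory API or FOTreeBuilder, and
RecordingHandler merely stands in for the handler that builds the FO
tree. An element in a registered namespace, together with its whole
subtree, is routed to a handler that builds a DOM Document (via the JAXP
identity TransformerHandler); everything else passes through with one
extra call. For brevity it does not forward prefix mappings, ignorable
whitespace or processing instructions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical factory: creates one ContentHandler per foreign subtree. */
interface ForeignHandlerFactory {
    ContentHandler createHandler() throws SAXException;
}

/** Hypothetical factory that turns each foreign subtree into its own DOM Document. */
class DomBuildingFactory implements ForeignHandlerFactory {
    final List<Document> documents = new ArrayList<>();

    @Override
    public ContentHandler createHandler() throws SAXException {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            // The identity TransformerHandler replays the SAX events into the DOMResult.
            TransformerHandler handler =
                    ((SAXTransformerFactory) TransformerFactory.newInstance()).newTransformerHandler();
            handler.setResult(new DOMResult(doc));
            documents.add(doc);
            return handler;
        } catch (ParserConfigurationException | TransformerConfigurationException e) {
            throw new SAXException(e);
        }
    }
}

/** Hypothetical dispatcher sitting in front of the main (FO tree) handler. */
class DispatchingHandler extends DefaultHandler {
    private final ContentHandler main;
    private final Map<String, ForeignHandlerFactory> factories = new HashMap<>();
    private ContentHandler delegate; // non-null only inside a foreign subtree
    private int depth;               // element nesting depth within that subtree

    DispatchingHandler(ContentHandler main) { this.main = main; }

    void register(String namespaceURI, ForeignHandlerFactory factory) {
        factories.put(namespaceURI, factory);
    }

    @Override public void startDocument() throws SAXException { main.startDocument(); }
    @Override public void endDocument() throws SAXException { main.endDocument(); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (delegate == null) {
            ForeignHandlerFactory factory = factories.get(uri);
            if (factory == null) { // ordinary element: just one extra delegation
                main.startElement(uri, local, qName, atts);
                return;
            }
            delegate = factory.createHandler(); // a foreign subtree starts here
            delegate.startDocument();
        }
        depth++;
        delegate.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (delegate == null) {
            main.endElement(uri, local, qName);
            return;
        }
        delegate.endElement(uri, local, qName);
        if (--depth == 0) { // foreign subtree closed: resume feeding the main handler
            delegate.endDocument();
            delegate = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        (delegate != null ? delegate : main).characters(ch, start, length);
    }
}

/** Stand-in for the FO tree builder: records the local names of the elements it sees. */
class RecordingHandler extends DefaultHandler {
    final List<String> elements = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        elements.add(local);
    }
}
```

Registering http://ns.adobe.com/xap/1.0/ with a DomBuildingFactory and
parsing through the dispatcher with a namespace-aware SAXParser would turn
an xmp:CreateDate element into a small DOM Document rather than an
"Unknown formatting object" warning. Unregistered namespaces would still
reach the main handler, which could keep warning about them.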

Any objections to me going down this road? I can try to explain more if
the above is not clear.

[1] http://wiki.apache.org/xmlgraphics-fop/PDFA1ConformanceNotes

Jeremias Maerki
