Stefan Seefeld wrote:

What I originally suggested was not a parser, but a set of APIs to
manipulate XML. The parser part (i.e. the piece of code that generates
a parse tree from an XML file) is the simplest part of it all. What
is much more tricky is to get the right internal structure to make
operations on the tree efficient and convenient.

That said, I would *not* recommend rewriting any such thing. It is
a *lot* of work, and as such quite unrelated to boost's goals.

Wouldn't mapping an implementation's structure onto a C++ internal structure also require quite a bit of work?


OPTION 1: C++ specific internal mappings.

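// Usage:
boost::xml::dom::document doc( "demo.xml" );

// Sketch of the class (it would live in namespace boost::xml::dom):
class document
{
  private:
     boost::xml::dom::element root;
  public:
     document( const char * fn )
     {
        // First load: the underlying implementation parses the file.
        impl::XMLDOMDocument doc( fn );
        // Second load: copy the implementation's tree into the C++ tree.
        BuildXMLDOM( doc.documentElement, root );
     }
  private:
     void BuildXMLDOM( impl::XMLDOMElement, boost::xml::dom::element & );
};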

Where BuildXMLDOM recursively builds the internal XML tree structure.

NOTE: This is necessary if you want to use an internal C++ representation to efficiently model the structure for C++ bindings, e.g. using ternary search trees or another associative container for attribute storage.

This would make loading an XML document more expensive in both time and memory, because the document is effectively loaded twice (once by the parser and once by the C++ bindings). This becomes a problem for large documents, which would need roughly double the memory. Also, what about SAX facilities?
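
To make the cost of option 1 concrete, here is a minimal sketch of what the recursive copy might look like. The impl::element and dom::element types are hypothetical stand-ins (they are not taken from any real parser or from a proposed boost interface); the point is only that every node and attribute ends up stored twice until the implementation's tree is released.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the implementation's node type; a real parser
// would expose its own structure instead.
namespace impl {
struct element {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<element> children;
};
}

// Hypothetical C++-side element: attributes live in an associative container.
namespace dom {
struct element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<element> children;
};
}

// Recursively copy the implementation's tree into the C++ tree.
// Each node now exists in both trees until the source tree is destroyed.
void BuildXMLDOM(const impl::element& src, dom::element& dst)
{
    dst.name = src.name;
    dst.attributes.insert(src.attributes.begin(), src.attributes.end());
    dst.children.reserve(src.children.size());
    for (const impl::element& child : src.children) {
        dst.children.emplace_back();
        BuildXMLDOM(child, dst.children.back());
    }
}

// Count the nodes in the copied tree, to make the duplicated storage visible.
std::size_t CountNodes(const dom::element& e)
{
    std::size_t n = 1;
    for (const dom::element& c : e.children) n += CountNodes(c);
    return n;
}
```

The copy is linear in the size of the document, so the extra time is modest; the memory overhead is the part that hurts for large files.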

OPTION 2: If you intend to wrap an implementation like libxml2 in a C++ interface, you give up control over how the data is represented internally, and you pay a slight performance penalty for the wrappers (less so if you use inlined functions). This approach would not suffer the loading penalties described above.
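
As a rough illustration of option 2, the wrapper can be a non-owning handle that stores a single pointer and forwards every call inline. The c_node struct below is a hypothetical stand-in for the wrapped library's node type (it is not libxml2's actual structure), and the member names are my own.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C-style node, standing in for whatever the wrapped
// library exposes. The library owns these nodes, not the wrapper.
struct c_node {
    const char* name;
    c_node* first_child;
    c_node* next_sibling;
};

// Thin, non-owning handle: one pointer, every call forwarded inline.
// No second tree is built, so there is no extra loading cost.
class element {
public:
    explicit element(c_node* n) : node_(n) {}
    bool valid() const { return node_ != nullptr; }
    std::string name() const { return node_->name; }
    element first_child() const { return element(node_->first_child); }
    element next_sibling() const { return element(node_->next_sibling); }
private:
    c_node* node_;
};

// Walk the child list through the wrapper only.
std::size_t count_children(element parent)
{
    std::size_t n = 0;
    for (element c = parent.first_child(); c.valid(); c = c.next_sibling())
        ++n;
    return n;
}
```

The trade-off is visible here: traversal follows whatever layout the library chose (a sibling-linked list in this sketch), and the C++ side cannot swap in a different container.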

OPTION 3: Writing a boost XML/XPath parser would allow the internal structure to be optimised for C++-specific bindings, while suffering neither the wrapper performance penalties nor the document loading/SAX parsing penalties.

What I had (and still have) in mind is a C++ interface to an existing implementation (libxml2 actually).

What if the user wants an interface to another implementation? Is it possible to standardize access to other parsers?


NOTE: If you are using the option 1 approach, the variations would occur in boost::xml::dom::document - specifically the constructor and the semantics for BuildXMLDOM.
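
One way this variation could be isolated, sketched under my own assumptions: put the parser-specific part behind a small interface, so that the document constructor is the only place that touches it. The backend class and the two toy adapters below are hypothetical; real adapters would call into the respective parser and build a full tree rather than a single element.

```cpp
#include <string>

namespace dom {
struct element {
    std::string name;
};
}

// Hypothetical interface that each parser adapter would implement.
class backend {
public:
    virtual ~backend() = default;
    virtual dom::element load(const std::string& source) = 0;
};

// Two toy adapters. They only tag the result so the dispatch is visible.
class backend_a : public backend {
public:
    dom::element load(const std::string& source) override
    {
        return dom::element{"a:" + source};
    }
};

class backend_b : public backend {
public:
    dom::element load(const std::string& source) override
    {
        return dom::element{"b:" + source};
    }
};

// The document depends only on the interface; which parser did the
// work is decided by the caller.
class document {
public:
    document(backend& b, const std::string& source) : root_(b.load(source)) {}
    const dom::element& root() const { return root_; }
private:
    dom::element root_;
};
```

A template parameter would do the same job without the virtual call, at the cost of making document a class template.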

Also I'm not convinced that the main goal should be to conform with
the DOM specs as provided by w3c. Lots of implementers / users consider
its design broken. Instead I'd suggest trying to come up with a 'good'
C++ API, and then build a wrapper around it that provides the legacy
mapping as needed.
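
A minimal sketch of that layering, with names of my own choosing: xml::element is a hypothetical 'native' C++ type, and w3c_compat::Element is a thin adapter that exposes W3C-style method names on top of it. Only getAttribute, setAttribute and the tag name are shown; the empty-string result for a missing attribute follows the W3C DOM behaviour.

```cpp
#include <map>
#include <string>

// Hypothetical native C++ API: plain data, standard containers.
namespace xml {
struct element {
    std::string name;
    std::map<std::string, std::string> attributes;
};
}

// Legacy mapping: W3C-style names forwarded to the native element.
// The adapter holds a reference and owns nothing.
namespace w3c_compat {
class Element {
public:
    explicit Element(xml::element& e) : e_(e) {}

    std::string getTagName() const { return e_.name; }

    // Returns the empty string when the attribute is absent.
    std::string getAttribute(const std::string& key) const
    {
        auto it = e_.attributes.find(key);
        return it == e_.attributes.end() ? std::string() : it->second;
    }

    void setAttribute(const std::string& key, const std::string& value)
    {
        e_.attributes[key] = value;
    }

private:
    xml::element& e_;
};
}
```

Code written against the native API never sees the adapter, so the legacy layer can stay optional.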

This is the thinking that I have moved towards, more details of which can be found in my last post on the subject.


NOTE: Here and in my previous post, I use DOM to refer to the C++ Document Object Model binding, not the W3C DOM standard. When I mean the latter, I write "W3C DOM" explicitly.

Regards,
Reece
