Re: Status of std.xml (D2/Phobos)

Michel Fortin Tue, 29 Jun 2010 05:31:10 -0700

On 2010-06-29 04:41:50 -0400, Alix Pexton <[email protected]> said:

On 28/06/2010 15:11, Steven Schveighoffer wrote:
Yes, I don't think the phobos solution needs to mimic exactly the API of
SAX or DOM, the author should be free to use D idioms. But starting with
a common proven design is probably a good idea.

-Steve
I've been thinking about it, and while I believe you when you say that SAX can be used to build the DOM, I'm not convinced that SAX is the lowest common abstraction.
Michel Fortin's Tokenizer/Range seems much closer to the metal to me.


It is closer to the metal, but there's a catch...

One issue with SAX is that you must allocate an array of strings to pass the attributes of an element, which is probably going to need a dynamic allocation at some point. A lower-level abstraction such as mine (or Tango's pull-parser) just returns each attribute as a separate token as it parses them.
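
As a rough illustration of that difference, here is a minimal sketch of a token stream in which every attribute is its own token and no per-element attribute array is ever built. It is written in Python only for brevity; it is not the D tokenizer discussed here, nor Tango's pull-parser. The token kinds and the `tokenize` function are hypothetical, and it handles only a tiny subset of XML (no comments, entities, CDATA or single-quoted values). Python allocates a tuple per token anyway, so this shows the shape of the interface, not the allocation savings.

```python
import re

# Hypothetical token kinds, invented for this sketch.
OPEN, ATTR, OPEN_END, CLOSE, TEXT = "open", "attr", "open_end", "close", "text"

_NAME = r"[A-Za-z_][\w.\-]*"
_TAG_START = re.compile(r"<(/?)(" + _NAME + r")")
_ATTR = re.compile(r'\s+(' + _NAME + r')\s*=\s*"([^"<]*)"')
_TAG_END = re.compile(r"\s*(/?)>")


def tokenize(src):
    """Yield tokens one at a time for a tiny XML subset.

    Each attribute is yielded as a separate token. Tag balance and
    duplicate attribute names are deliberately NOT checked here.
    """
    pos = 0
    while pos < len(src):
        if src[pos] != "<":
            # Character data runs up to the next '<' or the end of input.
            end = src.find("<", pos)
            if end == -1:
                end = len(src)
            yield (TEXT, src[pos:end])
            pos = end
            continue

        start = _TAG_START.match(src, pos)
        if start is None:
            raise ValueError(f"malformed tag at offset {pos}")
        is_closing, name = start.group(1) == "/", start.group(2)
        pos = start.end()

        if is_closing:
            end = _TAG_END.match(src, pos)
            if end is None or end.group(1):
                raise ValueError(f"malformed closing tag at offset {pos}")
            yield (CLOSE, name)
            pos = end.end()
            continue

        yield (OPEN, name)
        # Emit attributes as they are parsed, without collecting them.
        attr = _ATTR.match(src, pos)
        while attr is not None:
            yield (ATTR, attr.group(1), attr.group(2))
            pos = attr.end()
            attr = _ATTR.match(src, pos)

        end = _TAG_END.match(src, pos)
        if end is None:
            raise ValueError(f"malformed tag at offset {pos}")
        pos = end.end()
        yield (OPEN_END,)
        if end.group(1):
            # Self-closing element: report the close right away.
            yield (CLOSE, name)


tokens = list(tokenize('<a x="1" x="2"><b/>hi</a>'))
```

Note that the duplicate `x` attribute in the sample input passes through untouched: at this level it is simply two attribute tokens.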

The downside of the tokenizer interface is that it only checks for a subset of well-formedness: for instance, it doesn't check that tags balance each other correctly or that no two attributes of an element share the same name. It's just a "tokenizer" after all; it can't be described as a conformant XML parser by itself. The upper-layer parser needs to check for these things. My mini DOM built on this tokenizer does these checks when using the tokenizer, and it's more efficient to do them there because that's where the context information is kept, which is why the tokenizer doesn't do them.

Implementing SAX on top of my tokenizer consists mostly of ensuring proper tag balancing, checking for duplicate attributes, and collecting attributes in an array (or another kind of list) you can then give to the openElement SAX callback.
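
Those three jobs can be sketched as follows, again in Python for brevity and not as the actual implementation. The sketch assumes a flat token stream made of hypothetical tuples (`("open", name)`, `("attr", name, value)`, `("open_end",)`, `("close", name)`, `("text", data)`); the function name `drive_sax` and the callback names are likewise invented for illustration.

```python
def drive_sax(tokens, open_element, close_element, characters):
    """Turn a flat token stream into SAX-style callbacks.

    This layer does what the tokenizer leaves out: it balances tags,
    rejects duplicate attributes, and collects the attributes of each
    element before calling open_element.
    """
    stack = []      # names of the elements that are currently open
    pending = None  # element whose attributes are still being read
    attrs = {}      # attributes collected for the pending element

    for tok in tokens:
        kind = tok[0]
        if kind == "open":
            pending, attrs = tok[1], {}
        elif kind == "attr":
            _, name, value = tok
            if name in attrs:
                raise ValueError(
                    f"duplicate attribute {name!r} on <{pending}>")
            attrs[name] = value
        elif kind == "open_end":
            # All attributes are known now, so the callback can fire.
            stack.append(pending)
            open_element(pending, attrs)
            pending = None
        elif kind == "close":
            if not stack or stack[-1] != tok[1]:
                raise ValueError(f"unbalanced closing tag </{tok[1]}>")
            close_element(stack.pop())
        elif kind == "text":
            characters(tok[1])
        else:
            raise ValueError(f"unknown token kind {kind!r}")

    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")


# Usage: record the callbacks for a small hand-written token stream.
events = []
drive_sax(
    [("open", "a"), ("attr", "x", "1"), ("open_end",),
     ("text", "hi"), ("close", "a")],
    open_element=lambda name, attrs: events.append(("start", name, attrs)),
    close_element=lambda name: events.append(("end", name)),
    characters=lambda data: events.append(("chars", data)),
)
```

The stack and the attribute dictionary are exactly the context information the tokenizer does not keep, which is why these checks fit naturally in this layer.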


--
Michel Fortin
[email protected]
http://michelf.com/
