On 2010-06-29 04:41:50 -0400, Alix Pexton <[email protected]> said:

> On 28/06/2010 15:11, Steven Schveighoffer wrote:
>
>> Yes, I don't think the phobos solution needs to mimic exactly the API of
>> SAX or DOM, the author should be free to use D idioms. But starting with
>> a common proven design is probably a good idea.
>>
>> -Steve
>
> I've been thinking about it, and while I believe you when you say that SAX can be used to build the DOM, I'm not convinced that SAX is the lowest common abstraction.
>
> Michel Fortin's Tokenizer/Range seems much closer to the metal to me.

It is closer to the metal, but there's a catch...

One issue with SAX is that you must allocate an array of strings to pass the attributes of an element, which is probably going to need a dynamic allocation at some point. A lower-level abstraction such as mine (or Tango's pull-parser) just returns each attribute as a separate token as it parses them.
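To make the difference in interface shape concrete, here is a rough sketch of a pull-style tokenizer that hands out each attribute as its own token instead of gathering them into an array first. It is not the actual tokenizer discussed in this thread, nor Tango's pull parser: Python is used only as a neutral sketch language, the `tokenize` function and the `(kind, name, value)` token layout are hypothetical, and it handles only a tiny XML subset (double-quoted attributes, no entities, comments, or CDATA).

```python
import re
from typing import Iterator, Tuple

# Hypothetical token layout: (kind, name, value), where kind is one of
# "open", "attr", "close", "text".
Token = Tuple[str, str, str]

_NAME = r"[A-Za-z_][\w.-]*"
_TAG = re.compile(
    r"<(/?)(" + _NAME + r")((?:\s+" + _NAME + r'\s*=\s*"[^"<]*")*)\s*(/?)>'
)
_ATTR = re.compile(r"(" + _NAME + r')\s*=\s*"([^"<]*)"')


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens lazily; attributes come out one at a time."""
    pos = 0
    while pos < len(text):
        if text[pos] == "<":
            m = _TAG.match(text, pos)
            if m is None:
                raise ValueError(f"malformed tag at offset {pos}")
            closing, name, attrs, empty = m.groups()
            if closing:
                if attrs or empty:
                    raise ValueError(f"malformed close tag at offset {pos}")
                yield ("close", name, "")
            else:
                yield ("open", name, "")
                # Each attribute is its own token; no attribute list is built.
                for a in _ATTR.finditer(attrs):
                    yield ("attr", a.group(1), a.group(2))
                if empty:
                    # <x/> is reported as an open immediately followed by a close.
                    yield ("close", name, "")
            pos = m.end()
        else:
            end = text.find("<", pos)
            if end == -1:
                end = len(text)
            yield ("text", "", text[pos:end])
            pos = end
```

A consumer that only cares about one attribute can stop pulling tokens as soon as it has seen it, and nothing forces an attribute array to exist. Note that this sketch happily tokenizes `<a x="1" x="2">` or `<a></b>`; it looks at one tag at a time and keeps no context.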

The downside of the tokenizer interface is that it only checks a subset of the well-formedness constraints: for instance, it doesn't check that tags balance each other correctly or that no two attributes on an element share the same name. It's just a "tokenizer" after all; it can't be described as a conformant XML parser by itself. The upper-layer parser needs to check for these things. My mini DOM built on this tokenizer performs these checks as it consumes the tokens. It's more efficient to do them there because that's where the context information is kept, which is why the tokenizer doesn't do them.

Implementing SAX on top of my tokenizer consists mostly of ensuring proper tag balancing, checking for duplicate attributes, and collecting attributes in an array (or another kind of list) you can then give to the openElement SAX callback.
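The three jobs listed above can be sketched as a thin adapter over a token stream. This is only an illustration under stated assumptions, not the real code: Python stands in as a neutral sketch language, the `(kind, name, value)` token tuples and the `drive_sax` function are hypothetical, and the callbacks `open_element`, `close_element`, and `characters` merely mimic the general shape of SAX handlers.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical token layout: (kind, name, value), where kind is one of
# "open", "attr", "close", "text".
Token = Tuple[str, str, str]


def drive_sax(
    tokens: Iterable[Token],
    open_element: Callable[[str, Dict[str, str]], None],
    close_element: Callable[[str], None],
    characters: Callable[[str], None],
) -> None:
    """Turn a flat token stream into SAX-style callbacks, adding the
    checks the tokenizer itself does not perform."""
    stack: List[str] = []                 # currently open elements
    pending_name: Optional[str] = None    # start tag still collecting attributes
    pending_attrs: Dict[str, str] = {}

    def flush() -> None:
        # Report the pending start tag once all its attributes are collected.
        nonlocal pending_name, pending_attrs
        if pending_name is not None:
            open_element(pending_name, pending_attrs)
            stack.append(pending_name)
            pending_name = None
            pending_attrs = {}

    for kind, name, value in tokens:
        if kind == "attr":
            if pending_name is None:
                raise ValueError("attribute outside a start tag")
            if name in pending_attrs:     # duplicate-attribute check
                raise ValueError(f"duplicate attribute {name!r} on <{pending_name}>")
            pending_attrs[name] = value   # collect for the callback
            continue
        flush()
        if kind == "open":
            pending_name = name
        elif kind == "close":
            if not stack or stack[-1] != name:   # tag-balancing check
                raise ValueError(f"unbalanced close tag </{name}>")
            stack.pop()
            close_element(name)
        elif kind == "text":
            characters(value)
        else:
            raise ValueError(f"unknown token kind {kind!r}")
    flush()
    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")
```

The attribute dictionary here is exactly the dynamic allocation the tokenizer layer avoids: it only appears once a client asks for the SAX-style interface, and the element stack that makes the balancing check cheap lives in the same layer.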

--
Michel Fortin
[email protected]
http://michelf.com/
