On 2009-08-04 10:01:51 -0400, Michael Rynn <[email protected]> said:

It would be nice to have well-defined interfaces for DOM, SAX and
PULL parsers which share some of the base parsing code. The DOM can be
partial, as node sets returned from an XPath query. Nice how the Phobos
parser can make a full DOM or just the bits required.

Exactly what I've been working on:

Tokenizer part: http://michelf.com/docs/d/mfr/xmltok.html
DOM part:       http://michelf.com/docs/d/mfr/xml.html

Note that it's still a work in progress. Here are some things I'd like to do:

Tokenizer: add specialized exception classes to report problems more precisely; add better checks for valid characters (these should be optional); improve support for ranges (currently only strings work, because I rely on "a.before(b)" to avoid dynamic allocation); and add support for the internal subset in the doctype (low priority).
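
To illustrate the first two tokenizer items, here is a rough sketch of what specialized exceptions and an optional character check could look like. All names below (XMLTokenizerException, InvalidCharacterException, isValidXMLChar, checkChars) are made up for the example and are not part of mfr.xmltok; the valid-character test follows the Char production of XML 1.0.

```d
import std.conv : to;

// Hypothetical base class: records where in the input the problem was found.
class XMLTokenizerException : Exception
{
    size_t offset;

    this(string msg, size_t offset,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(msg ~ " at offset " ~ offset.to!string, file, line);
        this.offset = offset;
    }
}

// Hypothetical subclass for one specific kind of problem.
class InvalidCharacterException : XMLTokenizerException
{
    dchar character;

    this(dchar character, size_t offset,
         string file = __FILE__, size_t line = __LINE__)
    {
        super("invalid XML character", offset, file, line);
        this.character = character;
    }
}

// Char production from the XML 1.0 specification.
bool isValidXMLChar(dchar c)
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

// The compile-time flag makes the check optional: with validate == false
// the loop is not compiled in at all.
void checkChars(bool validate = true)(string input)
{
    static if (validate)
    {
        foreach (size_t i, dchar c; input)
            if (!isValidXMLChar(c))
                throw new InvalidCharacterException(c, i);
    }
}

void main()
{
    checkChars("<p>ok</p>");      // valid input passes
    checkChars!false("a\x01b");   // validation disabled, nothing is thrown

    try
    {
        checkChars("a\x01b");
        assert(false, "expected an exception");
    }
    catch (InvalidCharacterException e)
    {
        assert(e.offset == 1 && e.character == 0x1);
    }
}
```

Catching the base class would handle every tokenizer problem at once, while catching a subclass lets the caller react to one kind of error only.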

Writer: replace it with a simple template function plus a toString function defined for each token type? Or with a writeTo function (to avoid creating an intermediate string)?
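
A minimal sketch of the writeTo variant, assuming hypothetical token structs (OpenTag, CloseTag, CharData) rather than the real ones from mfr.xmltok. Each overload writes its pieces straight into an output range, so no temporary string is built per token. Escaping of character data is left out to keep the example short.

```d
import std.array : appender;
import std.range.primitives : put;

// Hypothetical token types, for illustration only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct CharData { string text; }

// One overload per token type; Sink can be any output range of characters.
void writeTo(Sink)(ref Sink sink, OpenTag t)
{
    put(sink, "<");
    put(sink, t.name);
    put(sink, ">");
}

void writeTo(Sink)(ref Sink sink, CloseTag t)
{
    put(sink, "</");
    put(sink, t.name);
    put(sink, ">");
}

void writeTo(Sink)(ref Sink sink, CharData t)
{
    // A real writer would escape '<' and '&' here.
    put(sink, t.text);
}

void main()
{
    auto buf = appender!string();
    writeTo(buf, OpenTag("p"));
    writeTo(buf, CharData("hi"));
    writeTo(buf, CloseTag("p"));
    assert(buf.data == "<p>hi</p>");
}
```

A toString for each token could then be a thin wrapper that calls writeTo on an appender and returns its data, so the two options do not exclude each other.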

XMLForwardRange: allow a template parameter specifying the token types you want to see, skipping all others. This could be done by passing a custom Algebraic type instead of the provided one, which can contain all tokens.
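
The filtering idea could be sketched roughly as follows. This is not the XMLForwardRange code: the token structs, AnyToken, FilteredTokens and filterTokens are all invented for the example, and the wrapper works on any input range of AnyToken instead of on the tokenizer itself. It assumes that every type allowed by the custom Algebraic is also allowed by AnyToken.

```d
import std.range.primitives : empty, front, popFront;
import std.variant : Algebraic;

// Hypothetical token types, for illustration only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct CharData { string text; }
struct Comment  { string text; }

// Stand-in for the "provided" type that can contain all tokens.
alias AnyToken = Algebraic!(OpenTag, CloseTag, CharData, Comment);

// Yields only the tokens whose type is allowed by Wanted (a custom
// Algebraic), silently skipping all others.
struct FilteredTokens(Wanted, R)
{
    private R source;

    this(R source)
    {
        this.source = source;
        skipUnwanted();
    }

    @property bool empty() { return source.empty; }

    @property Wanted front()
    {
        auto t = source.front;
        static foreach (T; Wanted.AllowedTypes)
            if (auto p = t.peek!T)
                return Wanted(*p);
        assert(0, "skipUnwanted leaves only wanted tokens at the front");
    }

    void popFront()
    {
        source.popFront();
        skipUnwanted();
    }

    private static bool isWanted(AnyToken t)
    {
        static foreach (T; Wanted.AllowedTypes)
            if (t.peek!T !is null)
                return true;
        return false;
    }

    private void skipUnwanted()
    {
        while (!source.empty && !isWanted(source.front))
            source.popFront();
    }
}

// Convenience function so that only Wanted has to be spelled out.
auto filterTokens(Wanted, R)(R source)
{
    return FilteredTokens!(Wanted, R)(source);
}

void main()
{
    AnyToken[] tokens = [
        AnyToken(OpenTag("p")),
        AnyToken(Comment("skipped")),
        AnyToken(CharData("hi")),
        AnyToken(CloseTag("p")),
    ];

    alias TagsOnly = Algebraic!(OpenTag, CloseTag);

    size_t count;
    foreach (tok; filterTokens!TagsOnly(tokens))
    {
        assert(tok.peek!OpenTag !is null || tok.peek!CloseTag !is null);
        ++count;
    }
    assert(count == 2);
}
```

Because the element type of the filtered range is the narrower Algebraic, code consuming it only has to handle the token types it asked for.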

DOM classes: it's mostly experimental for now.

There's no SAX yet, although it should be trivial to add on top of the existing callback tokenizer.
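
As a sketch of why a SAX layer is a thin addition over a callback tokenizer: the adapter only has to map each token to a handler method. The real callback API of mfr.xmltok is not shown in this message, so fakeTokenize below is a stand-in that replays a prepared token list, and SaxHandler, parseSax and the token structs are likewise hypothetical.

```d
import std.variant : Algebraic;

// Hypothetical token types, for illustration only.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct CharData { string text; }
alias Token = Algebraic!(OpenTag, CloseTag, CharData);

// SAX-style handler interface (names are assumptions).
interface SaxHandler
{
    void startElement(string name);
    void endElement(string name);
    void characters(string text);
}

// Stand-in for a callback tokenizer: calls onToken once per token.
// A real tokenizer would parse text instead of replaying an array.
void fakeTokenize(Token[] tokens, void delegate(Token) onToken)
{
    foreach (t; tokens)
        onToken(t);
}

// The SAX layer: a single callback that dispatches on the token type.
void parseSax(Token[] tokens, SaxHandler handler)
{
    fakeTokenize(tokens, (Token t) {
        if (auto open = t.peek!OpenTag)
            handler.startElement(open.name);
        else if (auto close = t.peek!CloseTag)
            handler.endElement(close.name);
        else if (auto data = t.peek!CharData)
            handler.characters(data.text);
    });
}

// Example handler that records the events it receives.
class Recorder : SaxHandler
{
    string[] events;
    void startElement(string name) { events ~= "start:" ~ name; }
    void endElement(string name)   { events ~= "end:" ~ name; }
    void characters(string text)   { events ~= "text:" ~ text; }
}

void main()
{
    auto rec = new Recorder;
    parseSax([Token(OpenTag("p")), Token(CharData("hi")), Token(CloseTag("p"))], rec);
    assert(rec.events == ["start:p", "text:hi", "end:p"]);
}
```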


--
Michel Fortin
[email protected]
http://michelf.com/
