On 2009-08-04 10:01:51 -0400, Michael Rynn <[email protected]> said:
> It would be nice to have well defined interfaces for DOM, SAX and
> PULL parsers which share some of the base parsing code. The DOM can be
> partial, as node sets returned from XPath query. Nice how the phobos
> parser can make a full DOM or just the bits required.
Exactly what I've been working on:
Tokenizer part: http://michelf.com/docs/d/mfr/xmltok.html
DOM part: http://michelf.com/docs/d/mfr/xml.html
Note that it's still a work in progress. Here are some things I'd like to do:
Tokenizer: add specialized exception classes to better report the various
parsing problems, add better (but optional) checks for valid characters,
improve support for ranges (currently only strings work, because I rely on
"a.before(b)" to avoid dynamic allocation), and add support for the
internal subset in the doctype (but that's low priority).
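
To make the first two tokenizer items concrete, here is a minimal sketch of
a small exception hierarchy plus a character check that can be switched off
at compile time. Every name in it (XMLTokenizerException,
InvalidCharacterException, checkCharacters, and so on) is hypothetical and
not part of the current xmltok module; only the character ranges come from
the "Char" production of XML 1.0.

```d
import std.format : format;
import std.utf : decode;

/// Hypothetical base class for tokenizer errors; remembers where the error is.
class XMLTokenizerException : Exception
{
    size_t offset; // byte offset in the input where the problem was detected

    this(string msg, size_t offset,
         string file = __FILE__, size_t line = __LINE__)
    {
        super(format("%s (at offset %s)", msg, offset), file, line);
        this.offset = offset;
    }
}

/// Hypothetical: the input ended in the middle of a construct.
class UnexpectedEndException : XMLTokenizerException
{
    this(size_t offset) { super("unexpected end of input", offset); }
}

/// Hypothetical: a character outside the XML 1.0 "Char" production was found.
class InvalidCharacterException : XMLTokenizerException
{
    dchar character;

    this(dchar c, size_t offset)
    {
        super(format("invalid character U+%04X", cast(uint) c), offset);
        character = c;
    }
}

/// True if c matches the XML 1.0 "Char" production.
bool isValidXMLChar(dchar c) pure nothrow @safe @nogc
{
    return c == 0x9 || c == 0xA || c == 0xD
        || (c >= 0x20 && c <= 0xD7FF)
        || (c >= 0xE000 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0x10FFFF);
}

/// Validation is a template flag, so checkCharacters!false compiles to nothing.
void checkCharacters(bool validate = true)(string input)
{
    static if (validate)
    {
        size_t i = 0;
        while (i < input.length)
        {
            immutable start = i;
            immutable c = decode(input, i); // decodes one code point, advances i
            if (!isValidXMLChar(c))
                throw new InvalidCharacterException(c, start);
        }
    }
}
```

A caller that trusts its input would use checkCharacters!false(text) and pay
nothing; everyone else can catch XMLTokenizerException for any problem, or
one of the subclasses for a specific one.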
Writer: replace it with a simple template function plus a toString function
defined for each token type? Or a writeTo function (to avoid creating an
intermediate string)?
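
The two options could coexist. The sketch below assumes each token gets a
writeTo that targets any output range of characters, with toString as a thin
wrapper around it. The token types and the writeTokens function are invented
for illustration and are not the module's actual types.

```d
import std.array : appender;
import std.range.primitives : put;

/// Hypothetical token type for "<name>".
struct OpenElementToken
{
    string name;

    /// Writes straight into the output range; no intermediate string is built.
    void writeTo(Output)(ref Output output) const
    {
        put(output, "<");
        put(output, name);
        put(output, ">");
    }

    /// Convenience wrapper for callers who do want a string.
    string toString() const
    {
        auto buffer = appender!string();
        writeTo(buffer);
        return buffer.data;
    }
}

/// Hypothetical token type for "</name>".
struct CloseElementToken
{
    string name;

    void writeTo(Output)(ref Output output) const
    {
        put(output, "</");
        put(output, name);
        put(output, ">");
    }

    string toString() const
    {
        auto buffer = appender!string();
        writeTo(buffer);
        return buffer.data;
    }
}

/// The "simple template function": writes any sequence of tokens in order.
void writeTokens(Output, Tokens...)(ref Output output, Tokens tokens)
{
    foreach (token; tokens)
        token.writeTo(output);
}
```

With this shape, writing a whole document into one appender (or a file-backed
output range) allocates only for the destination, while toString stays
available for debugging and formatting.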
XMLForwardRange: allow a template parameter specifying the token types
you want to see, skipping all others. This could be done by passing a
custom Algebraic type instead of the provided one, which can contain all
tokens.
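
As a rough illustration of the Algebraic idea, the sketch below filters an
array of "any token" values down to the types listed in a caller-supplied
Algebraic. It is not how XMLForwardRange works today: the token structs,
AnyToken and FilteredTokens are all hypothetical, and a real version would
skip unwanted tokens inside the tokenizer rather than after the fact. The
wanted types are assumed to be a subset of those in AnyToken.

```d
import std.variant : Algebraic;

// Hypothetical token types standing in for the tokenizer's own.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct CharData { string text; }
struct Comment  { string text; }

/// The "provided" type that can hold every kind of token.
alias AnyToken = Algebraic!(OpenTag, CloseTag, CharData, Comment);

/// Input range yielding only the tokens whose type is allowed by Wanted.
struct FilteredTokens(Wanted)
{
    private AnyToken[] source;

    this(AnyToken[] source)
    {
        this.source = source;
        skipUnwanted();
    }

    @property bool empty() const { return source.length == 0; }

    @property Wanted front()
    {
        // Unrolled at compile time over the types Wanted can hold.
        foreach (T; Wanted.AllowedTypes)
            if (auto p = source[0].peek!T)
                return Wanted(*p);
        assert(0, "skipUnwanted leaves a wanted token at the front");
    }

    void popFront()
    {
        source = source[1 .. $];
        skipUnwanted();
    }

    private void skipUnwanted()
    {
        while (source.length && !isWanted(source[0]))
            source = source[1 .. $];
    }

    private static bool isWanted(ref AnyToken token)
    {
        foreach (T; Wanted.AllowedTypes)
            if (token.peek!T !is null)
                return true;
        return false;
    }
}
```

A caller interested only in element structure would then write something
like:

```d
alias TagsOnly = Algebraic!(OpenTag, CloseTag);
foreach (token; FilteredTokens!TagsOnly(tokens))
{
    // token is a TagsOnly: comments and character data never show up here
}
```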
DOM classes: it's mostly experimental for now.
There's no SAX yet, although it should be trivial to add over the
existing callback tokenizer.
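
For what such a SAX layer might look like, here is a minimal sketch. The
callback tokenizer's real interface is not shown in this message, so
replayTokens below is only a stand-in that emits a fixed token sequence, and
the token structs, SAXHandler and SAXAdapter are likewise hypothetical.

```d
// Hypothetical token types standing in for the tokenizer's own.
struct OpenTag  { string name; }
struct CloseTag { string name; }
struct CharData { string text; }

/// A trimmed-down SAX-style handler interface.
interface SAXHandler
{
    void startElement(string name);
    void endElement(string name);
    void characters(string text);
}

/// Adapter: one onToken overload per token type, forwarding to the handler.
struct SAXAdapter
{
    SAXHandler handler;

    void onToken(OpenTag t)  { handler.startElement(t.name); }
    void onToken(CloseTag t) { handler.endElement(t.name); }
    void onToken(CharData t) { handler.characters(t.text); }
}

/// Stand-in for a callback tokenizer: parses nothing, just replays
/// the tokens of "<p>hello</p>" into the sink.
void replayTokens(Sink)(ref Sink sink)
{
    sink.onToken(OpenTag("p"));
    sink.onToken(CharData("hello"));
    sink.onToken(CloseTag("p"));
}

/// Example handler that records the events it receives.
class Recorder : SAXHandler
{
    string[] events;

    void startElement(string name) { events ~= "start:" ~ name; }
    void endElement(string name)   { events ~= "end:" ~ name; }
    void characters(string text)   { events ~= "text:" ~ text; }
}
```

The adapter carries no state beyond the handler reference, which is why a
SAX interface should cost very little once the tokenizer callbacks exist.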
--
Michel Fortin
[email protected]
http://michelf.com/