On 2012-02-08 02:44, Jonathan M Davis wrote:
On Tuesday, February 07, 2012 00:56:40 Adam D. Ruppe wrote:
On Monday, 6 February 2012 at 23:47:08 UTC, Jonathan M Davis wrote:
Also, two of the major requirements for an improved std.xml are
that it needs to have a range-based API, and it needs to be
fast.
What does range-based API mean in this context? I do offer
a couple ranges over the tree, but it really isn't the main
thing there.
Check out Element.tree() for the main one.
But, if you mean taking a range for input, no, it doesn't
do that. I've been thinking about rewriting the parse
function (if you look at it, you'll probably hate it
too!). But, what I have works and is tested on a variety
of input, including garbage that was a pain to get working
right, so I'm in no rush to change it.
Tango's XML parser has pretty much set the bar on speed
Yeah, I'm pretty sure Tango whips me hard on speed. I spent
some time in the profiler a month or two ago and got a
significant speedup over the datasets I use (html files),
but I'm sure there's a whole lot more that could be done.
The biggest thing is I don't think you could use my parse
function as a stream.
Ideally, std.xml would operate on ranges of dchar (but obviously be optimized
for strings, since there are lots of optimizations that can be done with
string processing - at least as far as Unicode goes) and it would return a
range of some kind. The result would probably be a document type of some kind
which provided a range of its top level nodes (or maybe just the root node)
which each then provided ranges over their sub-nodes, etc. At least, that's
the kind of thing that I would expect. Other calls on the document and nodes
may not be range-based at all (e.g. XPath should probably be supported, and
that doesn't necessarily involve ranges). The best way to handle it all would
probably depend on the implementation. I haven't implemented a full-blown XML
parser, so I don't know what the best way to go about it would be, but
ideally, you'd be able to process the nodes as a range.
- Jonathan M Davis
I think there should be a pull or SAX parser at the lowest level, and
then an XML document module on top of that parser.
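
To make the layering concrete, here is a rough sketch of the idea. It is written in Python rather than D purely for illustration, and it is not a proposal for the std.xml API. The names pull_events, Node and build_document are made up for this example; the only library call used is Python's xml.etree.ElementTree.XMLPullParser. The lower layer lazily yields parse events as input chunks arrive (roughly what an input range of events would do in D), and the document layer is built purely by consuming that event stream.

```python
import xml.etree.ElementTree as ET


def pull_events(chunks):
    """Lower layer: lazily yield (event, element) pairs as input arrives."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for chunk in chunks:
        parser.feed(chunk)
        # Hand out whatever events this chunk completed.
        yield from parser.read_events()
    parser.close()
    # Flush any events produced while closing the parser.
    yield from parser.read_events()


class Node:
    """Hypothetical document node: a tag, optional text, and child nodes."""

    def __init__(self, tag):
        self.tag = tag
        self.text = None
        self.children = []


def build_document(events):
    """Upper layer: build a tree from the event stream alone."""
    root = None
    stack = []
    for event, elem in events:
        if event == "start":
            node = Node(elem.tag)
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        else:  # "end": the element's text is complete at this point
            stack.pop().text = elem.text
    return root


# The input is deliberately split in the middle of a tag to show streaming.
chunks = ["<feed><entry><ti", "tle>a</title></entry>", "<entry/></feed>"]
doc = build_document(pull_events(chunks))
print(doc.tag, [child.tag for child in doc.children])
```

Code that only needs to scan a large file can consume pull_events directly and never build a tree, while code that wants a document pays for build_document on top of it.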
--
/Jacob Carlborg