On Sunday, 3 May 2015 at 17:39:48 UTC, Robert burner Schadek wrote:
std.xml has been considered not up to specs for nearly 3 years now. Time to build a successor. I currently plan the following features for it:

- SAX and DOM parser
- in-situ / slicing parsing when possible (forward range?)
- compile time switch (CTS) for lazy attribute parsing
- CTS for encoding (ubyte(ASCII), char(utf8), ... )
- CTS for input validating
- performance

Not much code yet, I'm currently building the performance test suite https://github.com/burner/std.xml2

Please post your feature requests, and please keep the posts DRY and on topic.

If I were doing it, I'd do three types of parsers:

1. A parser that was pretty much as low level as you can get, where you basically get a range of XML attributes or tags. Exactly how to build that could be a bit entertaining, since it would have to be hierarchical, and ranges aren't. But it could be something like a range of tags, where each tag gives you a range of its attributes and sub-tags, so that the whole document can be processed without even getting to the level of a SAX parser. That parser could then be used to build the other parsers, and anyone who needed insanely fast speeds could use it rather than the SAX or DOM parser, so long as they were willing to pay the inevitable loss in user-friendliness.

2. SAX parser built on the low level parser.

3. DOM parser built either on the low level parser or the SAX parser (whichever made more sense).
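
To make the low level layer from item 1 a bit more concrete, here is a minimal sketch. It is written in Python rather than D purely for brevity, and every name in it (Event, events, attributes) is hypothetical; none of it comes from std.xml or std.xml2. It assumes well-formed input and ignores the prolog, comments, CDATA, entities, and attribute values containing '>'. The stream is flat, so tracking nesting is left to the consumer. Events carry only offsets into the input (slicing), and attributes are parsed only when asked for.

```python
import re
from typing import Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    """Hypothetical low level event: a kind plus a slice into the input."""
    kind: str   # "start", "end", or "text"
    name: str   # tag name; empty for text events
    start: int  # the event covers src[start:end]
    end: int


# Simplified patterns; assumption: no '>' inside attribute values.
_TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*?)(/?)>")
_ATTR = re.compile(r"""([A-Za-z_][\w.\-]*)\s*=\s*("[^"]*"|'[^']*')""")


def events(src: str) -> Iterator[Event]:
    """Lazily yield start/end/text events without copying any text."""
    pos = 0
    for m in _TAG.finditer(src):
        # Non-whitespace characters between two tags form a text event.
        if m.start() > pos and src[pos:m.start()].strip():
            yield Event("text", "", pos, m.start())
        closing, name, _, self_closing = m.groups()
        if closing:
            yield Event("end", name, m.start(), m.end())
        else:
            yield Event("start", name, m.start(), m.end())
            if self_closing:
                # <x/> is reported as a start followed by an empty end.
                yield Event("end", name, m.end(), m.end())
        pos = m.end()
    if src[pos:].strip():
        yield Event("text", "", pos, len(src))


def attributes(src: str, ev: Event) -> Iterator[Tuple[str, str]]:
    """Parse the attributes of a start event on demand."""
    for m in _ATTR.finditer(src, ev.start, ev.end):
        yield m.group(1), m.group(2)[1:-1]  # strip the quotes
```

A caller that only wants, say, the id attribute of item tags can loop over events(src), skip everything else, and call attributes() just for the tags it cares about, so no work is spent on the rest of the document.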

I doubt that I'm really explaining the low level parser well enough or have even thought it through enough, but I really think that even a SAX parser is too high level for the base parser and that something that is only slightly higher than a lexer (high enough to actually be processing XML rather than individual tokens but pretty much only as high as is required to do that) would be a far better choice.

IIRC, Michel Fortin's work went in that direction, and he linked to his code in another post, so I'd suggest at least looking at that for ideas.

Regardless, by building layers of XML parsers rather than just the standard ones, it should be possible to get higher performance while still having the more standard, user-friendly ones for those that don't need the full performance and do need the user-friendliness (though of course, we do want the SAX and DOM parsers to be efficient as well).
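
The layering itself could look roughly like the sketch below, again in Python only for brevity and with entirely hypothetical names (Handler, sax_parse, TreeBuilder, dom_parse). It assumes the low level layer hands over a stream of (kind, value) pairs, where kind is "start", "end", or "text". The SAX layer is just a dispatch loop over that stream, and the DOM layer here happens to be built on the SAX layer, though building it directly on the low level stream would work just as well.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed shape of a low level event: ("start" | "end" | "text", value).
RawEvent = Tuple[str, str]


class Handler:
    """Hypothetical SAX-style callback interface; the defaults do nothing."""

    def start(self, name: str) -> None:
        pass

    def end(self, name: str) -> None:
        pass

    def text(self, data: str) -> None:
        pass


def sax_parse(events: Iterable[RawEvent], handler: Handler) -> None:
    """SAX layer: turn the pulled event stream into pushed callbacks."""
    for kind, value in events:
        if kind == "start":
            handler.start(value)
        elif kind == "end":
            handler.end(value)
        elif kind == "text":
            handler.text(value)
        else:
            raise ValueError("unknown event kind: " + kind)


class Node:
    """Minimal DOM node: a name, accumulated text, and child nodes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.text = ""
        self.children: List["Node"] = []


class TreeBuilder(Handler):
    """DOM layer: a SAX handler that assembles a tree of Node objects."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.stack: List[Node] = []

    def start(self, name: str) -> None:
        node = Node(name)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def end(self, name: str) -> None:
        # Reject an end tag that does not close the innermost open tag.
        if not self.stack or self.stack[-1].name != name:
            raise ValueError("mismatched end tag: " + name)
        self.stack.pop()

    def text(self, data: str) -> None:
        if self.stack:
            self.stack[-1].text += data


def dom_parse(events: Iterable[RawEvent]) -> Optional[Node]:
    """Build a tree by running the SAX layer with a TreeBuilder handler."""
    builder = TreeBuilder()
    sax_parse(events, builder)
    return builder.root
```

Code that needs the speed consumes the event stream directly and pays for nothing else, while code that wants convenience calls dom_parse and gets a tree, with both sharing the same underlying parser.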

- Jonathan M Davis
