On Thu, Aug 29, 2013 at 12:41:16PM -0700, Sean Kelly wrote:
> On Aug 29, 2013, at 11:57 AM, H. S. Teoh <[email protected]> wrote:
> >
> > One way is to write the core code of std.xml in such a way that it
> > handles all data as ubyte[] (or ushort[]/uint[] for 16-bit/32-bit
> > encodings) so that it's encoding-independent. Then on top of this
> > core, write some convenience wrappers that cast/convert to string,
> > wstring, dstring. As an initial stab, we could support only UTF-8,
> > UTF-16, UTF-32 if the user asks for string/wstring/dstring, and
> > leave XML in other encodings up to the user to decode manually. This
> > way, at least the user can get the data out of the file.
> >
> > Later on, once we've gotten our act together with std.encoding, we
> > can hook it up to std.xml to provide autoconversion.
>
> As long as autoconversion is optional. When parsing XML or JSON or
> whatever, I generally only care about specific strings, and sometimes
> don't want anything decoded at all. Having decoding done
> automatically before the event fires is a huge and potentially
> unnecessary performance hit. Not doing this decoding automatically is
> what makes the Tango XML parser so fast.
Right, that's why I said the core of std.xml should handle everything as
bytes, treating only the ASCII values of <, >, &, and other metacharacters
specially. The tag name and tag body should just be ranges over segments
of the input.


T

-- 
What are you when you run out of Monet? Baroque.
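To make the byte-level core concrete, here is a minimal sketch in Python rather than D; it is not std.xml or any real library API, and the `scan` function and its event tuples are hypothetical names for illustration. It assumes an ASCII-compatible 8-bit encoding such as UTF-8, where every byte of a multibyte sequence is >= 0x80, so searching the raw bytes for `<` and `>` can never split a character. Tag names and text come back as zero-copy `memoryview` slices of the input (the closest Python analogue of a D slice) and are never decoded by the scanner. UTF-16/UTF-32 input would need to be scanned in 16/32-bit units instead, as the quoted message suggests.

```python
from typing import Iterator, Tuple

# Event kinds produced by the hypothetical scanner below.
START, END, TEXT = "start", "end", "text"


def scan(data: bytes) -> Iterator[Tuple[str, memoryview]]:
    """Yield (kind, slice) events over raw XML bytes.

    Only the ASCII metacharacters '<', '>' and '/' are interpreted;
    everything else is passed through undecoded as a zero-copy
    memoryview slice of the input. Attributes are skipped, and
    comments, CDATA, processing instructions and entity expansion
    ('&...;') are deliberately not handled in this sketch.
    """
    view = memoryview(data)
    i, n = 0, len(data)
    while i < n:
        lt = data.find(b"<", i)
        if lt == -1:
            lt = n
        if lt > i:
            # Character data between tags: raw bytes, not decoded.
            yield TEXT, view[i:lt]
        if lt == n:
            break
        gt = data.find(b">", lt + 1)
        if gt == -1:
            raise ValueError("unterminated tag at byte offset %d" % lt)
        if data[lt + 1:lt + 2] == b"/":
            # Closing tag: </name>
            yield END, view[lt + 2:gt]
        else:
            # Opening or self-closing tag: <name attr...> or <name/>
            self_closing = data[gt - 1:gt] == b"/"
            end = gt - 1 if self_closing else gt
            # The tag name ends at the first ASCII space, if any.
            sp = data.find(b" ", lt + 1, end)
            name = view[lt + 1:(sp if sp != -1 else end)]
            yield START, name
            if self_closing:
                yield END, name
        i = gt + 1
```

Decoding stays the caller's choice: a handler that cares about a particular text event can call `bytes(s).decode("utf-8")` on that slice alone, so the cost is paid only for the strings actually used, which is the property Sean credits for the Tango parser's speed.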
