On Sunday, 3 August 2014 at 17:40:48 UTC, Andrei Alexandrescu wrote:
On 8/3/14, 10:19 AM, Sean Kelly wrote:
I don't want to pay for anything I don't use. No allocations should occur within the parser and it should simply slice up the input.

What to do about arrays and objects, which would naturally allocate arrays and associative arrays respectively? What about strings with backslash-encoded characters?

This is tricky with a range. With an event-based parser I'd have events for object and array begin / end, but with a range you end up having an element that's a token, which is pretty weird. For encoded characters (and you need to make sure you handle surrogate pairs in your decoder) I'd still provide some means of decoding on demand. If nothing else, decode lazily when the user asks for the string value. That way the user isn't paying to decode strings he isn't interested in.
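A rough sketch of what "decode lazily when the user asks" could look like. This is only an illustration in Python, not anything from an existing library: `StringToken`, its fields, and `decode` are hypothetical names, and the input is assumed to have been validated already. In D the raw body would be a true slice of the input; Python slicing copies, so the sketch only shows the deferral of the decoding work, including combining a `\uXXXX` surrogate pair into one code point.

```python
from dataclasses import dataclass

# Single-character escapes defined by the JSON grammar.
_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b', 'f': '\f',
            'n': '\n', 'r': '\r', 't': '\t'}


@dataclass(frozen=True)
class StringToken:
    """Hypothetical token: records where the raw string body lives."""
    source: str
    start: int  # index of the first character after the opening quote
    end: int    # index of the closing quote

    @property
    def raw(self) -> str:
        # Undecoded body, escapes left exactly as they appear in the input.
        return self.source[self.start:self.end]

    def decode(self) -> str:
        """Decode escapes on demand; assumes the body was already validated."""
        raw = self.raw
        if '\\' not in raw:
            return raw  # fast path: nothing to decode
        out = []
        i, n = 0, len(raw)
        while i < n:
            ch = raw[i]
            if ch != '\\':
                out.append(ch)
                i += 1
                continue
            esc = raw[i + 1]
            if esc == 'u':
                code = int(raw[i + 2:i + 6], 16)
                i += 6
                # A high surrogate followed by an escaped low surrogate
                # encodes a single code point outside the BMP.
                if 0xD800 <= code <= 0xDBFF and raw[i:i + 2] == '\\u':
                    low = int(raw[i + 2:i + 6], 16)
                    if 0xDC00 <= low <= 0xDFFF:
                        code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
                        i += 6
                out.append(chr(code))
            else:
                out.append(_ESCAPES[esc])
                i += 2
        return ''.join(out)
```

A caller who never touches `decode()` on a given token never pays for the escape handling of that string.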


No allocation works for tokenization, but parsing is a whole different matter.

So the lowest layer should allow me to iterate across symbols in some way.

Yah, that would be the tokenizer.

But that will halt on comma and colon and such, correct? That's a tad lower than I'd want, though I guess it would be easy enough to build a parser on top of it.
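To make the "halts on comma and colon" level concrete, here is a minimal sketch of such a tokenizer, written in Python purely for illustration. The names `tokenize` and the token kinds are made up for this example; tokens are `(kind, start, end)` offsets into the input rather than copies, which is the closest Python analogue to slicing. It does no validation of scalars and no decoding of strings.

```python
from typing import Iterator, Tuple

Token = Tuple[str, int, int]  # (kind, start, end) offsets into the input

_PUNCT = {'{': 'object_begin', '}': 'object_end',
          '[': 'array_begin', ']': 'array_end',
          ',': 'comma', ':': 'colon'}
_WS = ' \t\r\n'


def tokenize(text: str) -> Iterator[Token]:
    """Yield structural tokens lazily; token contents are never copied."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in _WS:
            i += 1
        elif ch in _PUNCT:
            yield (_PUNCT[ch], i, i + 1)
            i += 1
        elif ch == '"':
            j = i + 1
            # Find the closing quote, skipping over escaped characters.
            while j < n and text[j] != '"':
                j += 2 if text[j] == '\\' else 1
            if j >= n:
                raise ValueError(f"unterminated string at offset {i}")
            yield ('string', i + 1, j)  # span excludes the quotes
            i = j + 1
        else:
            # Number or literal: runs until whitespace or punctuation.
            j = i
            while j < n and text[j] not in _WS and text[j] not in _PUNCT and text[j] != '"':
                j += 1
            yield ('scalar', i, j)
            i = j
```

A parser layered on top would consume `comma` and `colon` itself and expose only values and object/array boundaries to the user.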


When I've done this in the past it was SAX-style (i.e. a callback per type), but with the range interface that shouldn't be necessary.

The parser shouldn't decode or convert anything unless I ask it to. Most of the time I only care about specific values, and paying for conversions on everything is wasted processing time.

That's tricky. Once you scan for 2 specific characters you may as well scan for a couple more; the added cost is negligible. In contrast, scanning once to find the terminator and then again to decode will definitely be a lot more expensive.

I think I'm getting a bit confused. For the JSON parser I wrote, the parser performs full validation but leaves the content as-is, then provides a routine to decode values from their string representation if the user wishes to. I'm not sure where scanning figures in here.
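As an illustration of "full validation but content left as-is", here is a small sketch for numbers and literals, in Python rather than D. `RawScalar` and `value` are hypothetical names, not the API of the parser described above. Construction checks the text against the JSON number grammar (RFC 8259) without converting it; conversion happens only if the caller asks.

```python
import re

# JSON number grammar: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')
_LITERALS = {'true': True, 'false': False, 'null': None}


class RawScalar:
    """Hypothetical validated-but-unconverted JSON scalar."""

    def __init__(self, text: str):
        # Validate up front; keep the original text untouched.
        if text not in _LITERALS and _NUMBER.fullmatch(text) is None:
            raise ValueError(f"invalid JSON scalar: {text!r}")
        self.text = text

    def value(self):
        """Convert on demand; nothing is converted until this is called."""
        if self.text in _LITERALS:
            return _LITERALS[self.text]
        if any(c in self.text for c in '.eE'):
            return float(self.text)
        return int(self.text)
```

Invalid input such as `01` or `1.` is rejected at construction time, so a later `value()` call cannot fail on malformed text.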
Andrei
