Since Araq mentions XML, I mentioned `parseJsonFragments` (with a hack to get 
"the right fragments" in this particular case). Eliminating that "grow up 
allocation" was the biggest single faster-by-X multiplier I found (it is also 
mentioned in Lemire's critique), yet I could not see it mentioned in 
@timothee's RFC commentary. So it probably bears emphasizing that systematizing 
"incremental parsing" is both important and neglected.

One example of how to expose incremental parsing is what the 
[expat](https://libexpat.github.io/) library does for XML. Expat has 
callbacks for the beginning and end of every XML element. This allows more 
general caller-side allocation & object-creation (or lack thereof!) strategies. 
In this example, the code could simply update those `x,y,z` totals at the end 
of every complete element with a name in `[xyz]` -- the way my 
hacked parseJsonFragments does, but far less hackily.
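
To make the shape concrete, here is an invented Nim rendering of that callback style (the type and handler names below are for illustration only; expat's actual API is C):

    import std/strutils

    # Invented expat-like callback shape; the pattern is the point, not the names.
    type SaxHandlers = object
      onStartElement: proc (name: string)
      onCharacterData: proc (data: string)
      onEndElement: proc (name: string)

    var x, y, z: float
    var buf: string                       # text of the element currently open

    proc onStart(name: string) = buf.setLen 0
    proc onText(data: string) = buf.add data
    proc onEnd(name: string) =
      # update the running totals at the end of each complete x/y/z element
      case name
      of "x": x += parseFloat(buf)
      of "y": y += parseFloat(buf)
      of "z": z += parseFloat(buf)
      else: discard

    let handlers = SaxHandlers(onStartElement: onStart,
                               onCharacterData: onText,
                               onEndElement: onEnd)
    # an expat-like driver (not shown) invokes these as it scans the input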

Callbacks, i.e. calls through function pointers from another translation unit, 
can also be "slow" (typically about one full pipeline depth in cycles, or 40-80 
dynamic instruction slots on a superscalar core). Nim, however, has nice inline 
iterators.

So, maybe the best re-design for a partial-parsing-friendly, faster parser for 
a format abused to marshal "big" data is some kind of layered iterators. These 
could expose control-flow points at the JSON analogues of those XML-element 
completion points and export enough state that client code can do the right 
thing in this sort of circumstance without much fanfare. Just food for 
thought... Something like:

    for frag in data.completedJsonFragments:
      if frag.name == "x": x += parseFloat(frag.value)
      ...
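
To flesh that out a little: `completedJsonFragments` does not exist anywhere; below is one minimal sketch of how such a layered iterator could be built, with the stdlib's low-level `std/parsejson` event parser underneath and an inline iterator on top that yields each (name, raw value) pair the moment it completes. Error handling and non-scalar values are mostly ignored.

    import std/[parsejson, streams]

    type FragCtx = enum ctxTop, ctxObjKey, ctxObjVal, ctxArr

    iterator completedJsonFragments(input: Stream): tuple[name, value: string] =
      ## Layered sketch: std/parsejson's pull parser underneath, an inline
      ## iterator on top yielding each ("fieldName", rawScalarText) pair the
      ## moment it completes -- no JsonNode tree, no whole-document allocation.
      var p: JsonParser
      open(p, input, "input")
      var stack = @[ctxTop]     # where are we: top level, object key/value, array
      var key = ""
      while true:
        p.next()
        case p.kind
        of jsonEof, jsonError:
          break
        of jsonObjectStart:
          if stack[^1] == ctxObjVal: stack[^1] = ctxObjKey
          stack.add ctxObjKey
        of jsonArrayStart:
          if stack[^1] == ctxObjVal: stack[^1] = ctxObjKey
          stack.add ctxArr
        of jsonObjectEnd, jsonArrayEnd:
          discard stack.pop()
        of jsonString, jsonInt, jsonFloat:
          case stack[^1]
          of ctxObjKey:
            key = p.str              # a scalar in key position: remember the name
            stack[^1] = ctxObjVal
          of ctxObjVal:
            yield (key, p.str)       # a complete `"name": scalar` member
            stack[^1] = ctxObjKey
          else:
            discard                  # bare scalar inside an array / at top level
        of jsonTrue, jsonFalse, jsonNull:
          if stack[^1] == ctxObjVal: stack[^1] = ctxObjKey
      close(p)

Because the iterator is inline, the per-fragment "callback" compiles into ordinary control flow at the call site rather than an indirect call, which sidesteps the function-pointer cost mentioned above.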

I have not studied the "winners" in this kostya Json benchmark, but it would be 
unsurprising if they were all doing incremental parsing of some kind. While 
Json may be a generally bad idea, incrementalism is not. Such a parser might 
also serve as good example code for parsing other stuff. Once this was done, 
I'd expect almost half the time to be in parseFloat, which could maybe be 
optimized further.
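
On that last point, one possible (unmeasured) angle is simply avoiding substring and exception overhead around the float conversion; `std/parseutils.parseFloat` parses in place from an offset and reports how many characters it consumed:

    import std/[parseutils, strutils]

    # Sketch: parse a float straight out of a larger buffer, with no substring
    # allocation and no exceptions; the return value is the number of characters
    # consumed (0 on failure). The buffer/offset here are just for illustration.
    let buffer = """{"x":1.25,"y":2.5}"""
    var v: float
    let start = buffer.find(":") + 1          # offset of the first value
    let used = parseutils.parseFloat(buffer, v, start)
    if used > 0:
      echo "parsed ", v, " using ", used, " chars"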
