> On Mar 23, 2017, at 3:17 PM, Deon Brewis <de...@outlook.com> wrote:
> 
> If you however can use a forward-only push or pull parser like a SAX or StAX 
> parser, it's a different story. I'm using a StAX-like pull parser for a binary 
> json-ish internal format we have, and reading & parsing through it is on par 
> with the performance of reading equivalent SQLite columns directly.

I agree that’s a lot faster, but you’re still looking at O(n) lookup time in an 
array or dictionary. And the proportionality constant gets worse the bigger the 
document is, since jumping to the next item involves parsing through all of the 
nested items in that collection.

> That obviously implies if you do random-access into a structure you have to 
> keep reparsing it (which is where memoization would be nice). However, CPU 
> caches are better at reading continuous data streams in forward-only fashion 
> than they are with pointers, so forward-only pull parsers, even when you have 
> to repeat the entire parse, are often faster than the math behind it suggests.

It could be; my knowledge of optimization gets tenuous when it comes to 
down-to-the-metal areas like CPU caching. But for large data, you run the risk 
of blowing out the cache traversing it. And if the data is memory-mapped, it 
becomes hugely faster to skip right to the relevant page instead of faulting in 
every page ahead of it.

> Besides, in 99% of cases my users take the outcome from a json parse and just 
> store the results into a C++ data structure anyway. In that case the 
> intermediary object tree is just a throwaway and you may as well have built 
> the C++ structure up using a pull or push parser.

In Fleece I put a lot of effort into making the C++ API nice to use, so that I 
don’t need any other data structure. That's worked well so far.

—Jens

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users