Re: std.data.json formal review

Sönke Ludwig via Digitalmars-d Tue, 25 Aug 2015 00:07:12 -0700

On 25.08.2015 at 07:55, Martin Nowak wrote:

On Saturday, 22 August 2015 at 13:41:49 UTC, Sönke Ludwig wrote:

There is more than the actual call to validate(), such as writing
tests and making sure the surroundings work, adjusting the interface
and writing documentation. It's not *that* much work, but nonetheless
wasted work.


I also still think that this wasn't a bad idea at all, because it
speeds up the most important use case: parsing JSON from a non-memory
source that has not yet been validated. I also very much like the idea
of making it a programming error to have invalid UTF stored in a
string, i.e. forcing the validation to happen before the cast from
bytes to chars.


Also see "utf/unicode should only be validated once"
https://issues.dlang.org/show_bug.cgi?id=14919

If combining lexing and validation is faster (why?), then a
ubyte-consuming interface should be available, though why couldn't it be
done by adding a lazy ubyte->char validator range to std.utf?
In any case, during lexing we should avoid autodecoding of narrow
strings for redundant validation.
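
For illustration only, the lazy validator idea can be sketched in Python
rather than D. This is not a proposed std.utf API: the name
validated_text is hypothetical, and since Python has no char code-unit
type, decoding to str stands in for the validating ubyte->char cast. The
sketch pulls byte chunks from any iterable (for example a socket or a
file read loop) and only reports an error once the offending chunk is
actually consumed.

```python
import codecs
from typing import Iterable, Iterator


def validated_text(chunks: Iterable[bytes]) -> Iterator[str]:
    """Lazily yield text from byte chunks, validating UTF-8 on the way.

    Hypothetical helper: raises UnicodeDecodeError at the first chunk
    that contains invalid UTF-8, or at the end if the input stops in
    the middle of a multi-byte sequence.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    for chunk in chunks:
        # A sequence split across two chunks is buffered by the decoder
        # and completed when the next chunk arrives.
        text = decoder.decode(chunk)
        if text:
            yield text
    # final=True makes a dangling partial sequence an error.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

A consumer would iterate over validated_text(source) and hand each piece
to the lexer, so validation happens exactly once, at the boundary between
bytes and text.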

The performance benefit comes from the fact that almost all of JSON is a
subset of ASCII, so lexing the input implicitly validates it as correct
UTF. The only places where actual UTF sequences can occur are in string
literals, outside of escape sequences. Depending on the type of
document, that can result in far fewer conditionals than a full
validation of the input.
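
A rough sketch of that argument, in Python rather than D and not taken
from the std.data.json code: the names lex_and_validate and
utf8_sequence_length are hypothetical, and the scanner is deliberately
not a full JSON lexer (it does not check token grammar or control
characters). It only shows where the UTF-8 check has to run. Outside
string literals a byte >= 0x80 is rejected outright, which a real lexer
gets for free because no JSON token contains such a byte. Inside string
literals, only non-ASCII bytes reach the multi-byte validator.

```python
def utf8_sequence_length(data: bytes, i: int) -> int:
    """Length of the well-formed multi-byte UTF-8 sequence at data[i].

    Raises ValueError for invalid lead bytes, overlong forms, surrogates,
    code points above U+10FFFF, and truncated sequences.
    """
    b0 = data[i]
    lo, hi = 0x80, 0xBF  # allowed range for the second byte
    if 0xC2 <= b0 <= 0xDF:
        n = 2
    elif 0xE0 <= b0 <= 0xEF:
        n = 3
        if b0 == 0xE0:
            lo = 0xA0  # excludes overlong 3-byte forms
        elif b0 == 0xED:
            hi = 0x9F  # excludes UTF-16 surrogates
    elif 0xF0 <= b0 <= 0xF4:
        n = 4
        if b0 == 0xF0:
            lo = 0x90  # excludes overlong 4-byte forms
        elif b0 == 0xF4:
            hi = 0x8F  # excludes code points above U+10FFFF
    else:
        raise ValueError(f"invalid UTF-8 lead byte at offset {i}")
    if i + n > len(data):
        raise ValueError(f"truncated UTF-8 sequence at offset {i}")
    if not lo <= data[i + 1] <= hi:
        raise ValueError(f"invalid UTF-8 sequence at offset {i}")
    for k in range(i + 2, i + n):
        if not 0x80 <= data[k] <= 0xBF:
            raise ValueError(f"invalid UTF-8 sequence at offset {i}")
    return n


def lex_and_validate(data: bytes) -> int:
    """Scan JSON-like bytes; return how many bytes needed UTF-8 checks."""
    i, n = 0, len(data)
    in_string = False
    checked = 0
    while i < n:
        b = data[i]
        if not in_string:
            # Structural characters, numbers and keywords are all ASCII.
            if b >= 0x80:
                raise ValueError(f"non-ASCII byte outside string at offset {i}")
            in_string = b == 0x22  # '"' opens a string literal
            i += 1
        elif b == 0x22:  # closing quote
            in_string = False
            i += 1
        elif b == 0x5C:  # backslash: the escaped character must be ASCII
            if i + 1 >= n or data[i + 1] >= 0x80:
                raise ValueError(f"invalid escape at offset {i}")
            i += 2
        elif b < 0x80:  # plain ASCII inside a string
            i += 1
        else:  # the only place a multi-byte sequence is legal
            length = utf8_sequence_length(data, i)
            checked += length
            i += length
    if in_string:
        raise ValueError("unterminated string literal")
    return checked
```

For a document such as {"name": "Sönke", "n": 1}, only the two bytes of
"ö" go through utf8_sequence_length; every other byte is handled by the
comparisons the scanner performs anyway.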

Autodecoding during lexing is being avoided; everything happens on the
code unit level.
