On 25.08.2014 22:51, "Ola Fosheim Grøstad" <[email protected]> wrote:
> On Monday, 25 August 2014 at 20:35:32 UTC, Sönke Ludwig wrote:
>> BTW, JSON is *required* to be UTF encoded anyway as per RFC 7159,
>> which is another argument for just letting the lexer assume valid UTF.
>
> The lexer cannot assume valid UTF since the client might be a rogue,
> but it can just bail out if the lookahead isn't JSON? So UTF
> validation is limited to strings.
But why should UTF validation be the job of the lexer in the first
place? D's "string" type is defined to be UTF-8 anyway, so a lexer that
takes a "string" as input would be free to assume valid UTF-8. I agree
with Walter there that validation/conversion should be provided as a
separate proxy range. But if we do end up validating in the lexer, it
would indeed be enough to validate inside strings, because the rest of
the grammar only uses a subset of ASCII.
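To make that concrete, here is a rough byte-level sketch (Python for illustration, not D, and not the actual lexer under discussion; names, structure, and the lack of error recovery are all mine): outside of string literals, every byte of well-formed JSON is in the ASCII range, so the scanner never has to decode UTF-8 between tokens.

```python
# Illustrative only: JSON structure outside strings is pure ASCII,
# so a byte-level scanner needs no UTF-8 decoding between tokens.
STRUCTURAL = set(b'{}[]:,')
WHITESPACE = set(b' \t\r\n')

def lex_structure(data: bytes):
    """Yield (kind, raw_bytes) tokens; only string bodies may
    contain bytes > 0x7F. No error recovery, sketch quality."""
    i = 0
    while i < len(data):
        b = data[i]
        if b in WHITESPACE:
            i += 1
        elif b in STRUCTURAL:
            yield ('punct', data[i:i+1]); i += 1
        elif b == 0x22:  # '"': scan the raw string body without decoding
            j = i + 1
            while data[j] != 0x22:                # unterminated -> IndexError
                j += 2 if data[j] == 0x5C else 1  # skip '\' + escaped char
            yield ('string', data[i:j+1]); i = j + 1
        elif b < 0x80:  # numbers, true/false/null: all ASCII
            j = i
            while (j < len(data) and data[j] < 0x80
                   and data[j] not in STRUCTURAL
                   and data[j] not in WHITESPACE):
                j += 1
            yield ('scalar', data[i:j]); i = j
        else:
            raise ValueError("non-ASCII byte outside a string literal")
```

Note how any byte > 0x7F outside a string is simply rejected, while inside a string it is skipped over untouched; UTF-8 continuation bytes are always >= 0x80, so they can never be mistaken for the closing quote or a backslash.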
> You have to parse the strings because of the \uXXXX escapes of
> course, so some basic validation is unavoidable?
At least no UTF validation is needed for that. Since non-ASCII
characters in UTF-8 are always composed entirely of bytes >0x7F, an
ASCII escape sequence \uXXXX can never start in the middle of a
multi-byte sequence, so it can be treated as valid wherever it occurs
in the string; all other bytes that don't belong to an escape sequence
are just passed through as-is. But I guess full validation of string
content could be another useful option, along with "ignore escapes" for
cases where you want to avoid decode-encode scenarios (like for a
proxy, or if you store pre-escaped Unicode in a database).