https://issues.dlang.org/show_bug.cgi?id=16639

--- Comment #3 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/20353df3d92811345ff2b74762398008ed776730
Fix issue 16639 - Review std.json wrt this article on JSON edge cases and
ambiguities

The test corpus provided at https://github.com/nst/JSONTestSuite/ revealed several
issues with the std.json.parseJSON function. Because fixing some of them
required parseJSON to reject input it previously accepted, I have added a new
JSONOptions.strictParsing flag so callers can opt in to the stricter parsing.

The issues, and how I've addressed them, are listed below (approximately from
most severe to least):

Silently dropping ASCII NUL characters from strings:
n_string_unescaped_crtl_char.json

This is the most serious problem I found while fixing the test cases. The
current implementation of parseJSON() uses a helper function called peekChar(),
which can store the next character to be handled in a variable of type Char (an
alias of the character type). Unfortunately, it used 0 to indicate that no
character had been read yet. If the text contained an ASCII NUL (which has the
value 0) and it was read with peekChar(), the NUL was effectively skipped over.
This was happening in both string and whitespace parsing.

I changed peekChar() to use a Nullable!Char as the temporary storage for the
next character. This distinguishes the case where no unconsumed character is
pending from the case where the pending unconsumed character is an ASCII NUL.
In strict mode, JSON with unescaped ASCII NULs in strings now causes an
exception, while in non-strict mode it is accepted and the NUL is included in
the string value.
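To show why an in-band sentinel is unsafe for a one-character look-ahead
buffer, here is a minimal Python sketch (not the Phobos code, which is D).
SentinelPeeker mimics using "\0" to mean "nothing buffered"; OptionalPeeker
uses None instead, playing the role Nullable!Char plays in the fix. The class
names and the read_chars() helper are hypothetical.

```python
from typing import Iterator, Optional


class SentinelPeeker:
    """Look-ahead buffer that (incorrectly) uses "\\0" to mean 'nothing buffered'."""

    def __init__(self, text: str) -> None:
        self._it: Iterator[str] = iter(text)
        self._next = "\0"

    def peek(self) -> str:
        # A buffered NUL is indistinguishable from an empty buffer, so the
        # NUL gets overwritten by the following character.
        if self._next == "\0":
            self._next = next(self._it, "\0")
        return self._next

    def pop(self) -> str:
        ch = self.peek()
        self._next = "\0"
        return ch


class OptionalPeeker:
    """Look-ahead buffer that uses None (like Nullable!Char) for 'nothing buffered'."""

    def __init__(self, text: str) -> None:
        self._it: Iterator[str] = iter(text)
        self._next: Optional[str] = None

    def peek(self) -> str:
        if self._next is None:
            self._next = next(self._it, "")  # "" marks end of input
        return self._next

    def pop(self) -> str:
        ch = self.peek()
        self._next = None
        return ch


def read_chars(peeker, n: int) -> str:
    """Read n characters, peeking before each pop as a parser typically does."""
    out = []
    for _ in range(n):
        peeker.peek()
        out.append(peeker.pop())
    return "".join(out)


print(repr(read_chars(SentinelPeeker("a\0bc"), 3)))  # 'abc' (the NUL is lost)
print(repr(read_chars(OptionalPeeker("a\0bc"), 3)))  # 'a\x00b'
```

The second peek of the sentinel version reads the NUL into the buffer, and the
following pop mistakes that buffered NUL for "nothing buffered" and reads past it.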

Failure to accept ASCII DEL (0x7f) unescaped in strings:
y_string_unescaped_char_delete.json
y_string_with_del_character.json

These were the only test cases that std.json rejected but should have
accepted. I addressed this by changing the string parsing logic to check
explicitly for character values < 0x20 instead of using std.ascii.isControl
(which also returns true for 0x7f), with a special exception for ASCII NULs in
non-strict mode as described above.
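The difference between the two checks can be sketched in Python as follows.
This is an illustration only; both function names are hypothetical.
isControl_style_rejects() models a check based on std.ascii.isControl (0x00 to
0x1f plus 0x7f), and new_rejects() models the rule described above: only
U+0000 through U+001F must be escaped, as RFC 7159 requires, with NUL
tolerated when not in strict mode.

```python
def isControl_style_rejects(ch: str) -> bool:
    """A check based on std.ascii.isControl: 0x00-0x1f and also DEL (0x7f)."""
    code = ord(ch)
    return code < 0x20 or code == 0x7F


def new_rejects(ch: str, strict: bool) -> bool:
    """Only C0 controls are illegal unescaped; NUL is allowed when not strict."""
    code = ord(ch)
    return code < 0x20 and (strict or code != 0)


print(isControl_style_rejects("\x7f"), new_rejects("\x7f", strict=True))  # True False
```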

Parsing "true", "false", and "null" tokens case-insensitively:
n_structure_capitalized_True.json

In strict mode those tokens are now parsed case-sensitively.
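A minimal Python sketch of the two behaviours, assuming a hypothetical
match_literal() helper rather than the actual std.json code: strict mode
accepts only the exact lowercase spelling, while non-strict mode compares
case-insensitively.

```python
from typing import Optional, Tuple

LITERALS = {"true": True, "false": False, "null": None}


def match_literal(text: str, pos: int, strict: bool) -> Tuple[Optional[bool], int]:
    """Match true/false/null at pos and return (value, position after it)."""
    for word, value in LITERALS.items():
        chunk = text[pos:pos + len(word)]
        # Strict: exact spelling only. Non-strict: any letter case.
        if chunk == word or (not strict and chunk.lower() == word):
            return value, pos + len(word)
    raise ValueError(f"unexpected token at offset {pos}")


print(match_literal("[True]", 1, strict=False))  # (True, 5)
```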

Accepting control characters other than ' ', '\t', '\r', and '\n' as
whitespace:
n_structure_null-byte-outside-string.json
n_structure_whitespace_formfeed.json

In strict mode only the listed characters are accepted as whitespace.
Non-strict mode continues to use std.ascii.isWhite, with an additional
exception for ASCII NUL for a reason similar to the
n_string_unescaped_crtl_char.json case: skipWhitespace() used peekChar(), so it
did not handle ASCII NULs consistently. After my changes, non-strict mode is
actually more permissive than before, but it is at least consistently
permissive.
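The two whitespace rules can be sketched in Python like this (a hypothetical
skip_whitespace() helper, not the Phobos function of the same name). Strict
mode uses the four characters in the RFC 7159 "ws" rule; non-strict mode uses
the set std.ascii.isWhite accepts (which adds '\v' and '\f') plus NUL.

```python
STRICT_WHITESPACE = frozenset(" \t\r\n")  # RFC 7159 insignificant whitespace
ASCII_WHITE = frozenset(" \t\n\v\f\r")     # characters std.ascii.isWhite accepts


def skip_whitespace(text: str, pos: int, strict: bool) -> int:
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(text):
        ch = text[pos]
        if strict:
            is_ws = ch in STRICT_WHITESPACE
        else:
            is_ws = ch in ASCII_WHITE or ch == "\0"  # NUL tolerated when not strict
        if not is_ws:
            break
        pos += 1
    return pos


print(skip_whitespace(" \f1", 0, strict=True), skip_whitespace(" \f1", 0, strict=False))  # 1 2
```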

Silently accepting empty data:
n_structure_no_data.json

In strict mode an exception is now thrown instead of returning an empty value.

Failure to enforce that numbers beginning with 0 cannot have any additional
digits in the non-fractional part:
n_number_-01.json
n_number_neg_int_starting_with_zero.json
n_number_with_leading_zero.json

In strict mode, when the integer part of a number begins with zero, an
additional check now ensures that no further digits follow that zero.
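A minimal Python sketch of that check, using a hypothetical
scan_integer_part() helper. It follows the JSON grammar rule
int = zero / ( digit1-9 *DIGIT ), so "0" may appear only on its own before any
fraction or exponent.

```python
from typing import Tuple


def scan_integer_part(text: str, pos: int, strict: bool) -> Tuple[str, int]:
    """Scan an optional '-' and the digits before any '.', 'e' or 'E'."""
    start = pos
    if pos < len(text) and text[pos] == "-":
        pos += 1
    digits_start = pos
    while pos < len(text) and "0" <= text[pos] <= "9":
        pos += 1
    digits = text[digits_start:pos]
    if not digits:
        raise ValueError(f"expected a digit at offset {digits_start}")
    # A leading zero must stand alone: "0", "-0", "0.5" are fine, "01" is not.
    if strict and digits[0] == "0" and len(digits) > 1:
        raise ValueError(f"leading zero in number at offset {start}")
    return text[start:pos], pos


print(scan_integer_part("0.5", 0, strict=True))   # ('0', 1)
print(scan_integer_part("-01", 0, strict=False))  # ('-01', 3)
```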

Failure to check for trailing characters after parsing:
n_array_comma_after_close.json
n_array_extra_close.json
n_multidigit_number_then_00.json
n_object_trailing_comment.json
n_object_trailing_comment_open.json
n_object_trailing_comment_slash_open_incomplete.json
n_object_trailing_comment_slash_open.json
n_object_with_trailing_garbage.json
n_string_with_trailing_garbage.json
n_structure_array_trailing_garbage.json
n_structure_array_with_extra_array_close.json
n_structure_close_unopened_array.json
n_structure_double_array.json
n_structure_number_with_trailing_garbage.json
n_structure_object_followed_by_closing_object.json
n_structure_object_with_trailing_garbage.json
n_structure_trailing_#.json

An additional check is now performed in strict mode to ensure any trailing
characters after the initial JSON value are only whitespace.
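The top-level logic for strict mode can be sketched in Python as below,
covering both this trailing-character check and the empty-data case listed
earlier. This is not the std.json code: Python's
json.JSONDecoder().raw_decode() stands in for a parser that reads one value
and reports where it ended, and parse_top_level_strict() is a hypothetical
name.

```python
import json

STRICT_WHITESPACE = " \t\r\n"


def parse_top_level_strict(text: str):
    """Parse exactly one JSON value, allowing only JSON whitespace around it."""
    start = len(text) - len(text.lstrip(STRICT_WHITESPACE))
    if start == len(text):
        raise ValueError("no JSON value found")  # empty or all-whitespace input
    # raw_decode parses one value starting at `start` and returns its end index.
    value, end = json.JSONDecoder().raw_decode(text, start)
    if text[end:].strip(STRICT_WHITESPACE):
        raise ValueError(f"trailing characters after offset {end}")
    return value


print(parse_top_level_strict(" [1, 2]\n"))  # [1, 2]
```

Inputs such as "[][]" or "[1]]" parse a valid first value and are then
rejected by the trailing check rather than by the value parser itself.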

In addition to the above issues, parseJSON() throws ConvException for numbers
outside the range of double/long/ulong, a behavior that was not previously
documented. I have updated the ddoc comment to mention that exception.

https://github.com/dlang/phobos/commit/25951d6f7aeaee54fb308f6d6b9d092c3ee09bb2
Merge pull request #6617 from tylerknott/issue-16639

Fix issue 16639 - Review std.json wrt this article on JSON edge cases and
ambiguities
merged-on-behalf-of: Sebastian Wilzbach <sebi.wilzb...@gmail.com>
