https://issues.dlang.org/show_bug.cgi?id=16639
--- Comment #3 from github-bugzi...@puremagic.com ---
Commits pushed to master at https://github.com/dlang/phobos

https://github.com/dlang/phobos/commit/20353df3d92811345ff2b74762398008ed776730
Fix issue 16639 - Review std.json wrt this article on JSON edge cases and ambiguities

The test corpus provided at https://github.com/nst/JSONTestSuite/ revealed some issues with the std.json.parseJSON function. Since addressing some of the issues required parseJSON to reject input it previously accepted, I have added a new JSONOptions.strictParsing flag so callers can opt in to the stricter parsing.

The issues, and how I've addressed them, are listed below (approximately from most severe to least):

Silently dropping ASCII NUL characters from strings:
    n_string_unescaped_crtl_char.json

This is the most serious problem I found while fixing the test cases. The current implementation of parseJSON() uses a helper function called peekChar() which can store the next character to handle in a variable of type Char (an alias of the character type). Unfortunately it was using 0 to indicate that it had not read a character yet, so if an ASCII NUL (which has the value 0) is present in the text and is read with peekChar(), it is effectively skipped over. This was happening in string and whitespace parsing.

I changed peekChar() to use a Nullable!Char as the temporary storage for the next character to disambiguate the case where there is no pending unconsumed character from the case where there is a pending unconsumed ASCII NUL. In strict mode, JSON with unescaped ASCII NULs in strings will throw an exception, while in non-strict mode the JSON will be accepted with the NUL included in the string value.

Failure to accept ASCII DEL (0x7f) unescaped in strings:
    y_string_unescaped_char_delete.json
    y_string_with_del_character.json

These were the only test cases that std.json rejected that it should have accepted.
This issue was addressed by changing the string parsing logic to explicitly check for character values < 0x20 instead of using std.ascii.isControl (which also returned true for 0x7f), with a special exception for ASCII NULs in non-strict mode as mentioned above.

Parsing "true", "false", and "null" tokens case-insensitively:
    n_structure_capitalized_True.json

In strict mode those tokens are now parsed case-sensitively.

Accepting control characters other than ' ', '\t', '\r', and '\n' as whitespace:
    n_structure_null-byte-outside-string.json
    n_structure_whitespace_formfeed.json

In strict mode only the listed characters are accepted as whitespace, while non-strict mode continues to use std.ascii.isWhite with an additional exception for ASCII NUL, for a similar reason as the n_string_unescaped_crtl_char.json case: the skipWhitespace() function used peekChar(), so it didn't handle ASCII NULs consistently. Non-strict mode after my changes is actually more permissive than the previous behavior, but it is at least consistently permissive.

Silently accepting empty data:
    n_structure_no_data.json

In strict mode an exception is now thrown instead of returning an empty value.

Failure to enforce that numbers beginning with 0 cannot have any additional digits in the non-fractional part:
    n_number_-01.json
    n_number_neg_int_starting_with_zero.json
    n_number_with_leading_zero.json

An additional check is now performed in strict mode when the whole part of a number begins with zero to ensure trailing digits are not present.
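
As a rough illustration of the opt-in behavior described so far, the sketch below feeds a few of the rejected inputs to parseJSON() with JSONOptions.strictParsing. It assumes a Phobos release that includes this commit. The helper name rejectedInStrictMode and the sample inputs are made up for this example (loosely modeled on the test case names above) and are not part of std.json.

```d
import std.json : JSONException, JSONOptions, parseJSON;

/// Hypothetical helper (not part of std.json): returns true if parsing
/// `text` with JSONOptions.strictParsing throws a JSONException.
bool rejectedInStrictMode(string text)
{
    try
    {
        // -1 is the default maxDepth, i.e. no nesting limit.
        parseJSON(text, -1, JSONOptions.strictParsing);
        return false;
    }
    catch (JSONException)
    {
        return true;
    }
}

void main()
{
    import std.stdio : writeln;

    // Capitalized token, leading zeros, and empty data; "true" is a
    // well-formed control input that strict mode should still accept.
    foreach (text; ["True", "[01]", "[-01]", "", "true"])
    {
        writeln(`"`, text, `" rejected in strict mode: `,
                rejectedInStrictMode(text));
    }
}
```

Going by the commit message, each of these inputs except "true" should be reported as rejected in strict mode, while a plain parseJSON(text) call without the flag keeps the previous lenient behavior.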

Failure to check for trailing characters after parsing:
    n_array_comma_after_close.json
    n_array_extra_close.json
    n_multidigit_number_then_00.json
    n_object_trailing_comment.json
    n_object_trailing_comment_open.json
    n_object_trailing_comment_slash_open_incomplete.json
    n_object_trailing_comment_slash_open.json
    n_object_with_trailing_garbage.json
    n_string_with_trailing_garbage.json
    n_structure_array_trailing_garbage.json
    n_structure_array_with_extra_array_close.json
    n_structure_close_unopened_array.json
    n_structure_double_array.json
    n_structure_number_with_trailing_garbage.json
    n_structure_object_followed_by_closing_object.json
    n_structure_object_with_trailing_garbage.json
    n_structure_trailing_#.json

An additional check is now performed in strict mode to ensure any trailing characters after the initial JSON value are only whitespace.

In addition to the above issues, parseJSON() will throw ConvException for numbers out of the range of double/long/ulong which was not previously documented. I have updated the ddoc comment to reference that exception.

https://github.com/dlang/phobos/commit/25951d6f7aeaee54fb308f6d6b9d092c3ee09bb2
Merge pull request #6617 from tylerknott/issue-16639

Fix issue 16639 - Review std.json wrt this article on JSON edge cases and ambiguities
merged-on-behalf-of: Sebastian Wilzbach <sebi.wilzb...@gmail.com>