On 2013-03-23 21:08, Jonathan M Davis wrote:
Curious. According to this page (
http://www.aivosto.com/vbtips/control-characters.html ), both space and delete
are ASCII control characters (though neither std.ascii nor C's iscntrl deems
space to be a control character), but neither of them is a control character
according to recent Unicode standards.
This section on DEL
http://www.aivosto.com/vbtips/control-characters.html#DEL
seems to say that DEL should basically be ignored. It seems to think that NUL
should be treated the same way (and basically complains that languages like C
ever treated it as a terminator).
If I look at the RFC for JSON ( http://www.rfc-editor.org/rfc/rfc4627.txt ),
it specifically lists control characters as being U+0000 through U+001F, which
does _not_ include DEL or _any_ Unicode-specific control character. So, using
either std.ascii or std.uni's isControl would be wrong. The parser specifically
needs to check whether a character is < 32 (0x20) when checking for control
characters.
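A minimal sketch of that check, written in Python rather than D purely for illustration. The function name is made up here and is not part of std.json or any other library:

```python
def is_json_control(ch: str) -> bool:
    """True only for the control characters RFC 4627 names: U+0000-U+001F."""
    return ord(ch) < 0x20  # 0x20 == 32, the first non-control code point


print(is_json_control("\t"))    # tab, U+0009   -> True
print(is_json_control("\x7f"))  # DEL, U+007F   -> False
print(is_json_control(" "))     # space, U+0020 -> False
```

A general-purpose isControl-style predicate would typically answer differently for DEL, which is the mismatch being discussed.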
And the grammar rule for string is
    string = quotation-mark *char quotation-mark
    char = unescaped /
           escape (
               %x22 /          ; "    quotation mark  U+0022
               %x5C /          ; \    reverse solidus U+005C
               %x2F /          ; /    solidus         U+002F
               %x62 /          ; b    backspace       U+0008
               %x66 /          ; f    form feed       U+000C
               %x6E /          ; n    line feed       U+000A
               %x72 /          ; r    carriage return U+000D
               %x74 /          ; t    tab             U+0009
               %x75 4HEXDIG )  ; uXXXX                U+XXXX
    escape = %x5C              ; \
    quotation-mark = %x22      ; "
    unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
So, it looks like the only characters that should be considered valid inside
the double-quotes of a string are \ (which indicates the beginning of an
escape sequence) and the characters listed in unescaped. So, in decimal, the
unescaped characters would be 32 and 33, 35 - 91, and everything 93 and
greater (up to 0x10FFFF). DEL is 127, so it should be considered valid.
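The unescaped rule can be written down almost literally. This is a sketch in Python for illustration only; the names UNESCAPED_RANGES and is_unescaped are hypothetical and do not reflect how std.json is implemented:

```python
# Inclusive code point ranges copied from the RFC 4627 rule:
#   unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
UNESCAPED_RANGES = ((0x20, 0x21), (0x23, 0x5B), (0x5D, 0x10FFFF))


def is_unescaped(ch: str) -> bool:
    """True if ch may appear literally (without a backslash) in a JSON string."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UNESCAPED_RANGES)


print(is_unescaped("\x7f"))  # DEL (127) falls in %x5D-10FFFF -> True
print(is_unescaped('"'))     # %x22 is excluded               -> False
print(is_unescaped("\\"))    # %x5C is excluded               -> False
print(is_unescaped("\n"))    # below %x20                     -> False
```

The only code points rejected are those below 0x20, the quotation mark, and the backslash, so a range check like this avoids depending on any particular definition of "control character".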
So, if std.json is using isControl, my guess is that whoever wrote that was
not careful enough with the grammar (though it's easy enough to assume that
everyone means the same thing by control characters), and I'd be concerned
that std.json is not handling this set of grammar rules correctly with more
characters than just DEL.
I see. Yes, one could think that "control character" would mean the same
thing in every situation for a given encoding.
--
/Jacob Carlborg