On Feb 20, 2008, at 6:10 PM, Mike Samuel wrote:
> JSON ⊂ ADsafe ⊂ Cajita ⊂ Caja ⊂ ES3 ⊂ ES4
People who know Unicode are dangerous ;).
Yes, we need more of you ;-).
There are three problems according to my reading of
http://www.ietf.org/rfc/rfc4627.txt but only the first is directly
related to syntax:
(1) There are JSON programs that are not valid ES programs.
The JSON program [ "\u2028" ] where the unicode escape is replaced
with its literal equivalent is valid according to JSON since the
set of characters that can appear in a string unescaped is
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
but ES does not allow codepoint 0x2028 or 0x2029 to appear
unescaped in a string since they are newline characters.
I wonder if JSON should not change on this point. Is there a use-case
for unescaped line/paragraph separators in strings?
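
A minimal sketch of one workaround for point (1), assuming an engine with a built-in JSON object (which postdates this thread). The helper name escapeLineSeparators is hypothetical and comes from neither RFC 4627 nor ECMA-262. It relies on the fact that, in a valid JSON text, U+2028 and U+2029 can only occur inside strings, so a global replace is safe:

```javascript
// Hypothetical helper: rewrite the two code points that JSON allows
// unescaped in strings, but that ES3 treats as line terminators,
// into their \uXXXX escape form.
function escapeLineSeparators(jsonText) {
  return jsonText
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// A JSON text with a literal (unescaped) U+2028 inside a string.
var raw = '["a\u2028b"]';

// After escaping, the text holds only ASCII and parses to the same value.
var safe = escapeLineSeparators(raw);
var roundTripped = JSON.parse(safe);
```

The escaped text is still valid JSON and denotes the same string, so it can also be handed to an ES3-style eval without tripping over a line terminator inside a string literal.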
(2) There are JSON programs that have the same text as ES programs
but different meaning.
ES262 says that all format control codepoints, such as 0x200C,
should be stripped out of the program in a pre-lex phase. This is
not consistently implemented:
eval("'\u200c'.length") == 0 on SpiderMonkey, and 1 on most
other interpreters
Not lately, meaning post-Firefox-2/JS1.7. From a fresh js shell, with
the same results for any Firefox 3 beta:
js> eval("'\u200c'.length") == 0
false
js> eval("'\u200c'.length")
1
See https://bugzilla.mozilla.org/show_bug.cgi?id=274152, where
SpiderMonkey yields to IE JScript's flouting of ECMA-262. IE set a
real-world web standard, and for the better according to people in
certain locales.
According to https://bugzilla.mozilla.org/show_bug.cgi?id=368516#c34,
IE does not report illegal character errors correctly, instead
treating misplaced BOMs as identifiers whose references result in
runtime ReferenceErrors (I don't know what it does with other
format-control characters that occur outside of strings and regexps).
See also the follow-on bug to tolerate mislocated BOMs,
https://bugzilla.mozilla.org/show_bug.cgi?id=368516. Ain't the
copy/paste Internet grand?
JSON does not strip these characters out, so they are treated as
significant.
ES4 is specifying, as a bug fix to match other browsers, that
format-control characters shall not be stripped; it must also, to be a
real-world web standard, specify tolerance for mislocated BOMs.
Postel's Law bites back!
So JSON and ES4 will agree on this one.
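
A small sketch of that agreement, assuming an engine that keeps format-control characters in source text and provides a built-in JSON object (both assumptions go beyond what ES3 guarantees). The variable names are illustrative only:

```javascript
// The outer double-quoted literal turns \u200c into a real U+200C,
// so the text handed to eval contains the raw format-control character
// between the single quotes.
var viaEval = eval("'\u200c'.length");

// The same raw character inside a JSON string.
var viaJson = JSON.parse('"\u200c"').length;

// On an engine that does not strip format-control characters,
// both counts are 1, so the two readings of the text agree.
var agree = viaEval === viaJson;
```

An engine that followed the ES3 stripping rule to the letter would report 0 for viaEval and the two would disagree.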
(3) There are JSON programs that can be parsed to ES but that
cannot be serialized back to JSON without losing track of where
info was lost.
JSON does not put any limits on numbers, but ES does. ES will
treat 1e1000 as Infinity. Since JSON does not have a value
Infinity, it is unclear how to implement toJSON(fromJSON("[1e1000]")).
JSON's grammar is nice and simple; it facilitates exhaustive testing
(Rob Sayre used Koushik Sen's jCUTE to generate all-paths tests for a
Java implementation).
BigInts or BigNums could help in the future, but the installed base
will not have them for a while and their literal syntax, without a
pragma, will have a suffix.
This kind of edge case is unlikely to be a problem in practice,
although such "overflow" conditions recur throughout the security
exploit literature. Could JSON stand to grow support for the IEEE-754
non-finite values?
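
To make the 1e1000 case in point (3) concrete, here is a sketch using the JSON.parse and JSON.stringify functions that later engines ship; they postdate this thread and stand in for the fromJSON/toJSON named above, so treat the mapping as an assumption:

```javascript
// 1e1000 is a legal JSON number but exceeds the IEEE-754 double range,
// so it comes back as Infinity.
var parsed = JSON.parse("[1e1000]");
var overflowed = parsed[0] === Infinity;

// JSON has no literal for Infinity; the built-in serializer writes
// null in its place, silently losing the original number.
var back = JSON.stringify(parsed);

// Hypothetical guard: detect the lossy case before serializing.
function hasNonFinite(values) {
  for (var i = 0; i < values.length; i++) {
    if (typeof values[i] === "number" && !isFinite(values[i])) {
      return true;
    }
  }
  return false;
}
var lossy = hasNonFinite(parsed);
```

The guard only scans a flat array; a real check would have to walk nested objects and arrays as well.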
/be