On Feb 20, 2008, at 6:10 PM, Mike Samuel wrote:

>     JSON ⊂ ADsafe ⊂ Cajita ⊂ Caja ⊂ ES3 ⊂ ES4

People who know Unicode are dangerous ;).

Yes, we need more of you ;-).

There are three problems, according to my reading of http://www.ietf.org/rfc/rfc4627.txt, but only the first is directly related to syntax:

(1) There are JSON programs that are not valid ES programs.
The JSON program [ "\u2028" ], with the Unicode escape replaced by its literal equivalent, is valid JSON, since the set of characters that can appear unescaped in a string is
unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
but ES does not allow code point 0x2028 or 0x2029 to appear unescaped in a string, since they are line terminator characters.

I wonder if JSON should not change on this point. Is there a use-case for unescaped line/paragraph separators in strings?
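
One way to sidestep problem (1) when JSON text has to be handed to an ES evaluator is to escape the two separators first. This is only a minimal sketch: escapeLineSeparators is a hypothetical name, not part of any library, and it assumes valid JSON input, where U+2028 and U+2029 can only occur inside strings.

```javascript
// Hypothetical helper: rewrite raw U+2028 / U+2029 as \uXXXX escapes so that
// the JSON text is also acceptable to an ES3 lexer.
// Assumes valid JSON, where these code points can only occur inside strings.
function escapeLineSeparators(jsonText) {
  return jsonText.replace(/[\u2028\u2029]/g, function (ch) {
    return '\\u' + ch.charCodeAt(0).toString(16);
  });
}

// A JSON array whose one string holds a literal (unescaped) U+2028.
var raw = '["' + String.fromCharCode(0x2028) + '"]';
var safe = escapeLineSeparators(raw);
```

The escaped text denotes the same JSON value, so nothing is lost by doing this on the producer side.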

(2) There are JSON programs that have the same text as ES programs but a different meaning. ECMA-262 says that all format-control code points, such as 0x200C, should be stripped out of the program in a pre-lex phase. This is not consistently implemented: eval("'\u200c'.length") == 0 on SpiderMonkey, and 1 on most other interpreters.

Not lately, meaning post-Firefox-2/JS1.7. A fresh js shell gives the same results as any Firefox 3 beta:

js> eval("'\u200c'.length") == 0
false
js> eval("'\u200c'.length")
1

See https://bugzilla.mozilla.org/show_bug.cgi?id=274152, where SpiderMonkey yields to IE JScript's flouting of ECMA-262. IE set a real-world web standard, and for the better according to people in certain locales.

According to https://bugzilla.mozilla.org/show_bug.cgi?id=368516#c34, IE does not report illegal character errors correctly, instead treating misplaced BOMs as identifiers whose references result in runtime ReferenceErrors (I don't know what it does with other format-control characters that occur outside of strings and regexps).

See also the follow-on bug to tolerate mislocated BOMs, https://bugzilla.mozilla.org/show_bug.cgi?id=368516. Ain't the copy/paste Internet grand?

JSON does not strip these characters out, so they are treated as significant.

ES4 is specifying, as a bug fix to match other browsers, that format-control characters shall not be stripped; it must also, to be a real-world web standard, specify tolerance for mislocated BOMs. Postel's Law bites back!

So JSON and ES4 will agree on this one.
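
The difference between the two behaviours in (2) can be sketched by emulating the ES3 pre-lex phase by hand. stripFormatControls is a hypothetical helper, and the sketch assumes an engine with Unicode property escapes in regexps (the \p{Cf} class), which no 2008 engine has.

```javascript
// Hypothetical emulation of the ES3 pre-lex phase: drop every code point in
// Unicode general category Cf. Needs an engine with \p{...} regexp support.
function stripFormatControls(source) {
  return source.replace(/\p{Cf}/gu, '');
}

// Program text containing a literal U+200C between the quotes.
var program = "'" + String.fromCharCode(0x200c) + "'.length";

// Evaluate the text as written, then again after the emulated stripping.
var preserved = eval(program);
var stripped = eval(stripFormatControls(program));
```

On an engine that keeps format-control characters, preserved is 1 (the JSON reading) and stripped is 0 (the ES3-as-specified reading).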

(3) There are JSON programs that can be parsed to ES but that cannot be serialized back to JSON without losing track of where info was lost. JSON does not put any limits on numbers, but ES does. ES will treat 1e1000 as Infinity. Since JSON does not have a value Infinity, it is unclear how to implement toJSON(fromJSON("[1e1000]")).
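
A minimal sketch of the round trip in (3), assuming the JSON.parse and JSON.stringify built-ins that were standardized after this thread stand in for fromJSON and toJSON. hasNonFinite is a hypothetical helper for noticing the overflow before serializing.

```javascript
// JSON.parse / JSON.stringify stand in for fromJSON / toJSON (an assumption).
var parsed = JSON.parse('[1e1000]');        // 1e1000 overflows an IEEE-754 double
var roundTripped = JSON.stringify(parsed);  // non-finite numbers serialize as null

// Hypothetical check: does a parsed value contain any non-finite number?
function hasNonFinite(value) {
  if (typeof value === 'number') {
    return !isFinite(value);
  }
  if (value && typeof value === 'object') {
    for (var key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key) &&
          hasNonFinite(value[key])) {
        return true;
      }
    }
  }
  return false;
}

var lossy = hasNonFinite(parsed);
```

Here the number comes back as null, so the output no longer says that a number was there at all; checking for non-finite values first at least lets the caller know where the information was lost.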

JSON's grammar is nice and simple; it facilitates exhaustive testing (Rob Sayre used Koushik Sen's jCUTE to generate all-paths tests for a Java implementation).

BigInts or BigNums could help in the future, but the installed base will not have them for a while and their literal syntax, without a pragma, will have a suffix.

This kind of edge case is unlikely to be a problem in practice, although such "overflow" conditions recur throughout the security exploit literature. Could JSON stand to grow support for the IEEE-754 non-finite values?

/be
_______________________________________________
Es4-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es4-discuss
