On Mon, Mar 19, 2018 at 10:30 AM, Anders Rundgren <anders.rundgren....@gmail.com> wrote:
> On 2018-03-19 15:17, Mike Samuel wrote:
>
>> On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <anders.rundgren....@gmail.com> wrote:
>>
>>     On 2018-03-19 14:34, Mike Samuel wrote:
>>
>>         How does the transform you propose differ from?
>>
>>         JSON.canonicalize = (x) => JSON.stringify(
>>             x,
>>             (_, x) => {
>>               if (x && typeof x === 'object' && !Array.isArray(x)) {
>>                 const sorted = {}
>>                 for (let key of Object.getOwnPropertyNames(x).sort()) {
>>                   sorted[key] = x[key]
>>                 }
>>                 return sorted
>>               }
>>               return x
>>             })
>>
>>     Probably not all. You are the JS guru, not me :-)
>>
>>         The proposal says "in lexical (alphabetical) order."
>>         If "lexical order" differs from the lexicographic order that sort uses, then
>>         the above could be adjusted to pass a comparator function.
>>
>>     I hope (and believe) that this is just a terminology problem.
>>
>> I think you're right. http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
>> is where it's specified. After checking that no custom comparator is present:
>>
>>  1. Let xString be ToString <http://www.ecma-international.org/ecma-262/6.0/#sec-tostring>(x).
>>  2. ReturnIfAbrupt <http://www.ecma-international.org/ecma-262/6.0/#sec-returnifabrupt>(xString).
>>  3. Let yString be ToString <http://www.ecma-international.org/ecma-262/6.0/#sec-tostring>(y).
>>  4. ReturnIfAbrupt <http://www.ecma-international.org/ecma-262/6.0/#sec-returnifabrupt>(yString).
>>  5. If xString < yString, return −1.
>>  6. If xString > yString, return 1.
>>  7. Return +0.
>>
>> (<) and (>) do not themselves bring in any locale-specific collation rules.
>> They bottom out on http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison
>>
>> If both px and py are Strings, then
>>
>>  1. If py is a prefix of px, return false.
>>     (A String value p is a prefix of String value q if q can be the result of
>>     concatenating p and some other String r. Note that any String is a prefix
>>     of itself, because r may be the empty String.)
>>  2. If px is a prefix of py, return true.
>>  3. Let k be the smallest nonnegative integer such that the code unit at
>>     index k within px is different from the code unit at index k within py.
>>     (There must be such a k, for neither String is a prefix of the other.)
>>  4. Let m be the integer that is the code unit value at index k within px.
>>  5. Let n be the integer that is the code unit value at index k within py.
>>  6. If m < n, return true. Otherwise, return false.
>>
>> Those code unit values are UTF-16 code unit values per
>> http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
>>
>>     each element in the String is treated as a UTF-16 code unit value
>>
>> As someone mentioned earlier in this thread, lexicographic string comparisons
>> that use different code unit sizes can compute different results for the same
>> semantic string value. Between UTF-8 and UTF-32 you should see no difference,
>> but UTF-16 can differ from those given supplementary codepoints.
>>
>> It might be worth making explicit that your lexical order is over UTF-16
>> strings if that's what you intend.
>
> Right, it is actually already in 3.2.3:

My apologies. I missed that.

>   Property strings to be sorted depend on that strings are represented
>   as arrays of 16-bit unsigned integers where each integer holds a single
>   UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
>   comparisons, independent of locale settings.
>
> This maps "natively" to JS and Java. Probably to .NET as well.
> Other systems may need a specific comparator.

Yep. Off the top of my head:
Go and Rust use UTF-8. Python 3 (since 3.3) compares strings by code point, so
it sorts like UTF-32. Python 2 is usually UTF-16 but may be UTF-32 depending on
sizeof(wchar) when compiling the interpreter.
C++ as is its wont is all of them.

>>         Applied to your example input,
>>
>>         JSON.canonicalize({
>>             "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>>             "other": [null, true, false],
>>             "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>>         }) ===
>>         String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>>         // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>>
>>         The canonicalized example from section 3.2.3 seems to conflict
>>         with the text of 3.2.2:
>>
>>     If you look under the result you will find a pretty sad explanation:
>>
>>         "Note: \u20ac denotes the Euro character, which not
>>         being ASCII, is currently not displayable in RFCs"
>>
>> Cool.
>>
>>     After 30 years with RFCs, we can still only use ASCII :-( :-(
>>
>>     Updates:
>>     https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
>>     https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
>>
>> If this can be implemented in a small amount of library code, what do you
>> need from TC39?
>
> At this stage probably nothing, the BIG issue is the algorithm which I
> took the liberty of airing in this forum.
> To date all efforts creating a JSON canonicalization standard have been
> shot down or been abandoned.

Like I said, I think the hashing use case is worthwhile.
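
To make the UTF-16 versus code point ordering difference concrete, here is a minimal sketch. The helper name `compareByCodePoint` and the sample keys are made up for illustration and are not part of the proposal. Sorting by code point is what a byte-wise comparison of UTF-8 (or UTF-32) strings would produce, so it stands in for what a system such as Go or Rust would do without a specific comparator.

```javascript
// Hypothetical helper (not from the proposal): compare two strings by
// Unicode code point instead of by UTF-16 code unit.
function compareByCodePoint(a, b) {
  const as = Array.from(a); // Array.from iterates strings by code point
  const bs = Array.from(b);
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    const d = as[i].codePointAt(0) - bs[i].codePointAt(0);
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  // One string is a prefix of the other: the shorter one sorts first.
  return Math.sign(as.length - bs.length);
}

// U+FF5E is a single code unit 0xFF5E; U+1F600 is the surrogate pair
// 0xD83D 0xDE00, so its first code unit is smaller than 0xFF5E.
const keys = ['\uFF5E', '\u{1F600}', 'a'];

// Default sort uses SortCompare, i.e. UTF-16 code unit order.
const utf16Order = keys.slice().sort();

// Code point order, equivalent to byte-wise order over UTF-8.
const codePointOrder = keys.slice().sort(compareByCodePoint);

console.log(utf16Order);     // 'a', then U+1F600, then U+FF5E
console.log(codePointOrder); // 'a', then U+FF5E, then U+1F600
```

The two orders only diverge when a key contains a supplementary code point and is compared against a BMP character at or above U+E000, which is why the difference is easy to miss in testing.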
_______________________________________________ es-discuss mailing list es-discuss@mozilla.org https://mail.mozilla.org/listinfo/es-discuss