Re: JSON.canonicalize()

Mike Samuel Mon, 19 Mar 2018 08:03:46 -0700

On Mon, Mar 19, 2018 at 10:30 AM, Anders Rundgren <
anders.rundgren....@gmail.com> wrote:


> On 2018-03-19 15:17, Mike Samuel wrote:
>
>>
>>
>> On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <anders.rundgren....@gmail.com> wrote:
>>
>>     On 2018-03-19 14:34, Mike Samuel wrote:
>>
>>         How does the transform you propose differ from the following?
>>
>>         JSON.canonicalize = (x) => JSON.stringify(
>>               x,
>>               (_, x) => {
>>                 if (x && typeof x === 'object' && !Array.isArray(x)) {
>>                   const sorted = {}
>>                   for (let key of Object.getOwnPropertyNames(x).sort()) {
>>                     sorted[key] = x[key]
>>                   }
>>                   return sorted
>>                 }
>>                 return x
>>               })
>>
>>
>>     Probably not at all.  You are the JS guru, not me :-)
>>
>>
>>         The proposal says "in lexical (alphabetical) order."
>>         If "lexical order" differs from the lexicographic order that sort uses,
>>         then the above could be adjusted to pass a comparator function.
>>
>>
>>     I hope (and believe) that this is just a terminology problem.
>>
>>
>> I think you're right.
>> http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
>> is where it's specified.  After checking that no custom comparator is
>> present:
>>
>>  1. Let xString be ToString(x).
>>  2. ReturnIfAbrupt(xString).
>>  3. Let yString be ToString(y).
>>  4. ReturnIfAbrupt(yString).
>>  5. If xString < yString, return −1.
>>  6. If xString > yString, return 1.
>>  7. Return +0.
>>
>>
>> (<) and (>) do not themselves bring in any locale-specific collation
>> rules.
>> They bottom out on
>> http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison
>>
>> If both px and py are Strings, then
>>
>>  1. If py is a prefix of px, return false. (A String value p is a prefix
>>     of String value q if q can be the result of concatenating p and some
>>     other String r. Note that any String is a prefix of itself, because r
>>     may be the empty String.)
>>  2. If px is a prefix of py, return true.
>>  3. Let k be the smallest nonnegative integer such that the code unit at
>>     index k within px is different from the code unit at index k within py.
>>     (There must be such a k, for neither String is a prefix of the other.)
>>  4. Let m be the integer that is the code unit value at index k within px.
>>  5. Let n be the integer that is the code unit value at index k within py.
>>  6. If m < n, return true. Otherwise, return false.
>>
>> Those code unit values are UTF-16 code unit values per
>> http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
>>
>> each element in the String is treated as a UTF-16 code unit value
>>
>> As someone mentioned earlier in this thread, lexicographic string
>> comparisons that use different code unit sizes can compute different
>> results for the same semantic string value.  Between UTF-8 and UTF-32
>> you should see no difference, but UTF-16 can differ from those when
>> supplementary code points are involved.
>>
>> It might be worth making explicit that your lexical order is over UTF-16
>> strings if that's what you intend.
>>
>
> Right, it is actually already in 3.2.3:
>

My apologies.  I missed that.

>   Property strings to be sorted depend on that strings are represented
>   as arrays of 16-bit unsigned integers where each integer holds a single
>   UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
>   comparisons, independent of locale settings.
>
> This maps "natively" to JS and Java.  Probably to .NET as well.
> Other systems may need a specific comparator.
>

Yep.  Off the top of my head:
Go and Rust use UTF-8.
Python 3 has compared strings by code point since 3.3; Python 2 is usually
UTF-16 but may be UTF-32, depending on how the interpreter was compiled.
C++, as is its wont, is all of them.
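
To make the difference concrete, here is a minimal sketch (not part of the
proposal) of two keys that sort one way by UTF-16 code unit and the other way
by code point. The helper name compareByCodePoint and the two sample keys are
hypothetical, chosen only for illustration; code point order is what a
byte-wise comparison of UTF-8 or a comparison of UTF-32 would produce.

```javascript
// Two keys whose relative order depends on the code unit size.
const bmpKey = '\uFF5E';        // U+FF5E, a single UTF-16 code unit: 0xFF5E
const astralKey = '\u{1F600}';  // U+1F600, a surrogate pair: 0xD83D 0xDE00

// The default sort compares UTF-16 code units, which is what the
// proposal's section 3.2.3 asks for. 0xD83D < 0xFF5E, so astralKey wins.
const utf16Order = [bmpKey, astralKey].sort();

// Hypothetical helper (an assumption for illustration): compare two
// strings code point by code point instead of code unit by code unit.
function compareByCodePoint(a, b) {
  const as = Array.from(a);  // Array.from iterates a string by code point
  const bs = Array.from(b);
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    const d = as[i].codePointAt(0) - bs[i].codePointAt(0);
    if (d !== 0) return d;
  }
  // One string is a prefix of the other: the shorter one sorts first.
  return as.length - bs.length;
}

// By code point, 0xFF5E < 0x1F600, so bmpKey sorts first.
const codePointOrder = [bmpKey, astralKey].sort(compareByCodePoint);
```

A UTF-8 or UTF-32 system that wants the proposal's order needs the opposite
adjustment: a comparator that compares the UTF-16 encoding of each key rather
than its native representation.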



>
>>         Applied to your example input,
>>
>>         JSON.canonicalize({
>>               "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>>               "other":  [null, true, false],
>>               "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>>             }) ===
>>                 String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>>         // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>>
>>
>>         The canonicalized example from section 3.2.3 seems to conflict with
>>         the text of 3.2.2:
>>
>>
>>     If you look under the result you will find a pretty sad explanation:
>>
>>              "Note: \u20ac denotes the Euro character, which not
>>               being ASCII, is currently not displayable in RFCs"
>>
>>
>> Cool.
>>
>>     After 30 years with RFCs, we can still only use ASCII :-( :-(
>>
>>     Updates:
>>     https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
>>     https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
>>
>>
>> If this can be implemented in a small amount of library code, what do you
>> need from TC39?
>>
>
> At this stage probably nothing; the BIG issue is the algorithm, which I
> took the liberty of airing in this forum.
> To date, all efforts to create a JSON canonicalization standard have been
> shot down or abandoned.
>

Like I said, I think the hashing use case is worthwhile.

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
