Re: JSON.canonicalize()

Mike Samuel Mon, 19 Mar 2018 07:17:56 -0700

On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <
[email protected]> wrote:


> On 2018-03-19 14:34, Mike Samuel wrote:
>
>> How does the transform you propose differ from the following?
>>
>> JSON.canonicalize = (x) => JSON.stringify(
>>      x,
>>      (_, x) => {
>>        if (x && typeof x === 'object' && !Array.isArray(x)) {
>>          const sorted = {}
>>          for (let key of Object.getOwnPropertyNames(x).sort()) {
>>            sorted[key] = x[key]
>>          }
>>          return sorted
>>        }
>>        return x
>>      })
>>
>
> Probably not at all.  You are the JS guru, not me :-)
>
>
>> The proposal says "in lexical (alphabetical) order."
>> If "lexical order" differs from the lexicographic order that sort uses,
>> then
>> the above could be adjusted to pass a comparator function.
>>
>
> I hope (and believe) that this is just a terminology problem.
>

I think you're right.  The default ordering is specified at
http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
After checking that no custom comparator is present, SortCompare does the
following:

   1. Let *xString* be ToString
   <http://www.ecma-international.org/ecma-262/6.0/#sec-tostring>(*x*).
   2. ReturnIfAbrupt
   <http://www.ecma-international.org/ecma-262/6.0/#sec-returnifabrupt>(
   *xString*).
   3. Let *yString* be ToString
   <http://www.ecma-international.org/ecma-262/6.0/#sec-tostring>(*y*).
   4. ReturnIfAbrupt
   <http://www.ecma-international.org/ecma-262/6.0/#sec-returnifabrupt>(
   *yString*).
   5. If *xString* < *yString*, return −1.
   6. If *xString* > *yString*, return 1.
   7. Return +0.
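
To make those steps concrete, here is a rough sketch (mine, not spec text)
of the comparator they describe.  The name defaultSortCompare is made up
for illustration, and it skips the handling of undefined that SortCompare
performs before reaching these steps.

```javascript
// Hypothetical helper mirroring steps 1-7 of the default SortCompare.
// Assumption: neither argument is undefined (the spec handles that earlier).
function defaultSortCompare(x, y) {
  // A template literal applies ToString, so a Symbol throws a TypeError,
  // matching the ReturnIfAbrupt steps.
  const xString = `${x}`;
  const yString = `${y}`;
  if (xString < yString) return -1;
  if (xString > yString) return 1;
  return 0;
}

// Numbers are compared as strings, so "10" sorts before "9".
const withDefault = [10, 9, 1].sort();
const withSketch = [10, 9, 1].sort(defaultSortCompare);
console.log(withDefault, withSketch);
```

Both calls should order the array as 1, 10, 9, which is the usual reminder
that the default sort is a string sort.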


(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on
http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both *px* and *py* are Strings, then

   1. If *py* is a prefix of *px*, return *false*. (A String value *p* is a
   prefix of String value *q* if *q* can be the result of concatenating *p* and
   some other String *r*. Note that any String is a prefix of itself,
   because *r* may be the empty String.)
   2. If *px* is a prefix of *py*, return *true*.
   3. Let *k* be the smallest nonnegative integer such that the code unit
   at index *k* within *px* is different from the code unit at index *k*
   within *py*. (There must be such a *k*, for neither String is a prefix
   of the other.)
   4. Let *m* be the integer that is the code unit value at index *k* within
    *px*.
   5. Let *n* be the integer that is the code unit value at index *k* within
    *py*.
   6. If *m* < *n*, return *true*. Otherwise, return *false*.
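
As a sanity check, the six steps above can be transcribed fairly directly.
This is only a sketch under the assumption that both arguments are already
strings; stringLessThan is a name I invented, not anything in the spec.

```javascript
// Hypothetical transcription of the String branch of the abstract
// relational comparison.  Assumes px and py are both strings.
function stringLessThan(px, py) {
  if (px.startsWith(py)) return false;  // step 1: py is a prefix of px
  if (py.startsWith(px)) return true;   // step 2: px is a prefix of py
  // Step 3: find the first index where the code units differ.  One must
  // exist because neither string is a prefix of the other.
  let k = 0;
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++;
  const m = px.charCodeAt(k);           // step 4: UTF-16 code unit in px
  const n = py.charCodeAt(k);           // step 5: UTF-16 code unit in py
  return m < n;                         // step 6
}

// Compare the sketch against the built-in operator on a few samples.
const samples = ['', 'a', 'ab', 'B', 'b', '\uFF61', '\u{1F600}'];
const agrees = samples.every((p) =>
  samples.every((q) => stringLessThan(p, q) === (p < q)));
console.log(agrees);
```

Nothing in there consults a locale; it is code unit arithmetic all the way
down.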

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
which says:

"each element in the String is treated as a UTF-16 code unit value"

As someone mentioned earlier in this thread, lexicographic string
comparisons that use different code unit sizes can compute different
results for the same sequence of code points.  UTF-8 and UTF-32 orderings
agree with each other, but UTF-16 ordering can differ from both when
supplementary code points are involved, because the surrogate code units
(0xD800-0xDFFF) that encode them sort below the BMP code units in the
range 0xE000-0xFFFF.

It might be worth making explicit that your lexical order is over UTF-16
strings if that's what you intend.
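
Here is a small illustration of that difference.  It is a sketch, not part
of the proposal: compareByCodePoint is a helper I made up to stand in for
code point order, which is what a bytewise UTF-8 or UTF-32 comparison
would give.

```javascript
// U+FF61 is a single BMP code unit (0xFF61).
const halfwidthStop = '\uFF61';
// U+1F600 is supplementary: surrogate pair 0xD83D 0xDE00 in UTF-16.
const grinningFace = '\u{1F600}';

// Hypothetical comparator that orders strings by code point.
function compareByCodePoint(x, y) {
  const xs = Array.from(x);  // string iteration yields whole code points
  const ys = Array.from(y);
  const n = Math.min(xs.length, ys.length);
  for (let i = 0; i < n; i++) {
    const d = xs[i].codePointAt(0) - ys[i].codePointAt(0);
    if (d !== 0) return d;
  }
  return xs.length - ys.length;
}

// Default sort compares UTF-16 code units: 0xD83D < 0xFF61.
const utf16Order = [halfwidthStop, grinningFace].sort();
// Code point order: 0xFF61 < 0x1F600.
const codePointOrder = [grinningFace, halfwidthStop].sort(compareByCodePoint);

console.log(utf16Order[0] === grinningFace);      // true
console.log(codePointOrder[0] === halfwidthStop); // true
```

So two implementations that both claim "lexical order" can disagree on key
order as soon as one key contains a supplementary code point and another
contains something in U+E000 through U+FFFF.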



> Applied to your example input,
>>
>> JSON.canonicalize({
>>      "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
>>      "other":  [null, true, false],
>>      "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
>>    }) ===
>>        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
>> // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
>>
>>
>> The canonicalized example from section 3.2.3 seems to conflict with the
>> text of 3.2.2:
>>
>
> If you look under the result you will find a pretty sad explanation:
>
>         "Note: \u20ac denotes the Euro character, which not
>          being ASCII, is currently not displayable in RFCs"
>

Cool.


> After 30 years with RFCs, we can still only use ASCII :-( :-(
>
> Updates:
> https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
> https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
>

If this can be implemented in a small amount of library code, what do you
need from TC39?



> Anders
>
>
>> """
>> If the Unicode value is outside of the ASCII control character range, it
>> MUST be serialized "as is" unless it is equivalent to 0x005c (\) or
>> 0x0022 (") which MUST be serialized as \\ and \" respectively.
>> """
>>
>> So I think the "\u20ac" should actually be "€" and the implementation
>> above matches your proposal.
>>
>>
>> On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <
>> [email protected] <mailto:[email protected]>>
>> wrote:
>>
>>     Dear List,
>>
>>     Here is a proposal that I would be very happy getting feedback on
>> since it builds on ES but is not (at all) limited to ES.
>>
>>     The request is for a complement to the ES "JSON" object called
>> canonicalize() which would have identical parameters to the existing
>> stringify() method.
>>
>>     The JSON canonicalization scheme (including ES code for emulating
>> it) is described in:
>>     https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
>>
>>     Current workspace: https://github.com/cyberphone/json-canonicalization
>>
>>     Thanx,
>>     Anders Rundgren
>>     _______________________________________________
>>     es-discuss mailing list
>>     [email protected] <mailto:[email protected]>
>>     https://mail.mozilla.org/listinfo/es-discuss <
>> https://mail.mozilla.org/listinfo/es-discuss>
>>
>>
>>
>

