Re: JSON.canonicalize()

Anders Rundgren Mon, 19 Mar 2018 07:31:52 -0700

On 2018-03-19 15:17, Mike Samuel wrote:



On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren wrote:

    On 2018-03-19 14:34, Mike Samuel wrote:

        How does the transform you propose differ from?

        JSON.canonicalize = (x) => JSON.stringify(
              x,
              (_, x) => {
                if (x && typeof x === 'object' && !Array.isArray(x)) {
                  const sorted = {}
                  for (let key of Object.getOwnPropertyNames(x).sort()) {
                    sorted[key] = x[key]
                  }
                  return sorted
                }
                return x
              })
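A minimal usage sketch of the replacer approach above, written as a standalone function (the name `canonicalize` and the sample inputs are illustrative, not taken from the proposal). It also shows one case where the result may not be in plain code-unit order: keys that look like array indices are enumerated by JSON.stringify in ascending numeric order, regardless of the order in which they were inserted into `sorted`.

```javascript
// Standalone copy of the replacer-based transform, so the example runs as-is.
const canonicalize = (value) => JSON.stringify(value, (_, v) => {
  if (v && typeof v === 'object' && !Array.isArray(v)) {
    const sorted = {};
    // Default sort(): keys are compared as sequences of UTF-16 code units.
    for (const key of Object.getOwnPropertyNames(v).sort()) {
      sorted[key] = v[key];
    }
    return sorted;
  }
  return v;
});

// Nested objects are sorted too, because the replacer runs at every level.
console.log(canonicalize({ b: 1, a: { d: 2, c: 3 } }));
// {"a":{"c":3,"d":2},"b":1}

// Integer-like keys are emitted numerically ("2" before "10"), even though
// sort() put "10" first; a sorted string order would need other handling.
console.log(canonicalize({ 10: 'x', 2: 'y', a: 'z' }));
// {"2":"y","10":"x","a":"z"}
```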


    Probably not at all.  You are the JS guru, not me :-)


        The proposal says "in lexical (alphabetical) order."
        If "lexical order" differs from the lexicographic order that sort uses,
        then the above could be adjusted to pass a comparator function.


    I hope (and believe) that this is just a terminology problem.


I think you're right. 
http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified.  After checking that no custom comparator is present:

 1. Let xString be ToString(x).
    <http://www.ecma-international.org/ecma-262/6.0/#sec-tostring>
 2. ReturnIfAbrupt(xString).
    <http://www.ecma-international.org/ecma-262/6.0/#sec-returnifabrupt>
 3. Let yString be ToString(y).
 4. ReturnIfAbrupt(yString).
 5. If xString < yString, return −1.
 6. If xString > yString, return 1.
 7. Return +0.
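The steps above can be sketched as an explicit comparator. This is only an illustration: `defaultSortCompare` is a hypothetical name, and `String(x)` is a simplification of ToString (it differs for Symbols, and sort() handles undefined before these steps are reached).

```javascript
// Hypothetical helper mirroring the quoted SortCompare steps for the case
// where no comparator is passed to sort().
function defaultSortCompare(x, y) {
  const xString = String(x);        // steps 1-2
  const yString = String(y);        // steps 3-4
  if (xString < yString) return -1; // step 5
  if (xString > yString) return 1;  // step 6
  return 0;                         // step 7
}

const keys = ['b', 'a', 'B', '\u00e9', 'z'];
const withDefault = [...keys].sort();
const withExplicit = [...keys].sort(defaultSortCompare);

// Both orderings agree: uppercase before lowercase, U+00E9 last,
// because only code unit values are compared.
console.log(withDefault.join(','));  // B,a,b,z,é
console.log(withExplicit.join(',')); // B,a,b,z,é

// Numbers are converted to strings first, so "10" sorts before "9".
console.log([10, 9, 1].sort().join(',')); // 1,10,9
```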


(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on 
http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison

If both px and py are Strings, then

 1. If py is a prefix of px, return false. (A String value p is a prefix of
    String value q if q can be the result of concatenating p and some other
    String r. Note that any String is a prefix of itself, because r may be the
    empty String.)
 2. If px is a prefix of py, return true.
 3. Let k be the smallest nonnegative integer such that the code unit at
    index k within px is different from the code unit at index k within py.
    (There must be such a k, for neither String is a prefix of the other.)
 4. Let m be the integer that is the code unit value at index k within px.
 5. Let n be the integer that is the code unit value at index k within py.
 6. If m < n, return true. Otherwise, return false.
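As a rough illustration, the string branch of the comparison can be written out step by step and checked against the built-in (<) operator. The function name `lessThanByCodeUnit` and the sample pairs are made up for this sketch.

```javascript
// Hypothetical step-by-step version of the quoted algorithm for two Strings.
function lessThanByCodeUnit(px, py) {
  if (px.startsWith(py)) return false; // step 1: py is a prefix of px
  if (py.startsWith(px)) return true;  // step 2: px is a prefix of py
  let k = 0;
  // step 3: find the first index where the code units differ
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++;
  const m = px.charCodeAt(k); // step 4
  const n = py.charCodeAt(k); // step 5
  return m < n;               // step 6
}

const pairs = [
  ['a', 'b'], ['abc', 'ab'], ['ab', 'abc'], ['', ''],
  ['Z', 'a'], ['\u{1F600}', '\uFF5E'],
];
// The hand-written version agrees with (<) on every sample pair.
const allAgree = pairs.every(([x, y]) => lessThanByCodeUnit(x, y) === (x < y));
console.log(allAgree); // true
```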

Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type

each element in the String is treated as a UTF-16 code unit value

As someone mentioned earlier in this thread, lexicographic string comparisons
that use different code unit sizes can compute different results for the same
semantic string value.  Between UTF-8 and UTF-32 you should see no difference,
but UTF-16 can differ from those given supplementary codepoints.

It might be worth making explicit that your lexical order is over UTF-16 
strings if that's what you intend.
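A small sketch of that difference, assuming well-formed strings. The comparator names are hypothetical; `byCodePoint` stands in for what a byte-wise UTF-8 or a UTF-32 comparison would produce.

```javascript
// Code-unit order: what (<) and the default sort() use.
const byCodeUnit = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Code-point order: compares whole code points instead of UTF-16 code units.
function byCodePoint(a, b) {
  const pa = Array.from(a, (ch) => ch.codePointAt(0)); // iterates by code point
  const pb = Array.from(b, (ch) => ch.codePointAt(0));
  const len = Math.min(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    if (pa[i] !== pb[i]) return pa[i] < pb[i] ? -1 : 1;
  }
  return Math.sign(pa.length - pb.length);
}

const emoji = '\u{1F600}'; // supplementary code point, stored as D83D DE00
const tilde = '\uFF5E';    // BMP code point above the surrogate range

// UTF-16: 0xD83D < 0xFF5E, so the emoji sorts first.
console.log(byCodeUnit(emoji, tilde));  // -1
// Code points: 0x1F600 > 0xFF5E, so the emoji sorts last.
console.log(byCodePoint(emoji, tilde)); // 1
```

For keys that contain only BMP characters outside the surrogate range, the two orders coincide.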


Right, it is actually already in 3.2.3:

  Property strings to be sorted depend on that strings are represented
  as arrays of 16-bit unsigned integers where each integer holds a single
  UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
  comparisons, independent of locale settings.

This maps "natively" to JS and Java.  Probably to .NET as well.
Other systems may need a specific comparator.


        Applied to your example input,

        JSON.canonicalize({
              "escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
              "other":  [null, true, false],
              "numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
            }) ===
        String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
        // proposed {"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}


        The canonicalized example from section 3.2.3 seems to conflict with the
        text of 3.2.2:


    If you look under the result you will find a pretty sad explanation:

             "Note: \u20ac denotes the Euro character, which not
              being ASCII, is currently not displayable in RFCs"


Cool.

    After 30 years with RFCs, we can still only use ASCII :-( :-(

    Updates:
    https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
    https://cyberphone.github.io/doc/security/browser-json-canonicalization.html


If this can be implemented in a small amount of library code, what do you need 
from TC39?


At this stage probably nothing; the BIG issue is the algorithm, which I took
the liberty of airing in this forum.
To date, all efforts to create a JSON canonicalization standard have been shot
down or abandoned.

Anders


    Anders


        """
        If the Unicode value is outside of the ASCII control character range, it
        MUST be serialized "as is" unless it is equivalent to 0x005c (\) or
        0x0022 (") which MUST be serialized as \\ and \" respectively.
        """

        So I think the "\u20ac" should actually be "€" and the implementation
        above matches your proposal.


        On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren wrote:

             Dear List,

             Here is a proposal that I would be very happy to get feedback on,
             since it builds on ES but is not (at all) limited to ES.

             The request is for a complement to the ES "JSON" object called
             canonicalize(), which would have identical parameters to the
             existing stringify() method.

             The JSON canonicalization scheme (including ES code for emulating
             it) is described in:
             https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html

             Current workspace: https://github.com/cyberphone/json-canonicalization

             Thanx,
             Anders Rundgren
             _______________________________________________
             es-discuss mailing list
             https://mail.mozilla.org/listinfo/es-discuss

