Re: JSON.canonicalize()

Mike Samuel Sun, 18 Mar 2018 07:30:06 -0700

On Sun, Mar 18, 2018 at 10:08 AM, Richard Gibson <[email protected]>
wrote:


> On Sunday, March 18, 2018, Anders Rundgren <[email protected]>
> wrote:
>
>> On 2018-03-16 20:24, Richard Gibson wrote:
>>
>>> Though ECMAScript JSON.stringify may suffice for certain
>>> JavaScript-centric use cases or otherwise restricted subsets thereof as
>>> addressed by JOSE, it is not suitable for producing
>>> canonical/hashable/etc. JSON, which requires a fully general solution such
>>> as [1]. Both its number serialization [2] and string serialization [3]
>>> specify aspects that harm compatibility (the former having arbitrary
>>> branches dependent upon the value of numbers, the latter being capable of
>>> producing invalid UTF-8 octet sequences that represent unpaired surrogate
>>> code points, which is unacceptable for exchange outside of a closed
>>> ecosystem [4]).
>>> JSON is a general *language-agnostic* interchange format, and ECMAScript
>>> JSON.stringify is *not* a JSON canonicalization solution.
>>>
>>> [1]: http://gibson042.github.io/canonicaljson-spec/
>>> [2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
>>> [3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
>>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
>>
>>
>> Richard, I may be wrong but AFAICT, our respective canonicalization
>> schemes are in fact principally IDENTICAL.
>>
>
> In that they have the same goal, yes. In that they both achieve that goal,
> no. I'm not married to choices like exponential notation and uppercase
> escapes, but a JSON canonicalization scheme MUST cover all of JSON.
>
>
>> That the number serialization provided by JSON.stringify() is
>> unacceptable is not generally taken as fact.  I also think it looks a
>> bit weird, but that's just a matter of esthetics.  Compatibility is an
>> entirely different issue.
>>
>
> I concede this point. The modified algorithm is sufficient, but note that
> a canonicalization scheme will remain static even if ECMAScript changes.
>

Does this mean that the language below would need to be pinned to a specific
version of Unicode? Or would we cite a specific version for canonicalization,
while allowing a higher version for String.prototype.normalize and possibly
requiring one in future versions of the spec?

http://www.ecma-international.org/ecma-262/6.0/#sec-conformance
"""
A conforming implementation of ECMAScript must interpret source text input
in conformance with the Unicode Standard, Version 5.1.0 or later
"""

and in ECMA 404
<http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf>

"""
For undated references, the latest edition of the referenced document
(including any amendments) applies. ISO/IEC 10646, Information Technology –
Universal Coded Character Set (UCS) The Unicode Consortium. The Unicode
Standard http://www.unicode.org/versions/latest.
"""


>> Sorting on Unicode Code Points is of course "technically 100% right" but
>> strictly put not necessary.
>>
>
> Certain scenarios call for different systems to _independently_ generate
> equivalent data structures, and it is a necessary property of canonical
> serialization that it yields identical results for equivalent data
> structures. JSON does not specify significance of object member ordering,
> so member ordering does not distinguish otherwise equivalent objects, so
> canonicalization MUST specify member ordering that is deterministic with
> respect to all valid data.
>
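
As a rough illustration of the property described above, here is a minimal
sketch. It is not taken from either proposal: the name sortedStringify is
hypothetical, the input is assumed to be plain JSON data (for example the
result of JSON.parse), and keys are ordered with the default
Array.prototype.sort, i.e. by UTF-16 code unit, which is only one possible
choice of deterministic ordering.

```javascript
// Hypothetical helper: serialize JSON data with object members in a fixed order.
// Assumes `value` contains only JSON types (no undefined, functions, or cycles).
function sortedStringify(value) {
  if (Array.isArray(value)) {
    // Array order is significant in JSON, so it is preserved as-is.
    return "[" + value.map(sortedStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    // Default sort compares keys by UTF-16 code unit (an assumption of this sketch).
    const keys = Object.keys(value).sort();
    const members = keys.map(
      (k) => JSON.stringify(k) + ":" + sortedStringify(value[k])
    );
    return "{" + members.join(",") + "}";
  }
  // Primitives (string, number, boolean, null) fall through to JSON.stringify.
  return JSON.stringify(value);
}

// Two objects built in different member order serialize identically.
const first = sortedStringify({ b: 1, a: [2, { d: 3, c: 4 }] });
const second = sortedStringify({ a: [2, { c: 4, d: 3 }], b: 1 });
console.log(first === second); // true
```

Number and string serialization here are simply whatever JSON.stringify
produces, so the sketch only addresses member ordering.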

Code points include orphaned surrogates in a way that scalar values do not,
right?  So both "\uD800" and "\uD800\uDC00" are single code points.
It seems like a strict prefix of a string should still sort before that
string but prefix transitivity in general does not hold: "\uFFFF" <
"\uD800\uDC00" && "\uFFFF" > "\uD800".
That shouldn't cause problems for hashability but I thought I'd raise it
just in case.
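
To make the ordering difference concrete, here is a small sketch. The helper
names compareByCodePoint and compareByCodeUnit are hypothetical, and the
sketch treats a lone surrogate as a code point in its own right, as described
above.

```javascript
// Hypothetical helper: compare strings code point by code point.
// Array.from iterates by code point; a lone surrogate stays a single element.
function compareByCodePoint(a, b) {
  const as = Array.from(a);
  const bs = Array.from(b);
  const n = Math.min(as.length, bs.length);
  for (let i = 0; i < n; i++) {
    const d = as[i].codePointAt(0) - bs[i].codePointAt(0);
    if (d !== 0) return d < 0 ? -1 : 1;
  }
  // One string is a prefix of the other: the shorter one sorts first.
  return Math.sign(as.length - bs.length);
}

// Hypothetical helper: the built-in relational operators compare UTF-16 code units.
function compareByCodeUnit(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Code point order: "\uD800" < "\uFFFF" < "\uD800\uDC00" (U+D800 < U+FFFF < U+10000).
console.log(compareByCodePoint("\uFFFF", "\uD800\uDC00")); // -1
console.log(compareByCodePoint("\uFFFF", "\uD800"));       // 1

// Code unit order: "\uD800" < "\uD800\uDC00" < "\uFFFF".
console.log(compareByCodeUnit("\uFFFF", "\uD800\uDC00"));  // 1
```

Under code point order "\uFFFF" lands between "\uD800" and its extension
"\uD800\uDC00", whereas under code unit order both sort before "\uFFFF".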



>> Your claim about uppercase Unicode escapes is incorrect, there is no such
>> requirement:
>>
>> https://tools.ietf.org/html/rfc8259#section-7
>>
>
> I don't recall ever making a claim about uppercase Unicode escapes, other
> than observing that it is the preferred form for examples in the JSON
> RFCs... what are you talking about?
>

