On Sun, Mar 18, 2018 at 10:08 AM, Richard Gibson <[email protected]> wrote:
> On Sunday, March 18, 2018, Anders Rundgren <[email protected]> wrote:
>
>> On 2018-03-16 20:24, Richard Gibson wrote:
>>
>> Though ECMAScript JSON.stringify may suffice for certain
>> Javascript-centric use cases or otherwise restricted subsets thereof as
>> addressed by JOSE, it is not suitable for producing
>> canonical/hashable/etc. JSON, which requires a fully general solution such
>> as [1]. Both its number serialization [2] and string serialization [3]
>> specify aspects that harm compatibility (the former having arbitrary
>> branches dependent upon the value of numbers, the latter being capable of
>> producing invalid UTF-8 octet sequences that represent unpaired surrogate
>> code points—unacceptable for exchange outside of a closed ecosystem [4]).
>> JSON is a general *language-agnostic* interchange format, and ECMAScript
>> JSON.stringify is *not* a JSON canonicalization solution.
>>
>> [1]: http://gibson042.github.io/canonicaljson-spec/
>> [2]: http://ecma-international.org/ecma-262/7.0/#sec-tostring-applied-to-the-number-type
>> [3]: http://ecma-international.org/ecma-262/7.0/#sec-quotejsonstring
>> [4]: https://tools.ietf.org/html/rfc8259#section-8.1
>>
>>
>> Richard, I may be wrong but AFAICT, our respective canoncalization
>> schemes are in fact principally IDENTICAL.
>>
>
> In that they have the same goal, yes. In that they both achieve that goal,
> no. I'm not married to choices like exponential notation and uppercase
> escapes, but a JSON canonicalization scheme MUST cover all of JSON.
>
>
>> That the number serialization provided by JSON.stringify() is
>> unacceptable, is not generally taken as a fact. I also think it looks a
>> bit weird, but that's just a matter of esthetics. Compatibility is an
>> entirely different issue.
>>
>
> I concede this point. The modified algorithm is sufficient, but note that
> a canonicalization scheme will remain static even if ECMAScript changes.
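To make the "arbitrary branches" point concrete, here is a minimal sketch, assuming any current engine (Node or a browser console). The sample values are only illustrative; the switch points come from the Number-to-String section cited as [2]: magnitudes of 1e21 and above, and nonzero magnitudes below 1e-6, are written in exponential notation, everything else in plain decimal, and -0 is written as "0".

```javascript
// Sample values chosen to sit on either side of the notation switch points.
const samples = [1e20, 1e21, 1e-6, 1e-7, -0];

// JSON.stringify delegates finite numbers to the Number-to-String algorithm.
const forms = samples.map((n) => JSON.stringify(n));
// forms: ["100000000000000000000", "1e+21", "0.000001", "1e-7", "0"]

for (let k = 0; k < samples.length; k++) {
  console.log(Object.is(samples[k], -0) ? "-0" : String(samples[k]), "->", forms[k]);
}
```

Any canonicalization scheme that adopts this algorithm inherits exactly these switch points.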
>

Does this mean that the language below would need to be fixed at a specific version of Unicode, or that we would need to cite a specific version for canonicalization but might allow a higher version for String.prototype.normalize (and, in future versions of the spec, require it)?

http://www.ecma-international.org/ecma-262/6.0/#sec-conformance
"""
A conforming implementation of ECMAScript must interpret source text input in conformance with the Unicode Standard, Version 5.1.0 or later
"""

and in ECMA 404 <http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf>
"""
For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 10646, Information Technology – Universal Coded Character Set (UCS)
The Unicode Consortium. The Unicode Standard http://www.unicode.org/versions/latest.
"""

>> Sorting on Unicode Code Points is of course "technically 100% right" but
>> strictly put not necessary.
>>
>
> Certain scenarios call for different systems to _independently_ generate
> equivalent data structures, and it is a necessary property of canonical
> serialization that it yields identical results for equivalent data
> structures. JSON does not specify significance of object member ordering,
> so member ordering does not distinguish otherwise equivalent objects, so
> canonicalization MUST specify member ordering that is deterministic with
> respect to all valid data.
>

Code points include orphaned surrogates in a way that scalar values do not, right? So "\uD800" and "\uD800\uDC00" each denote a single code point (U+D800 and U+10000 respectively). It seems like a strict (UTF-16 code unit) prefix of a string should still sort before that string, but prefix transitivity in general does not hold: "\uFFFF" < "\uD800\uDC00" && "\uFFFF" > "\uD800", so "\uFFFF" sorts between "\uD800" and its extension "\uD800\uDC00". That shouldn't cause problems for hashability, but I thought I'd raise it just in case.
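A sketch of the ordering difference, for concreteness. `compareByCodePoint` is a hypothetical helper written for this message, not taken from either canonicalization proposal or any library; it treats a lone surrogate as its own code point, the way `String.prototype.codePointAt` reports it. The default `Array.prototype.sort` compares UTF-16 code units instead.

```javascript
// Hypothetical helper: compare two strings by Unicode code point rather than
// by UTF-16 code unit. A lone surrogate counts as its own code point.
function compareByCodePoint(a, b) {
  let i = 0;
  while (i < a.length && i < b.length) {
    const cpA = a.codePointAt(i);
    const cpB = b.codePointAt(i);
    if (cpA !== cpB) return cpA < cpB ? -1 : 1;
    // Equal code points occupy the same number of code units,
    // so a single index advances through both strings.
    i += cpA > 0xffff ? 2 : 1;
  }
  // Every shared code point matched: the shorter string (a prefix) sorts first.
  return a.length === b.length ? 0 : a.length < b.length ? -1 : 1;
}

const keys = ["\uD800\uDC00", "\uFFFF", "\uD800"];

// Default sort: UTF-16 code unit order.
const byCodeUnit = [...keys].sort();
// -> ["\uD800", "\uD800\uDC00", "\uFFFF"]

// Code point order: "\uFFFF" lands between "\uD800" and "\uD800\uDC00".
const byCodePoint = [...keys].sort(compareByCodePoint);
// -> ["\uD800", "\uFFFF", "\uD800\uDC00"]
```

Both orders are deterministic, which is what hashing needs; they just disagree on where keys that share a leading surrogate end up.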
>> Your claim about uppercase Unicode escapes is incorrect, there is no such
>> requirement:
>>
>> https://tools.ietf.org/html/rfc8259#section-7
>>
>
> I don't recall ever making a claim about uppercase Unicode escapes, other
> than observing that it is the preferred form for examples in the JSON
> RFCs... what are you talking about?
>
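A small check of the escape-case point, assuming any current engine: `JSON.parse` decodes upper- and lowercase `\u` escapes to the same string, yet the two texts differ byte for byte, so a canonical form has to pick one case. For the control characters it escapes, `JSON.stringify` writes lowercase hex.

```javascript
// RFC 8259 section 7 allows either hex case in \u escapes,
// so these two JSON texts decode to the same string...
const upper = JSON.parse('"\\u00E9"');
const lower = JSON.parse('"\\u00e9"');
console.log(upper === lower); // true

// ...but the texts themselves differ, so a canonical form must choose one.
// JSON.stringify emits lowercase hex for control characters without a short escape.
const escaped = JSON.stringify("\u001f");
console.log(escaped); // prints "\u001f" (quotes included)
```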
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

