On 2018-03-19 15:17, Mike Samuel wrote:
On Mon, Mar 19, 2018 at 9:53 AM, Anders Rundgren <[email protected]> wrote:
On 2018-03-19 14:34, Mike Samuel wrote:
How does the transform you propose differ from?
JSON.canonicalize = (x) => JSON.stringify(
    x,
    (_, x) => {
      if (x && typeof x === 'object' && !Array.isArray(x)) {
        const sorted = {}
        for (let key of Object.getOwnPropertyNames(x).sort()) {
          sorted[key] = x[key]
        }
        return sorted
      }
      return x
    })
Probably not at all. You are the JS guru, not me :-)
The proposal says "in lexical (alphabetical) order."
If "lexical order" differs from the lexicographic order that sort uses, then
the above could be adjusted to pass a comparator function.
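As a rough sketch of what "pass a comparator function" could look like, the variant below makes the key ordering explicit. The names compareCodeUnits and canonicalizeWithComparator are hypothetical and not part of the proposal; the comparator shown reproduces the default UTF-16 code unit order, and a different order would only require swapping it out.

```javascript
// Hypothetical comparator: orders strings by UTF-16 code unit values,
// which is what the built-in < and > operators do for strings.
const compareCodeUnits = (a, b) => (a < b ? -1 : a > b ? 1 : 0)

// Hypothetical variant of the canonicalize function quoted above,
// with the comparator passed to sort() explicitly.
// Known limitation of this sketch: keys that look like array indices
// ("9", "10") are enumerated in numeric order by JS engines regardless
// of insertion order, so they are not handled here.
const canonicalizeWithComparator = (value) => JSON.stringify(
  value,
  (_, v) => {
    if (v && typeof v === 'object' && !Array.isArray(v)) {
      const sorted = {}
      for (const key of Object.getOwnPropertyNames(v).sort(compareCodeUnits)) {
        sorted[key] = v[key]
      }
      return sorted
    }
    return v
  })

const canonicalSample = canonicalizeWithComparator({ b: 1, a: { d: 2, c: 3 } })
// canonicalSample is '{"a":{"c":3,"d":2},"b":1}'
```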
I hope (and believe) that this is just a terminology problem.
I think you're right.
http://www.ecma-international.org/ecma-262/6.0/#sec-sortcompare
is where it's specified. After checking that no custom comparator is present:
1. Let xString be ToString(x).
2. ReturnIfAbrupt(xString).
3. Let yString be ToString(y).
4. ReturnIfAbrupt(yString).
5. If xString < yString, return −1.
6. If xString > yString, return 1.
7. Return +0.
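The quoted steps can be approximated in plain JS as follows. This is only an illustrative sketch: defaultSortCompare is a hypothetical name, and the handling of undefined values that the spec performs before these steps is left out.

```javascript
// Hypothetical re-implementation of the quoted SortCompare steps.
// Template literals apply ToString, so an abrupt completion
// (e.g. a Symbol argument) surfaces as a thrown TypeError.
function defaultSortCompare(x, y) {
  const xString = `${x}`
  const yString = `${y}`
  if (xString < yString) return -1
  if (xString > yString) return 1
  return 0
}

// Numbers are compared as strings, so "10" sorts before "9".
const sortedNumbers = [10, 9, 1].sort(defaultSortCompare)
// Uppercase letters have smaller code unit values than lowercase ones.
const sortedLetters = ['b', 'a', 'B'].sort(defaultSortCompare)
```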
(<) and (>) do not themselves bring in any locale-specific collation rules.
They bottom out on
http://www.ecma-international.org/ecma-262/6.0/#sec-abstract-relational-comparison
If both px and py are Strings, then
1. If py is a prefix of px, return false. (A String value p is a prefix of
String value q if q can be the result of concatenating p and some other
String r. Note that any String is a prefix of itself, because r may be the
empty String.)
2. If px is a prefix of py, return true.
3. Let k be the smallest nonnegative integer such that the code unit at
index k within px is different from the code unit at index k within py. (There
must be such a k, for neither String is a prefix of the other.)
4. Let m be the integer that is the code unit value at index k within px.
5. Let n be the integer that is the code unit value at index k within py.
6. If m < n, return true. Otherwise, return false.
Those code unit values are UTF-16 code unit values per
http://www.ecma-international.org/ecma-262/6.0/#sec-ecmascript-language-types-string-type
each element in the String is treated as a UTF-16 code unit value
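For readers who prefer code, the string branch of the comparison quoted above could be written out roughly like this. stringLessThan is a hypothetical helper used only for illustration; in practice the < operator already does this.

```javascript
// Hypothetical spelled-out version of the quoted string comparison.
function stringLessThan(px, py) {
  // Step 1: py is a prefix of px (this includes px === py).
  if (px.startsWith(py)) return false
  // Step 2: px is a proper prefix of py.
  if (py.startsWith(px)) return true
  // Step 3: find the first index where the UTF-16 code units differ.
  // Such an index exists because neither string is a prefix of the other.
  let k = 0
  while (px.charCodeAt(k) === py.charCodeAt(k)) k++
  // Steps 4 to 6: compare the code unit values at that index.
  const m = px.charCodeAt(k)
  const n = py.charCodeAt(k)
  return m < n
}

// Sample pairs to compare against the built-in operator.
const samplePairs = [['a', 'b'], ['ab', 'a'], ['a', 'ab'], ['a', 'a'], ['B', 'a'], ['', 'a']]
const agreesWithOperator = samplePairs.every(([p, q]) => stringLessThan(p, q) === (p < q))
```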
As someone mentioned earlier in this thread, lexicographic string comparisons
that use different code unit sizes can compute different results for the same
semantic string value. Between UTF-8 and UTF-32 you should see no difference,
but UTF-16 can differ from those given supplementary codepoints.
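A small sketch of that difference, using one BMP character above the surrogate range and one supplementary character. The particular characters and the compareCodePoints helper are chosen here purely for illustration.

```javascript
const bmp = '\uFF5E'        // U+FF5E, a single code unit 0xFF5E
const astral = '\u{1F600}'  // U+1F600, surrogate pair 0xD83D 0xDE00

// Default sort compares UTF-16 code units: 0xD83D < 0xFF5E,
// so the supplementary character comes first.
const utf16Order = [bmp, astral].sort()

// Hypothetical comparator that compares by code point instead,
// which matches the order UTF-8 or UTF-32 comparison would give.
const compareCodePoints = (a, b) => {
  const as = Array.from(a, (c) => c.codePointAt(0))
  const bs = Array.from(b, (c) => c.codePointAt(0))
  for (let i = 0; i < Math.min(as.length, bs.length); i++) {
    if (as[i] !== bs[i]) return as[i] - bs[i]
  }
  return as.length - bs.length
}

// By code point: 0xFF5E < 0x1F600, so the BMP character comes first.
const codePointOrder = [bmp, astral].sort(compareCodePoints)
```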
It might be worth making explicit that your lexical order is over UTF-16
strings if that's what you intend.
Right, it is actually already in 3.2.3:
Property strings to be sorted depend on that strings are represented
as arrays of 16-bit unsigned integers where each integer holds a single
UCS2/UTF-16 [UNICODE] code unit. The sorting is based on pure value
comparisons, independent of locale settings.
This maps "natively" to JS and Java. Probably to .NET as well.
Other systems may need a specific comparator.
Applied to your example input,
JSON.canonicalize({
"escaping": "\u20ac$\u000F\u000aA'\u0042\u0022\u005c\\\"\/",
"other": [null, true, false],
"numbers": [1E30, 4.50, 6, 2e-3, 0.000000000000000000000000001]
}) ===
String.raw`{"escaping":"€$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}`
// proposed
{"escaping":"\u20ac$\u000f\nA'B\"\\\\\"/","numbers":[1e+30,4.5,6,0.002,1e-27],"other":[null,true,false]}
The canonicalized example from section 3.2.3 seems to conflict with the
text of 3.2.2:
If you look under the result you will find a pretty sad explanation:
"Note: \u20ac denotes the Euro character, which not
being ASCII, is currently not displayable in RFCs"
Cool.
After 30 years with RFCs, we can still only use ASCII :-( :-(
Updates:
https://github.com/cyberphone/json-canonicalization/blob/master/JSON.canonicalize.md
https://cyberphone.github.io/doc/security/browser-json-canonicalization.html
If this can be implemented in a small amount of library code, what do you need
from TC39?
At this stage probably nothing. The BIG issue is the algorithm, which I took the
liberty of airing in this forum.
To date, all efforts to create a JSON canonicalization standard have been shot
down or abandoned.
Anders
Anders
"""
If the Unicode value is outside of the ASCII control character range, it MUST be serialized
"as is" unless it is equivalent to 0x005c (\) or 0x0022 (") which MUST be serialized
as \\ and \" respectively.
"""
So I think the "\u20ac" should actually be "€" and the implementation
above matches your proposal.
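A quick way to check this against the quoted rule is to look at what JSON.stringify emits for a few strings. This is only a sketch of the observable behavior; the variable names are made up for the example.

```javascript
// Non-ASCII characters outside the control range are emitted "as is".
const euro = JSON.stringify('\u20ac')

// Control characters are escaped: lowercase hex for U+000F, \n for U+000A.
const control = JSON.stringify('\u000f\n')

// Backslash and double quote get a backslash escape.
const special = JSON.stringify('\\"')

console.log(euro, control, special)
```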
On Fri, Mar 16, 2018 at 3:16 AM, Anders Rundgren <[email protected]> wrote:
Dear List,
Here is a proposal that I would be very happy getting feedback on
since it builds on ES but is not (at all) limited to ES.
The request is for a complement to the ES "JSON" object called
canonicalize() which would have identical parameters to the existing stringify() method.
The JSON canonicalization scheme (including ES code for emulating
it), is described in:
https://cyberphone.github.io/doc/security/draft-rundgren-json-canonicalization-scheme.html
Current workspace: https://github.com/cyberphone/json-canonicalization
Thanx,
Anders Rundgren
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss