Re: JSON.canonicalize()

Mike Samuel Fri, 16 Mar 2018 11:31:03 -0700

On Fri, Mar 16, 2018 at 1:54 PM, C. Scott Ananian <[email protected]> wrote:


> And just to be clear: I'm all for standardizing a canonical JSON form.  In
> addition to my 11-year-old attempt, there have been countless others, and
> still no *standard*.  I just want us to learn from the previous attempts
> and try to make something at least as good as everything which has come
> before, especially in terms of the various non-obvious considerations which
> individual implementors have discovered the hard way over the years.
>

I think the hashing use case is an important one.  At the risk of
bikeshedding, "canonical" seems to overstate its usefulness.  Many people
assume that the canonical form of something is the one you should use in
preference to any equivalent form.

If the integer-only restriction is relaxed (see below), then
* The proposed canonical form seems useful as an input to strong hash
  functions.
* It seems usable as a complete message body, but not preferable due to
  potential loss of precision.
* It seems usable, but not preferable, as a long-term storage format.
* It seems to be a source of additional risk when used in conjunction with
  other common web languages.

If that is correct, would people be averse to marketing this as "hashable
JSON" instead of "canonical JSON"?
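
To make the hashing use case concrete, here is a minimal sketch, assuming Node.js and its built-in `crypto` module. `hashableJson` and `jsonDigest` are hypothetical names, not part of any proposal. The sketch maps JSON text to JSON text (object keys sorted by UTF-16 code units, no insignificant whitespace, numbers re-serialized from doubles) and then hashes the UTF-8 bytes of the result, so key order and whitespace do not affect the digest.

```javascript
const crypto = require('crypto');

// Hypothetical helper: JSON text in, JSON text out. Applying it twice gives
// the same result as applying it once.
function hashableJson(jsonText) {
  return serialize(JSON.parse(jsonText));
}

function serialize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(serialize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    // Default sort() compares keys by UTF-16 code units.
    const members = Object.keys(value)
      .sort()
      .map((key) => JSON.stringify(key) + ':' + serialize(value[key]));
    return '{' + members.join(',') + '}';
  }
  // Strings, numbers, booleans and null. Non-finite numbers (e.g. 1e400,
  // which JSON.parse turns into Infinity) are written as null.
  return JSON.stringify(value);
}

// SHA-256 over the UTF-8 bytes of the hashable form, as a hex string.
function jsonDigest(jsonText) {
  return crypto
    .createHash('sha256')
    .update(hashableJson(jsonText), 'utf8')
    .digest('hex');
}
```

Because numbers pass through doubles, `{"a":1.0}` and `{"a":1}` produce the same digest, and `{"a":1e400}` hashes the same as `{"a":null}`: the precision hazards discussed under Numbers show up directly in the hash.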

------

Numbers

There seem to be 3 main forks in the design space w.r.t. numbers.  I'm sure
cscott has thought of more, but I list these to make clear why I think
canonical JSON is not very useful as a wire/storage format.

1. Integers only
    PROS: avoids the floating-point equality issues that have bedeviled many
          systems
    CONS: can support only a small portion of the JSON value space
    CONS: small risk of precision loss with integers encoded from Decimal
          values; for example, Java BigDecimals won't round-trip.
2. Any numbers, with minimal changes: dropping + signs, normalizing zeros,
   using a fixed threshold for scientific notation
    PROS: supports the whole JSON value space
    CONS: less useful for hashing
    CONS: risks loss of precision when decoders decide, based on the presence
          of a decimal point, whether to represent a number as a double or an
          int
3. Preserve the textual representation
    PROS: avoids loss of precision
    PROS: can support the whole JSON value space
    CONS: not very useful for hashing
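
A rough sketch of how forks 1 and 2 might treat a single number token, assuming a JavaScript engine with BigInt. The function names, and the choice to reject anything but plain integer syntax in fork 1, are illustrative assumptions. Fork 2 here leans on ECMAScript's own Number-to-String conversion, which switches to exponent notation at a fixed threshold (at or above 1e21, or below 1e-6).

```javascript
// Fork 1 (integers only): accept plain integer syntax and keep the value
// exact with BigInt. Fractions and exponents are rejected in this sketch.
const JSON_INTEGER = /^-?(?:0|[1-9]\d*)$/;

function normalizeIntegerToken(token) {
  if (!JSON_INTEGER.test(token)) {
    throw new RangeError(`not a plain integer token: ${token}`);
  }
  return BigInt(token).toString(); // "-0" becomes "0"
}

// Fork 2 (any number, minimal changes): parse as an IEEE-754 double and
// re-serialize with JSON.stringify, which writes -0 as 0, drops trailing
// zeros, and uses exponent notation (always signed, e.g. 1e+21) only at or
// above 1e21 or below 1e-6.
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

function normalizeDoubleToken(token) {
  if (!JSON_NUMBER.test(token)) {
    throw new SyntaxError(`not a JSON number: ${token}`);
  }
  const value = Number(token);
  if (!Number.isFinite(value)) {
    // e.g. 1e400 overflows to Infinity, which JSON cannot represent
    throw new RangeError(`out of double range: ${token}`);
  }
  return JSON.stringify(value);
}
```

For example, `normalizeDoubleToken('9007199254740993')` yields `'9007199254740992'` because 2^53 + 1 does not survive the trip through a double, while `normalizeIntegerToken` keeps it exactly but rejects `'1.0'`.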

It seems that there is a tradeoff between usefulness for hashing and the
ability to support the whole JSON value space.

Recommending this as a wire/storage format further complicates that
tradeoff.

Regardless of which fork is chosen, there are some risks with the current
design.  For example, if 1e100000 is expanded to its full decimal form, it
takes up a lot of space in memory.  This might allow timing attacks.
Imagine an attacker can get Alice to embed 1e100000 or another number in
her JSON.  Alice sends that message to Bob over an encrypted channel.  Bob
converts the JSON to canonical JSON.  If Bob refuses JSON payloads over a
threshold size, or if the time to process 1e100000 is noticeably different
from the time to process 1e1, then the attacker can tell, via traffic
analysis alone, when Alice communicates with Bob.
We should avoid that in-memory blowup if possible.
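
One way to avoid the blowup, sketched under the assumption that some canonicalizer would otherwise expand exponents into plain digits: bound the expanded size from the token text before doing any work. `MAX_EXPANDED_DIGITS`, `expandedDigitBound` and `canExpandSafely` are hypothetical names, and the limit is arbitrary.

```javascript
// Hypothetical policy limit: the most digits we are willing to materialize
// when writing a number out without an exponent.
const MAX_EXPANDED_DIGITS = 1000;

const NUMBER_PARTS = /^-?(0|[1-9]\d*)(?:\.(\d+))?(?:[eE]([+-]?\d+))?$/;

// Upper bound on the digits needed to write `token` in plain decimal form,
// computed from the token text alone, so nothing is expanded in memory.
function expandedDigitBound(token) {
  const m = NUMBER_PARTS.exec(token);
  if (m === null) {
    throw new SyntaxError(`not a JSON number: ${token}`);
  }
  const [, intPart, fracPart = '', expText = '0'] = m;
  // Ignore the sign and leading zeros; treat exponents of ten or more
  // significant digits as unbounded instead of parsing them.
  const expDigits = expText.replace(/^[+-]?0*/, '');
  if (expDigits.length >= 10) {
    return Infinity;
  }
  return intPart.length + fracPart.length + Math.abs(Number(expText));
}

function canExpandSafely(token) {
  return expandedDigitBound(token) <= MAX_EXPANDED_DIGITS;
}
```

A refusal is still visible to a traffic observer, so this bounds memory and time rather than closing the side channel; a canonical form that keeps exponent notation above a threshold avoids the expansion altogether.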

>   --scott
>
> On Fri, Mar 16, 2018 at 1:46 PM, Mike Samuel <[email protected]> wrote:
>
>>
>>
>> On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <[email protected]> wrote:
>>
>>> On 2018-03-16 18:04, Mike Samuel wrote:
>>>
>>>> It is entirely unsuitable for embedding in HTML or XML, though.
>>>> IIUC, with an implementation based on this
>>>>
>>>>    JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>>>>    JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>>>>
>>>
>>> I don't know what you are trying to prove here :-)
>>>
>>
>> Only that canonical JSON is useful in a very narrow context.
>> It cannot be embedded in an HTML script tag.
>> It cannot be embedded in an XML or HTML foreign content context without
>> extra care.
>> If it contains a string literal that embeds a NUL, it cannot be embedded
>> in XML at all, even if extra care is taken.
>>
>>
>>
>>>
>>>> The output of JSON.canonicalize would also not be in the subset of JSON
>>>> that is also a subset of JavaScript's PrimaryExpression.
>>>>
>>>>     JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
>>>>
>>>> It also is not suitable for use within systems that internally use
>>>> C strings, which are NUL-terminated.
>>>>
>>>>    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>>>>
>>>>
>>> JSON.canonicalize() would be [almost] identical to JSON.stringify()
>>>
>>
>> You're correct.  Many JSON producers have a web-safe version, but the
>> JavaScript built-in does not.
>> My point is that JSON.canonicalize undoes those web-safety tweaks.
>>
>>
>>
>>> JSON.canonicalize(JSON.parse('"\u2028\u2029"')) === '"\u2028\u2029"'
>>> // Returns true
>>>
>>> "Emulator":
>>>
>>> var canonicalize = function(object) {
>>>
>>>     var buffer = '';
>>>     serialize(object);
>>>
>>
>> I thought canonicalize took in a string of JSON and produced the same.
>> Am I wrong?
>> "Canonicalize" to my mind means a function that returns the canonical
>> member of an equivalence class given any member of that same equivalence
>> class, so it is always 'a -> 'a.
>>
>>
>>>     return buffer;
>>>
>>>     function serialize(object) {
>>>         if (object !== null && typeof object === 'object') {
>>>
>>
>> JSON.stringify(new Date(0)) === "\"1970-01-01T00:00:00.000Z\""
>> because Date.prototype.toJSON exists, whereas this emulator would
>> serialize a Date as {} since Object.keys(new Date(0)) is empty.
>>
>> If you operate as a JSON_string -> JSON_string function then you
>> can avoid this complexity.
>>
>>>             if (Array.isArray(object)) {
>>>                 buffer += '[';
>>>                 let next = false;
>>>                 object.forEach((element) => {
>>>                     if (next) {
>>>                         buffer += ',';
>>>                     }
>>>                     next = true;
>>>                     serialize(element);
>>>                 });
>>>                 buffer += ']';
>>>             } else {
>>>                 buffer += '{';
>>>                 let next = false;
>>>                 Object.keys(object).sort().forEach((property) => {
>>>                     if (next) {
>>>                         buffer += ',';
>>>                     }
>>>                     next = true;
>>
>>>                     buffer += JSON.stringify(property);
>>>
>>
>> I think you need a symbol check here:
>> JSON.stringify(Symbol.for('foo')) === undefined
>>
>>
>>>                     buffer += ':';
>>>                     serialize(object[property]);
>>>                 });
>>>                 buffer += '}';
>>>             }
>>>         } else {
>>>             buffer += JSON.stringify(object);
>>>
>>
>> This fails to distinguish non-integral numbers from integral ones, and
>> produces non-standard output when object === undefined (the text
>> "undefined", since JSON.stringify(undefined) returns undefined).
>> Again, not a problem if the input is required to be valid JSON.
>>
>>
>>>         }
>>>     }
>>> };
>>>
>>
>>
>> _______________________________________________
>> es-discuss mailing list
>> [email protected]
>> https://mail.mozilla.org/listinfo/es-discuss
>>
>>
>
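
The thread observes that many JSON producers offer a web-safe variant while the JavaScript built-in does not. Below is a minimal sketch of such a post-processing step, assuming it runs on valid JSON text; `toWebSafeJson` is a hypothetical name. Its output is no longer the canonical byte sequence, so it should be applied after hashing, not before.

```javascript
// Hypothetical helper: escape characters that JSON allows raw inside string
// literals but that are hazardous when the text is embedded elsewhere:
//   < and >        can form </script> or the CDATA terminator ]]>
//   &              can begin an HTML/XML character or entity reference
//   U+2028 U+2029  were not allowed raw in JavaScript strings before ES2019
// Valid JSON text never contains these characters outside string literals,
// so the escaped text still parses to the same value.
const WEB_UNSAFE = /[<>&\u2028\u2029]/g;

function toWebSafeJson(jsonText) {
  return jsonText.replace(
    WEB_UNSAFE,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'),
  );
}
```

This relies on the input being valid JSON, where control characters such as NUL must already appear as escapes inside strings rather than raw.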
