On Fri, Mar 16, 2018 at 12:44 PM, C. Scott Ananian <[email protected]>
wrote:

> On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <[email protected]>
> wrote:
>>
>>
>> On Fri, Mar 16, 2018 at 11:38 AM, C. Scott Ananian <[email protected]>
>> wrote:
>>
>>> Canonical JSON is often used to imply a security property: two JSON
>>> blobs with identical contents are expected to have identical canonical JSON
>>> forms (and thus identical hashed values).
>>>
>>
>> What does "identical contents" mean in the context of numbers?  JSON
>> intentionally avoids specifying any precision for numbers.
>>
>> JSON.stringify(1/3) === '0.3333333333333333'
>>
>> What would happen with JSON from systems that allow higher precision?
>> I.e., what would (JSON.canonicalize(JSON.stringify(1/3) + '3')) produce?
>>
>>> However, unicode normalization allows multiple representations of "the
>>> same" string, which defeats this security property.  Depending on your
>>> implementation language
>>>
>>
>> We shouldn't normalize unicode in strings that contain packed binary
>> data.  JSON strings are strings of UTF-16 code-units, not Unicode scalar
>> values and any system that assumes the latter will break often.
>>
>
> Both of these points are made on the URL I originally cited:
> http://wiki.laptop.org/go/Canonical_JSON
>

Thanks, I see
"""
Floating point numbers are not allowed in canonical JSON. Neither are
leading zeros or "minus 0" for integers.
"""
which answers my question.
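
A rough sketch of the check that rule seems to imply, for already-parsed
JavaScript numbers. isCanonicalNumber is a hypothetical name, not something
from the wiki page, and limiting integers to the safe range is an extra
assumption made here so that the value round-trips without an exponent:

```javascript
// Hypothetical helper: true if `n` satisfies the quoted rule
// (integers only, no "minus 0"). Leading zeros are a property of the
// serialized text, not of the parsed value, so they are not checked here.
function isCanonicalNumber(n) {
  return typeof n === "number" &&
    Number.isSafeInteger(n) && // rejects 1/3, NaN, Infinity (assumption: safe range only)
    !Object.is(n, -0);         // rejects "minus 0"
}

console.log(isCanonicalNumber(42));    // true
console.log(isCanonicalNumber(1 / 3)); // false
console.log(isCanonicalNumber(-0));    // false
```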

I also see
"""
A previous version of this specification required strings to be valid
unicode, and relied on JSON's \u escape. This was abandoned as it doesn't
allow representing arbitrary binary data in a string, and it doesn't
preserve the identity of non-canonical unicode strings.
"""
which addresses my question.

I also see
"""
It is suggested that unicode strings be represented as the UTF-8 encoding
of unicode Normalization Form C <http://www.unicode.org/reports/tr15/> (UAX
#15). However, arbitrary content may be represented as a string: it is not
guaranteed that string contents can be meaningfully parsed as UTF-8.
"""
which seems to conflate two separate concerns: the wire format used to
encode JSON as octets (UTF-8), and NFC, which would apply to the text of
the JSON string.
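
A small sketch of that distinction, using only the standard
String.prototype.normalize and TextEncoder. NFC changes the text of the
string; UTF-8 only decides which octets represent whatever text it is given:

```javascript
const decomposed = "e\u0301";                 // "e" + COMBINING ACUTE ACCENT
const composed = decomposed.normalize("NFC"); // single code point U+00E9

// Text-level concern: normalization changes the code units of the string.
console.log(decomposed === composed); // false
console.log(composed === "\u00e9");   // true

// Wire-level concern: UTF-8 encodes whichever string it receives.
const utf8 = (s) => Array.from(new TextEncoder().encode(s));
console.log(utf8(decomposed)); // [101, 204, 129]
console.log(utf8(composed));   // [195, 169]
```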


If that confusion is cleaned up, then it seems a fine subset of JSON to
ship over the wire with a JSON content-type.


It is entirely unsuitable for embedding in HTML or XML though.
IIUC, with an implementation based on this:

  JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
  JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`

The output of JSON.canonicalize would also not be in the subset of JSON
that is also a subset of JavaScript's PrimaryExpression.

   JSON.canonicalize(JSON.stringify("\u2028\u2029")) === `"\u2028\u2029"`
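
The usual workaround for both embedding problems is to re-escape after
serialization, which by construction leaves the canonical form. A minimal
sketch: escapeForEmbedding is a hypothetical name, and the set of escaped
characters is an assumption that covers the examples above, not an
exhaustive list for every embedding context:

```javascript
// Hypothetical helper: rewrite characters that are awkward in HTML, XML, or
// JavaScript source as \uXXXX escapes. These characters can only occur
// inside string literals in valid JSON, so the parsed value is unchanged.
function escapeForEmbedding(json) {
  return json.replace(/[<>\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0"));
}

const embedded = escapeForEmbedding(JSON.stringify("</script>]]>\u2028"));
console.log(embedded.includes("<"));                         // false
console.log(JSON.parse(embedded) === "</script>]]>\u2028"); // true
```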

It is also unsuitable for use within systems that internally use
NUL-terminated cstrings.

  JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
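
To illustrate the cstring concern, this sketch mimics what a NUL-terminated
consumer would see. asCString is a hypothetical stand-in for such a consumer,
and the canonical output is written by hand on the assumption that U+0000 is
emitted unescaped:

```javascript
// Hypothetical: the portion of `s` that a NUL-terminated reader would see.
function asCString(s) {
  const nul = s.indexOf("\u0000");
  return nul === -1 ? s : s.slice(0, nul);
}

const canonical = '"a\u0000b"';             // assumed canonical output: raw NUL inside the quotes
const escaped = JSON.stringify("a\u0000b"); // JSON.stringify writes the escape sequence instead

console.log(asCString(canonical) === '"a');  // true: truncated, no longer valid JSON
console.log(asCString(escaped) === escaped); // true: nothing is lost
```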
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
