On Fri, Mar 16, 2018 at 1:30 PM, Anders Rundgren <
[email protected]> wrote:

> On 2018-03-16 18:04, Mike Samuel wrote:
>
> It is entirely unsuitable to embedding in HTML or XML though.
>> IIUC, with an implementation based on this
>>
>>    JSON.canonicalize(JSON.stringify("</script>")) === `"</script>"` &&
>> JSON.canonicalize(JSON.stringify("]]>")) === `"]]>"`
>>
>
> I don't know what you are trying to prove here :-)


He wants to ship it as application/json and have it remain safe if the
browser happens to ignore the MIME type and interpret it as HTML or XML, I
believe.  Mandatory escaping of `<` would make the output "safe" for such
use.  I'm not convinced this is in scope, but it's an interesting case to
consider when determining which characters ought to be escaped.

(I think he's writing `JSON.canonicalize(JSON.stringify(...))` where he
means to write `JSON.canonicalize(...)`, at least if I understand the
proposed API correctly.)
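
As a rough illustration of the mandatory `<` escaping mentioned above, here
is a minimal sketch.  The name `escapeLessThan` is made up for this example
and is not part of any proposal; it post-processes ordinary JSON text and
relies on the fact that `<` can only occur inside string literals in valid
JSON.

```javascript
// Hypothetical helper, not part of any proposal: escape "<" in JSON text.
function escapeLessThan(jsonText) {
  // In valid JSON, "<" only appears inside string literals, so replacing it
  // with the \u003c escape leaves the parsed value unchanged.
  return jsonText.replace(/</g, "\\u003c");
}

const text = JSON.stringify({ html: "</script>" });
const safe = escapeLessThan(text);

console.log(text); // {"html":"</script>"}
console.log(safe); // {"html":"\u003c/script>"}
console.log(JSON.parse(safe).html === "</script>"); // true
```

Note that this only addresses the `</script>` case.  The `]]>` case from the
quoted message contains no `<`, so it would need `>` (or `]`) escaped in the
same way.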


> The output of JSON.canonicalize would also not be in the subset of JSON
>> that is also a subset of JavaScript's PrimaryExpression.
>>
>>     JSON.canonicalize(JSON.stringify("\u2028\u2029")) ===
>> `"\u2028\u2029"`
>>
>
I'm not sure about this, but I think he's saying you can't just `eval` the
canonical JSON output, because the line terminators U+2028 and U+2029
appear literally, not escaped. I believe I actually ran into some
compatibility issues with this back when I was playing around with
canonical JSON as well; certain JSON parsers wouldn't accept "JSON" with
embedded literal newlines.
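
To make the U+2028/U+2029 point concrete, here is a small sketch.  The
helper name `escapeLineTerminators` is hypothetical.  It shows that the
standard `JSON.stringify` emits both characters raw, and how they could be
escaped so that the text is also acceptable as a JavaScript string literal
under the grammar as it stood in ES2018.

```javascript
// Hypothetical helper: escape the two characters that JSON allows raw inside
// strings but that the ES2018 grammar treats as line terminators.
function escapeLineTerminators(jsonText) {
  return jsonText
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const raw = JSON.stringify("\u2028\u2029");
console.log(raw.length); // 4: two quotes plus the two raw characters

const escaped = escapeLineTerminators(raw);
console.log(escaped.length); // 14: two quotes plus two six-character escapes
console.log(JSON.parse(escaped) === "\u2028\u2029"); // true
```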

OTOH, I don't think anyone should be encouraged to eval JSON!  As noted
previously, there should be a strict parse function to go along with the
strict serialize function.
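
One way such a strict parse function might look is sketched below: parse,
re-serialize, and reject anything that does not round-trip byte for byte.
Both `canonicalize` and `strictParse` are assumptions for illustration.  In
particular, `canonicalize` is only a stand-in (sorted keys, no whitespace,
`JSON.stringify` escaping) and not the proposed `JSON.canonicalize`.

```javascript
// Stand-in canonical serializer (an assumption, NOT the proposed
// JSON.canonicalize): object keys sorted, no insignificant whitespace.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const members = Object.keys(value).sort().map(
      (key) => JSON.stringify(key) + ":" + canonicalize(value[key])
    );
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Hypothetical strict parser: accept only text that is already canonical.
function strictParse(text) {
  const value = JSON.parse(text);
  if (canonicalize(value) !== text) {
    throw new SyntaxError("input is not in canonical form");
  }
  return value;
}

console.log(strictParse('{"a":1,"b":[true,null]}').a); // 1

for (const bad of ['{"b":1,"a":2}', '{"a": 1}', '"\\u0041"']) {
  try {
    strictParse(bad);
  } catch (err) {
    console.log("rejected:", bad);
  }
}
```

With this stand-in, unsorted keys, extra whitespace, and unnecessary escapes
are all rejected, because none of them survive the round trip unchanged.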


> It also is not suitable for use internally within systems that internally
>> use cstrings.
>>
>>    JSON.canonicalize(JSON.stringify("\u0000")) === `"\u0000"`
>>
>
A literal NUL character is unrepresentable in a naive C implementation that
uses NUL-terminated strings.  You need to use Pascal-style
(length-prefixed) strings in your low-level implementation.
This is an important consideration for non-JavaScript use.  In my page I
noted, "Because only two byte values are escaped, be aware that
JSON-encoded data may contain embedded control characters and nulls."  A
similar warning is at least called for here.
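
A small sketch of the hazard, assuming the two escaped characters are the
double quote and the backslash.  The name `encodeMinimal` is made up for
this example.  The `slice` call imitates what a NUL-terminated consumer
would see.

```javascript
// Sketch of a string encoder that escapes only '"' and '\'.
// Hypothetical name; which two characters are escaped is an assumption.
function encodeMinimal(str) {
  return '"' + str.replace(/["\\]/g, "\\$&") + '"';
}

const encoded = encodeMinimal("a\u0000b");
console.log(encoded.length); // 5: quote, a, NUL, b, quote
console.log(encoded.includes("\u0000")); // true: the NUL is emitted raw

// What a NUL-terminated consumer would see: everything before the first NUL.
const asCString = encoded.slice(0, encoded.indexOf("\u0000"));
console.log(asCString.length); // 2: the opening quote and "a"

// The standard JSON.stringify escapes the control character instead.
console.log(JSON.stringify("a\u0000b").includes("\u0000")); // false
```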


> On Fri, Mar 16, 2018 at 12:23 PM, Mike Samuel <[email protected]>
> wrote:
> I also see
> """
> It is suggested that unicode strings be represented as the UTF-8 encoding
> of unicode Normalization Form C <http://www.unicode.org/reports/tr15/> (UAX
> #15). However, arbitrary content may be represented as a string: it is not
> guaranteed that string contents can be meaningfully parsed as UTF-8.
> """
> which seems to be mixing concerns about the wire format used to encode
> JSON as octets and NFC which would apply to the text of the JSON string.
>

Yes, it is rather unfortunate that we have only one datatype here and a bit
of an impedance mismatch.  JSON serialization is usually considered
literally as a byte stream, but JavaScript wants to decode those bytes
(usually as UTF-8) into a UTF-16 string.

My suggestion is just to make this very plain in a SHOULD comment to the
potential implementor.  If the underlying data is Unicode string data, it
SHOULD be represented as the UTF-8 encoding of Unicode Normalization Form C
(UAX #15).  However, the consumer should be aware that the data may be
arbitrary binary and not interpretable as a valid UTF-8 string.
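
To illustrate why the NFC recommendation matters for the bytes on the wire,
here is a short sketch using the standard `String.prototype.normalize` and
`TextEncoder`.  The example strings are my own choice.

```javascript
const composed = "\u00e9";     // "é" as a single code point
const decomposed = "e\u0301";  // "e" followed by a combining acute accent

const utf8 = new TextEncoder(); // TextEncoder always produces UTF-8

console.log(Array.from(utf8.encode(composed)));   // [ 195, 169 ]
console.log(Array.from(utf8.encode(decomposed))); // [ 101, 204, 129 ]

// After NFC normalization both spellings yield the same bytes.
const normalized = decomposed.normalize("NFC");
console.log(normalized === composed);               // true
console.log(Array.from(utf8.encode(normalized)));   // [ 195, 169 ]
```

Without normalization, two strings that render identically would serialize
to different octets and therefore hash differently.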

Re:

> Escape normalization: If you don't do this normalization, signatures would
> typically break and that's not really a "security" (=attacker) problem; it
> is rather a "nuisance" of the same caliber as a server not responding.


Consider signatures for malware detection.  If an attacker can trivially
modify their (in this example) JSON-encoded payload so that it is still
"canonical" and still passes whatever input verifier exists (so much easier
if there is not strict parsing!), then they can bypass your signature-based
detection system.  That's a security problem.

Both sides must be true: equal hashes should mean equal content (to high
probability) and unequal hashes should mean different content.  Otherwise
there is a security problem.
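
A minimal sketch of the "unequal hashes should mean different content"
half, using Node's built-in `crypto` module.  The helper `sha256Hex` is
made up for this example, and `JSON.stringify` merely stands in for a real
canonicalizer.

```javascript
const crypto = require("crypto");

// Hypothetical helper: hex SHA-256 digest of a text, hashed as UTF-8.
function sha256Hex(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// Two encodings of the same string value; only the escaping differs.
const plain = '"A"';
const escaped = '"\\u0041"';

console.log(JSON.parse(plain) === JSON.parse(escaped)); // true
console.log(sha256Hex(plain) === sha256Hex(escaped));   // false

// Re-serializing both through one deterministic encoder makes the
// hashes agree again.
const a = JSON.stringify(JSON.parse(plain));
const b = JSON.stringify(JSON.parse(escaped));
console.log(sha256Hex(a) === sha256Hex(b)); // true
```

If a verifier accepted both `plain` and `escaped` as "canonical", an
attacker could switch between them to change the hash without changing the
content.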
 --scott
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
