On Mon, Jun 10, 2013 at 9:45 PM, Manger, James H <
[email protected]> wrote:
> >>> I was thinking about proposing that we make a change to the content of
> the protected field in the JWS JSON serialization format. If we encoded
> this as a UTF8 string rather than the base64url encoded UTF8 string, then
> the content would be smaller.
>
>
> This would work. Mike's canonicalization worries are misplaced. Richard's
> Chrome/Firefox "bug" seems wrong.
>
> Example: JOSE originator creates the following header, with U+000A for
> line ending.
>
> {"alg": "RSA-OAEP",
> "enc": "A256GCM",
> "kid": "7"
> }
>
> This header consists of 19+1+17+1+10+1+1+1=51 characters (including 3
> spaces and 4 newlines). Put these 51 chars as the value of the "protected"
> field.
>
> {"protected":"{\"alg\": \"RSA-OAEP\",\n\"enc\": \"A256GCM\",\n\"kid\":
> \"7\"\n}\n"}
>
> They appear as 51+16=67 chars inside the quotes of the "protected" field:
> the 4 newlines are replaced with 4 n chars, and 16 backslashes are added.
> This step could have used \u0022 instead of \", or \u000A or \u000a
> instead of \n, or escaped any of the other 35 chars. But that doesn't
> matter, since this escaping is removed by the first JSON parse at the
> other end, which gives the original 51 chars to its caller (in whatever
> native string type the language uses).
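The two-layer round trip James describes can be checked directly in JavaScript (an illustrative sketch; the header and character counts are taken from his example above):

```javascript
// Layer 1: the originator's header, with U+000A line endings.
const header = '{"alg": "RSA-OAEP",\n"enc": "A256GCM",\n"kid": "7"\n}\n';
// header.length === 51 (including 3 spaces and 4 newlines)

// Layer 2: embed the raw header chars as a JSON string value.
// JSON.stringify escapes the 12 quotes and 4 newlines for us.
const envelope = JSON.stringify({ protected: header });

// The recipient's first JSON parse removes the escaping and hands
// back the original 51 chars, whatever escapes the originator chose.
const recovered = JSON.parse(envelope).protected;
console.log(recovered === header); // true: the escaping round-trips exactly
```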
>
> If the originator used CRLF instead (giving 55 chars), the transmitted
> data will be {"protected":"...\r\n...\r\n...\r\n...\r\n"}. The first JSON
> parse at the recipient will recreate the correct 55 chars. No problem.
>
> There is plenty of potential to confuse a human doing this manually as
> there are 2 layers of JSON string escaping. It is well-defined enough for a
> computer (or spec) though.
>
>
> As to whether this is worthwhile...
> In this specific example the base64url overhead would be +17 chars, merely
> 1 more char than the overhead of omitting the base64url step.
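For what it's worth, both overheads can be computed directly (a Node sketch; `Buffer`'s `'base64url'` encoding requires Node 15.7+):

```javascript
// Back-of-envelope comparison for the 51-char header above;
// this is illustrative arithmetic, not part of any spec.
const header = '{"alg": "RSA-OAEP",\n"enc": "A256GCM",\n"kid": "7"\n}\n';

// base64url (no padding) of the UTF-8 header: 51 bytes -> 68 chars.
const b64url = Buffer.from(header, 'utf8').toString('base64url');
const b64Overhead = b64url.length - header.length;        // +17 chars

// JSON string escaping of the same header adds 16 backslashes.
// (Subtract 2 for the surrounding quotes JSON.stringify adds.)
const escaped = JSON.stringify(header);
const escOverhead = (escaped.length - 2) - header.length; // +16 chars

console.log(b64Overhead, escOverhead); // 17 16
```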
>
>
> A more valuable change would be to apply the crypto operations AFTER
> removing the base64url encoding. That is, apply the crypto to a
> UTF-8-encoded JSON header, instead of to the US-ASCII encoding of the
> base64url-encoded UTF-8-encoded JSON header. Implementations always need
> the UTF-8 form at some point anyway. It means the crypto is applied to
> fewer bytes, so it is faster. It also means an approach like Jim's, which
> doesn't need base64url to preserve the correct bytes in transmission,
> doesn't have to artificially add a base64url-encoding step at the
> receiver just to do the crypto.
>
>
> P.S.
> >> the JSON parsers in both Chrome and Firefox (possibly others; haven't
> tried) choke on parsing strings that have control characters
>
> JSON does not allow unescaped control chars when representing strings so
> choking is the correct behaviour. JSON.parse(JSON.stringify(x)) works for
> me in Chrome. JSON.stringify does NOT leave unescaped control chars in
> quotes in the result.
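A quick way to reproduce this observation in Node (behaviour should match Chrome, since both follow the ECMAScript JSON built-ins):

```javascript
// JSON forbids unescaped control characters inside strings, so a
// conforming parser must reject them.
let rejected = false;
try {
  JSON.parse('{"a":"x\u0001y"}'); // raw U+0001 between the quotes
} catch (e) {
  rejected = true; // "choking" is the correct behaviour
}

// Round-tripping through JSON.stringify is safe: it escapes the
// control char, so JSON.parse(JSON.stringify(x)) succeeds.
const roundTripped = JSON.parse(JSON.stringify({ a: 'x\u0001y' }));
console.log(rejected, roundTripped.a === 'x\u0001y'); // true true
```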
>
I have heard multiple independent reports that I was wrong about this
experiment, so I retract my assertion of infeasibility with regard to
control characters.
However, there's still an issue with code points that don't have to be
escaped. Consider the following example:
var x = '{"a":"caf\\u00e9"}';
var x2 = JSON.stringify(JSON.parse(x));
In a quick test of the environments I have on hand (several browsers +
Node.JS + PHP + Perl), I see a wide variety of behaviors, most of which
would break the signature. The browsers and Node all converted the
"\u00e9" sequence to an unescaped "é" character, while Perl (creatively)
rendered it directly as a Latin-1 0xe9 byte. Only PHP rendered it back to
the escape sequence. Test code here:
<http://www.ipv.sx/jose/json_test.html>
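For anyone without those environments handy, the Node case can be reproduced as follows (same strings as above):

```javascript
// U+00E9 is a non-control code point, so JSON does not require it to be
// escaped -- serializers are free to disagree about the "\u00e9" form.
const x = '{"a":"caf\\u00e9"}';           // the signed bytes, with the escape
const x2 = JSON.stringify(JSON.parse(x)); // parse, then re-serialize

// Node (like the browsers) emits the unescaped character, so the
// re-encoded string no longer matches what was signed.
console.log(x2);       // {"a":"café"}
console.log(x2 === x); // false: a signature over x would now fail
```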
Now, even if we think that re-encoding is a problem, there's the question
of whether we think the risk is acceptable to achieve better human
readability. I'm pretty split, myself.
--Richard
>
> --
> James Manger
>
_______________________________________________
jose mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/jose