Richard,
Let's not confuse the text representation and the memory representation. All that matters is that what would be presented to the hash function is the byte sequence 0x63 0x61 0x66 0xe9. It does not matter what the escaped text string looks like, as long as everyone can read it in and get the correct byte sequence.

Jim

From: Richard Barnes [mailto:[email protected]]
Sent: Tuesday, June 11, 2013 10:34 AM
To: Jim Schaad
Cc: Manger, James H; Mike Jones; [email protected]
Subject: Re: [jose] Possible change to protected field

On Tue, Jun 11, 2013 at 1:12 PM, Jim Schaad <[email protected]> wrote:

From: Richard Barnes [mailto:[email protected]]
Sent: Tuesday, June 11, 2013 7:11 AM
To: Manger, James H
Cc: Mike Jones; Jim Schaad; [email protected]
Subject: Re: [jose] Possible change to protected field

On Mon, Jun 10, 2013 at 9:45 PM, Manger, James H <[email protected]> wrote:

>>> I was thinking about proposing that we make a change to the content of the protected field in the JWS JSON serialization format. If we encoded this as a UTF8 string rather than the base64url-encoded UTF8 string, then the content would be smaller.

This would work. Mike's canonicalization worries are misplaced. Richard's Chrome/Firefox "bug" seems wrong.

Example: a JOSE originator creates the following header, with U+000A for line endings:

{"alg": "RSA-OAEP",
"enc": "A256GCM",
"kid": "7"
}

This header consists of 19+1+17+1+10+1+1+1 = 51 characters (including 3 spaces and 4 newlines). Put these 51 chars as the value of the "protected" field:

{"protected":"{\"alg\": \"RSA-OAEP\",\n\"enc\": \"A256GCM\",\n\"kid\": \"7\"\n}\n"}

They appear as 51+16 = 67 chars inside the quotes of the "protected" field: the 4 newlines are replaced with 4 'n' chars, plus 16 backslashes are added. This step could have used \u0022 instead of \", or \u000A or \u000a instead of \n, or escaped any of the other 35 chars.
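The escaping variants James lists are interchangeable: any conforming JSON parser collapses them to the same characters. A quick check, runnable in Node.js or a browser console (the example strings and variable names are mine, chosen to keep it short):

```javascript
// Two different legal JSON escapings of the same three characters:
// a quote, a newline, and 'é' (U+00E9).
const variantA = '"\\"\\n\\u00e9"';   // wire form: \" , \n , \u00e9
const variantB = '"\\u0022\\u000aé"'; // wire form: \u0022 , \u000a , raw é

// A conforming parser removes the escaping and yields identical strings,
// so the bytes eventually handed to the hash function are the same.
console.log(JSON.parse(variantA) === JSON.parse(variantB)); // true
```

This is why the wire escaping is irrelevant to Jim's point: the hash input is the decoded character sequence, not whichever escape style the serializer happened to pick.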
But that doesn't matter, since this escaping will be removed by the first JSON parse at the other end, which will give the original 51 chars to its caller (in whatever native string type the language uses). If the originator used CRLF instead (giving 55 chars), the transmitted data will be {"protected":"...\r\n...\r\n...\r\n...\r\n"}. The first JSON parse at the recipient will recreate the correct 55 chars. No problem.

There is plenty of potential to confuse a human doing this manually, as there are 2 layers of JSON string escaping. It is well-defined enough for a computer (or spec), though.

As to whether this is worthwhile... In this specific example the base64url overhead would be +17 chars, merely 1 more char than the overhead of omitting the base64url step.

A more valuable change would be to apply the crypto operations AFTER removing the base64url encoding. That is, apply the crypto to a UTF-8-encoded JSON header, instead of to the US-ASCII encoding of the base64url-encoded UTF-8-encoded JSON header. Implementations always need the UTF-8 form at some point anyway. It means the crypto is applied to fewer bytes, so it is faster. It means an idea like Jim's that doesn't need base64url to maintain the correct bytes in transmission doesn't need to artificially add a base64url-encoding step at the receiver just to do the crypto.

P.S.
>> the JSON parsers in both Chrome and Firefox (possibly others; haven't tried) choke on parsing strings that have control characters

JSON does not allow unescaped control chars when representing strings, so choking is the correct behaviour. JSON.parse(JSON.stringify(x)) works for me in Chrome. JSON.stringify does NOT leave unescaped control chars in quotes in the result.

I have heard multiple independent reports of my incorrectness in this experiment. So I retract my assertion of infeasibility with regard to control characters. However, there's still an issue with code points that don't have to be escaped.
Consider the following example:

var x = '{"a":"caf\u00e9"}';
var x2 = JSON.stringify(JSON.parse(x))

In a quick test of the environments I have on hand (several browsers + Node.JS + PHP + Perl), I see a wide variety of behaviors, most of which would break the signature. The browsers and Node all converted the "\u00e9" sequence to an unescaped "é" character, while Perl (creatively) rendered it directly as a Latin-1 0xe9 byte. Only PHP rendered it back to the escape sequence. Test code here: <http://www.ipv.sx/jose/json_test.html>

Now, even if we think that re-encoding is a problem, there's the question of whether we think the risk is acceptable to achieve better human readability. I'm pretty split, myself.

Do you think from your test that this would also be a problem if you did

Base64(JSON.stringify(JSON.parse(Unbase64(x))))

This would appear to be a general re-creation problem even if we use base64 encoding.

The base64 part in your line actually isn't relevant. It's actually the outer parse/stringify (on the overall JWE object) that we're worried about changing the string. The control case for the experiment would be:

var x = '{"a":"validBase64=="}';
var x2 = JSON.stringify(JSON.parse(x))

That works just fine with my standard test suite. <http://www.ipv.sx/jose/json_test.html>

--Richard

Jim

--Richard

--
James Manger
_______________________________________________
jose mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/jose
