>>> I was thinking about proposing that we make a change to the content of the 
>>> protected field in the JWS JSON serialization format.  If we encoded this 
>>> as a UTF8 string rather than the base64url encoded UTF8 string, then the 
>>> content would be smaller.


This would work. Mike's canonicalization worries are misplaced, and Richard's 
Chrome/Firefox "bug" looks like a misdiagnosis (see the P.S. below).

Example: JOSE originator creates the following header, with U+000A for line 
ending.

{"alg": "RSA-OAEP",
"enc": "A256GCM",
"kid": "7"
}

This header consists of 19+1+17+1+10+1+1+1=51 characters (including 3 spaces 
and 4 newlines). Put these 51 chars as the value of the "protected" field.
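The arithmetic is easy to check mechanically (a sketch in Node.js; the header 
literal is just the four lines above joined with U+000A):

```javascript
// The protected header exactly as above, with U+000A line endings
// (including the trailing newline after the closing brace).
const header = '{"alg": "RSA-OAEP",\n"enc": "A256GCM",\n"kid": "7"\n}\n';

// 19 + 1 + 17 + 1 + 10 + 1 + 1 + 1 = 51 characters in total.
console.log(header.length);                       // 51
console.log((header.match(/ /g) || []).length);   // 3 spaces
console.log((header.match(/\n/g) || []).length);  // 4 newlines
```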

{"protected":"{\"alg\": \"RSA-OAEP\",\n\"enc\": \"A256GCM\",\n\"kid\": 
\"7\"\n}\n"}

They appear as 51+16=67 chars inside the quotes of the "protected" field: each 
of the 4 newlines is replaced with an n char, and 16 backslashes are added (12 
before the quotes, 4 before the n chars). This step could have used \u0022 
instead of \", or \u000A or \u000a instead of \n, or escaped any of the other 
35 chars. But that doesn't matter since this escaping will be removed by the 
first JSON parse at the other end, which will give the original 51 chars to 
its caller (in whatever native string type the language uses).
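The round trip can be verified directly (a sketch in Node.js; JSON.stringify 
happens to pick the \" and \n escapes, which is one of the legal choices 
mentioned above):

```javascript
const header = '{"alg": "RSA-OAEP",\n"enc": "A256GCM",\n"kid": "7"\n}\n';

// First layer of escaping: the header becomes a JSON string value.
const wrapper = JSON.stringify({ protected: header });

// 51 original chars + 12 backslashes for quotes + 4 backslashes for
// newlines = 67 chars between the quotes of the "protected" field.
const inner = wrapper.slice('{"protected":"'.length, -2);
console.log(inner.length); // 67

// The first JSON parse at the recipient undoes the escaping exactly.
console.log(JSON.parse(wrapper).protected === header); // true
```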

If the originator used CRLF instead (giving 55 chars), the transmitted data 
will be {"protected":"...\r\n...\r\n...\r\n...\r\n"}. The first JSON parse at 
the recipient will recreate the correct 55 chars. No problem.
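The same check for the CRLF variant (sketch, Node.js):

```javascript
// Same header with CRLF line endings: 51 + 4 = 55 chars.
const header = '{"alg": "RSA-OAEP",\r\n"enc": "A256GCM",\r\n"kid": "7"\r\n}\r\n';
console.log(header.length); // 55

// Each CRLF is transmitted as the four chars \ r \ n, and the first
// JSON parse at the recipient recreates the correct 55 chars.
const wrapper = JSON.stringify({ protected: header });
console.log(JSON.parse(wrapper).protected === header); // true
```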

There is plenty of potential to confuse a human doing this manually as there 
are 2 layers of JSON string escaping. It is well-defined enough for a computer 
(or spec) though.


As to whether this is worthwhile...
In this specific example the base64url overhead is +17 chars, merely 1 char 
more than the +16 chars of escaping overhead incurred by omitting the 
base64url step.
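Both overheads can be computed for this example (a sketch in Node.js; the 
'base64url' encoding name is available on Buffer in recent Node versions):

```javascript
const header = '{"alg": "RSA-OAEP",\n"enc": "A256GCM",\n"kid": "7"\n}\n';

// base64url: ceil(51 / 3) * 4 = 68 chars, i.e. +17 overhead.
const b64 = Buffer.from(header, 'utf8').toString('base64url');
console.log(b64.length - header.length); // 17

// Escaping: 12 quotes + 4 newlines each gain one backslash, i.e. +16.
const escaped = JSON.stringify(header); // includes the outer quotes
console.log(escaped.length - 2 - header.length); // 16
```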


A more valuable change would be to apply the crypto operations AFTER removing 
the base64url encoding. That is, apply the crypto to the UTF-8-encoded JSON 
header, instead of to the US-ASCII encoding of the base64url-encoded 
UTF-8-encoded JSON header. Implementations always need the UTF-8 form at some 
point anyway. The crypto is applied to fewer bytes, so it is faster. And an 
idea like Jim's, where the transport doesn't need base64url to maintain the 
correct bytes in transmission, wouldn't have to artificially add a 
base64url-encoding step at the receiver just to do the crypto.


P.S.
>> the JSON parsers in both Chrome and Firefox (possible others; haven't tried) 
>> choke on parsing strings that have control characters

JSON does not allow unescaped control chars when representing strings, so 
choking is the correct behaviour. JSON.parse(JSON.stringify(x)) works for me 
in Chrome: JSON.stringify does NOT leave unescaped control chars in quotes in 
its result.
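Both behaviours are easy to reproduce (a sketch in Node.js; the JSON grammar 
requires the same JSON.parse behaviour of browsers):

```javascript
// An unescaped control char (here a literal U+000A) inside a JSON
// string is illegal, so parsing it must fail.
let choked = false;
try {
  JSON.parse('"line1\nline2"'); // JSON text contains a raw newline
} catch (e) {
  choked = true;
}
console.log(choked); // true

// JSON.stringify never leaves control chars unescaped, so its output
// always round-trips through JSON.parse.
const s = JSON.stringify('line1\nline2');
console.log(JSON.parse(s) === 'line1\nline2'); // true
```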

--
James Manger
_______________________________________________
jose mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/jose
