Richard,

 

Let’s not confuse the text representation and the memory representation.
All that matters is that what would be presented to the hash function is the
byte sequence

 

0x63 0x61 0x66 0xc3 0xa9 (the UTF-8 encoding of "café")

 

It does not matter what the escaped text string looks like as long as
everyone can read it in and get the correct byte sequence.

 

Jim

 

 

From: Richard Barnes [mailto:[email protected]] 
Sent: Tuesday, June 11, 2013 10:34 AM
To: Jim Schaad
Cc: Manger, James H; Mike Jones; [email protected]
Subject: Re: [jose] Possible change to protected field

 

On Tue, Jun 11, 2013 at 1:12 PM, Jim Schaad <[email protected]> wrote:

 

 

From: Richard Barnes [mailto:[email protected]] 
Sent: Tuesday, June 11, 2013 7:11 AM
To: Manger, James H
Cc: Mike Jones; Jim Schaad; [email protected]


Subject: Re: [jose] Possible change to protected field

 

On Mon, Jun 10, 2013 at 9:45 PM, Manger, James H
<[email protected]> wrote:

>>> I was thinking about proposing that we make a change to the content of
the protected field in the JWS JSON serialization format.  If we encoded
this as a UTF8 string rather than the base64url encoded UTF8 string, then
the content would be smaller.

This would work. Mike's canonicalization worries are misplaced. Richard's
Chrome/Firefox "bug" seems wrong.

Example: JOSE originator creates the following header, with U+000A for line
ending.


{"alg": "RSA-OAEP",
"enc": "A256GCM",
"kid": "7"
}

This header consists of 19+1+17+1+10+1+1+1=51 characters (including 3 spaces
and 4 newlines). Put these 51 chars as the value of the "protected" field.

{"protected":"{\"alg\": \"RSA-OAEP\",\n\"enc\": \"A256GCM\",\n\"kid\":
\"7\"\n}\n"}

They appear as 51+16=67 chars inside the quotes of the "protected" field:
each of the 4 newlines is replaced with a \n pair, and each of the 12 quote
chars with a \" pair, adding 16 backslashes in total. This step could have
used \u0022 instead of \", or \u000A or \u000a instead of \n, or escaped any
of the other 35 chars. But that doesn't matter, since this escaping will be
removed by the first JSON parse at the other end, which will give the
original 51 chars to its caller (in whatever native string type the language
uses).

If the originator used CRLF instead (giving 55 chars), the transmitted data
will be {"protected":"...\r\n...\r\n...\r\n...\r\n"}. The first JSON parse
at the recipient will recreate the correct 55 chars. No problem.

There is plenty of potential to confuse a human doing this manually as there
are 2 layers of JSON string escaping. It is well-defined enough for a
computer (or spec) though.


As to whether this is worthwhile...
In this specific example the base64url overhead would be +17 chars, merely 1
more char than the overhead of omitting the base64url step.


A more valuable change would be to apply the crypto operations AFTER
removing the base64url encoding. That is, apply the crypto to the
UTF-8-encoded JSON header, instead of to the US-ASCII encoding of the
base64url-encoded UTF-8-encoded JSON header. Implementations always need the
UTF-8 form at some point anyway, and the crypto is applied to fewer bytes,
so it is faster. It also means an approach like Jim's, which doesn't need
base64url to preserve the correct bytes in transmission, doesn't have to
artificially add a base64url-encoding step at the receiver just to do the
crypto.


P.S.

>> the JSON parsers in both Chrome and Firefox (possibly others; haven't
tried) choke on parsing strings that have control characters

JSON does not allow unescaped control chars when representing strings so
choking is the correct behaviour. JSON.parse(JSON.stringify(x)) works for me
in Chrome. JSON.stringify does NOT leave unescaped control chars in quotes
in the result.

 

I have heard multiple independent reports of my incorrectness in this
experiment.  So I retract my assertion of infeasibility, with regard to
control characters.

 

However, there's still an issue with code points that don't have to be
escaped. Consider the following example:

 

var x = '{"a":"caf\\u00e9"}';

var x2 = JSON.stringify(JSON.parse(x))

 

In a quick test of the environments I have on hand (several browsers +
Node.JS + PHP + Perl), I see a wide variety of behaviors, most of which
would break the signature.  The browsers and Node all converted the "\u00e9"
sequence to an unescaped "é" character, while Perl (creatively) rendered it
directly as a Latin-1 0xe9 byte.  Only PHP rendered it back to the escape
sequence.  Test code here:

<http://www.ipv.sx/jose/json_test.html>

 

Now, even if we think that re-encoding is a problem, there's the question of
whether we think the risk is acceptable to achieve better human readability.
I'm pretty split, myself.

 

Do you think, from your test, that this would also be a problem if you did
 

Base64(JSON.stringify(JSON.parse(Unbase64(x))))

 

This would appear to be a general re-creation problem even if we use base64
encoding.

 

The base64 part in your line isn't relevant.  It's the outer
parse/stringify (on the overall JWE object) that we're worried about
changing the string.  

 

The control case for the experiment would be:

 

var x = '{"a":"validBase64=="}';

var x2 = JSON.stringify(JSON.parse(x))

 

That works just fine with my standard test suite.  

<http://www.ipv.sx/jose/json_test.html>

 

--Richard

 

 

 

Jim

 

 

--Richard

 

 


--
James Manger

 

 

_______________________________________________
jose mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/jose
