It's not canonicalization, it's just UTF-8 serialization. As to your
questions about CR / LF / tab, let us recall RFC 4627:
   char = unescaped /
          escape (
              %x22 /          ; "    quotation mark  U+0022
              %x5C /          ; \    reverse solidus U+005C
              %x2F /          ; /    solidus         U+002F
              %x62 /          ; b    backspace       U+0008
              %x66 /          ; f    form feed       U+000C
              %x6E /          ; n    line feed       U+000A
              %x72 /          ; r    carriage return U+000D
              %x74 /          ; t    tab             U+0009
              %x75 4HEXDIG )  ; uXXXX                U+XXXX
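A short Python sketch shows the escapes from this grammar in practice. It assumes Python's standard `json` module as a representative serializer (the thread names no library), and the sample value is hypothetical. CR, LF, and tab come out as the two-character escapes `\r`, `\n`, and `\t`, so the serialized string contains no raw control characters.

```python
import json

# Hypothetical header value containing CR, LF, and tab characters.
value = "line1\r\nline2\tend"

# Python's json module writes these as the short escapes from the
# RFC 4627 grammar (\r, \n, \t) instead of as raw control characters.
serialized = json.dumps(value)
print(serialized)  # "line1\r\nline2\tend"

# No raw CR, LF, or tab survives in the serialized text...
assert not any(c in serialized for c in "\r\n\t")
# ...and parsing restores the original characters exactly.
assert json.loads(serialized) == value
```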
Empirically, it appears that this serialization is stable with regard to
re-encoding, at least for browser JSON engines:
<http://www.ipv.sx/jose/json_test.html>
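The same kind of stability check can be sketched in Python instead of a browser. This assumes Python's `json` module and a hypothetical header string. Parsing and re-serializing with the same serializer settings reproduces the text byte for byte, while a different setting (here `ensure_ascii`) can legitimately spell the same value differently. So the stability observed is a property of a given serializer and its settings, not something JSON itself guarantees.

```python
import json

# Hypothetical protected-header text, with a tab that must be escaped.
original = '{"alg":"RS256","note":"tab\there"}'

first = json.dumps(original)
# Parse, then re-serialize with the same serializer and the same settings.
second = json.dumps(json.loads(first))
assert first == second  # byte-for-byte stable round trip

# A different setting spells the same value differently.
accented = "caf\u00e9"
ascii_form = json.dumps(accented)                     # '"caf\\u00e9"'
utf8_form = json.dumps(accented, ensure_ascii=False)  # '"café"'
assert ascii_form != utf8_form
assert json.loads(ascii_form) == json.loads(utf8_form) == accented
```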
The only thing that would lead me to prefer this serialization despite the
Occam's Razor argument is that it's more readable for someone who receives
a JOSE object. Instead of staring at base64-encoded junk, you can actually
sort of see what's in there. But I don't feel very strongly about this.
--Richard
On Mon, Jun 10, 2013 at 1:08 PM, Mike Jones <[email protected]> wrote:
> This would take us down the road to canonicalization. How would you
> represent a CRLF in the value? How would you represent a bare LF? How
> would you represent a tab? Canonical representations for all of these, and
> more, would have to be specified.
>
> Using a special encoding for this field that is different than the
> encoding used for all other fields (base64url encoding) is
> developer-hostile. It forces them to have (and test) special code to
> produce a different encoding for this field alone. This would undoubtedly
> result in interop problems, because the corner cases almost never get
> tested (and often never get coded correctly either).
>
> We already have a simple means of ensuring the correct transmission of
> fields. Occam’s Razor would suggest just using it.
>
> -- Mike
>
> *From:* Richard Barnes [mailto:[email protected]]
> *Sent:* Monday, June 10, 2013 10:01 AM
> *To:* Mike Jones
> *Cc:* Jim Schaad; [email protected]
>
> *Subject:* Re: [jose] Possible change to protected field
>
> I think you've misunderstood the proposal. It's not that the field will
> be unencoded, it's that it will be encoded as a UTF-8 string instead of a
> base64 string.
>
> OLD: (json) --> UTF-8 --> base64
>
> NEW: (json) --> UTF-8
>
> OLD: { protected: "eyJhbGciOiJSUzI1NiJ9" }
>
> NEW: { protected: "{\"alg\":\"RS256\"}" }
>
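The OLD and NEW pipelines can be sketched in Python for the same header. This assumes the standard `json` and `base64` modules, compact JSON with no whitespace, and the unpadded base64url form that JOSE uses. Any extra byte in the input, such as a trailing newline, would change the encoded string.

```python
import base64
import json

header = {"alg": "RS256"}
# Compact JSON with no whitespace: {"alg":"RS256"}
header_json = json.dumps(header, separators=(",", ":"))
header_bytes = header_json.encode("utf-8")

# OLD: (json) --> UTF-8 --> base64url (JOSE omits the "=" padding)
old_value = base64.urlsafe_b64encode(header_bytes).rstrip(b"=").decode("ascii")

# NEW: (json) --> UTF-8, carried directly as a JSON string value
new_value = header_json

print(json.dumps({"protected": old_value}))  # {"protected": "eyJhbGciOiJSUzI1NiJ9"}
print(json.dumps({"protected": new_value}))  # {"protected": "{\"alg\":\"RS256\"}"}
```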
> Now, there might still be a problem with that, because JSON strings aren't
> canonical. In the example above, the quote characters could also have been
> presented as "\u0022". So if some JSON re-processor were to change the
> encoding of the string, it would break the signature. However, that's also
> a problem for the base64-encoded version ("a" vs. "\u0061").
>
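The non-canonical spellings described above can be checked with a short Python sketch (the standard `json` module is an assumption; the sample strings are the example header and its base64url form). The serialized texts differ byte for byte, yet each pair parses to the same string value. This suggests that a signature checked against the parsed value, rather than against the raw serialized text, would be unaffected by such re-spelling.

```python
import json

# Two JSON spellings of the same protected-header string:
# plain quote escapes vs. \u0022 escapes.
plain = '"{\\"alg\\":\\"RS256\\"}"'
unicode_escaped = '"{\\u0022alg\\u0022:\\u0022RS256\\u0022}"'

# The serialized texts differ, so anything signed over the raw text would break...
assert plain != unicode_escaped
# ...but both parse to exactly the same string value.
assert json.loads(plain) == json.loads(unicode_escaped) == '{"alg":"RS256"}'

# The same holds for a base64url value: "e" may also be written as \u0065.
b64_plain = '"eyJhbGciOiJSUzI1NiJ9"'
b64_escaped = '"\\u0065yJhbGciOiJSUzI1NiJ9"'
assert b64_plain != b64_escaped
assert json.loads(b64_plain) == json.loads(b64_escaped) == "eyJhbGciOiJSUzI1NiJ9"
```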
> On Mon, Jun 10, 2013 at 12:38 PM, Mike Jones <[email protected]>
> wrote:
>
> The problem with this suggested change is that it requires unencoded JSON
> objects to be transmitted with no transformations whatsoever, and the
> problems with that assumption are well known. For starters, on UNIX-based
> systems, newlines are typically represented by a bare LF character, whereas
> on DOS-based systems, newlines are typically represented by a CRLF pair.
> In transmission, many systems, including mail agents, tend to convert
> newlines to the system’s normal format. This would break these JOSE
> objects.
>
> Some agents wrap lines after a certain length. Particularly when being
> transmitted in HTML or XML, many systems replace two or more consecutive
> spaces with a single space. Others replace two or more spaces with a
> single space and N-1 non-breaking space characters. I could go on…
>
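The effect of newline conversion can be sketched in Python. HMAC-SHA256 from the standard `hmac` and `hashlib` modules stands in for whatever signature algorithm is in use, and the key and pretty-printed JSON text are hypothetical. Converting CRLF to LF leaves the JSON semantically unchanged but alters the bytes, so the MAC no longer matches. The unpadded base64url encoding of the same bytes contains only letters, digits, `-`, and `_`, so it has no line endings or runs of spaces for a mail agent to rewrite (line wrapping is a separate hazard this check does not cover).

```python
import base64
import hashlib
import hmac
import json
import string

key = b"hypothetical-shared-key"  # hypothetical key; HMAC stands in for any signature
text = '{\r\n  "alg": "RS256"\r\n}'  # hypothetical pretty-printed JSON with CRLF line endings

def mac(data: str) -> bytes:
    # HMAC-SHA256 over the UTF-8 bytes of the text
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).digest()

# A mail agent that normalizes CRLF to LF changes the bytes but not the JSON meaning.
mangled = text.replace("\r\n", "\n")
assert json.loads(mangled) == json.loads(text)
assert not hmac.compare_digest(mac(text), mac(mangled))

# The unpadded base64url form of the same bytes uses only A-Z, a-z, 0-9, "-", "_".
encoded = base64.urlsafe_b64encode(text.encode("utf-8")).rstrip(b"=").decode("ascii")
allowed = set(string.ascii_letters + string.digits + "-_")
assert set(encoded) <= allowed
```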
> I appreciate the attempt to make things appear to be more uniform and
> readable, but the CRLF/LF problems alone are enough to make this a
> non-starter.
>
> Best wishes,
>
> -- Mike
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf
> Of *Richard Barnes
> *Sent:* Monday, June 10, 2013 9:29 AM
> *To:* Jim Schaad
> *Cc:* [email protected]
> *Subject:* Re: [jose] Possible change to protected field
>
> This sounds like a fine idea to me. It saves space and makes the JSON
> format more human-readable. It actually makes kind of a nice analogy to
> ASN.1, namely use of OCTET STRING to encapsulate more DER content.
>
> The compact serialization can continue to base64url-encode that field, so
> it would not be a breaking change for that serialization.
>
> --Richard
>
> On Mon, Jun 10, 2013 at 1:46 AM, Jim Schaad <[email protected]>
> wrote:
>
> <no hat>
>
> I am trying to figure out if I am missing something. This is not yet a
> formal proposal to actually change the document.
>
> I was thinking about proposing that we make a change to the content of the
> protected field in the JWS JSON serialization format. If we encoded this
> as a UTF8 string rather than the base64url encoded UTF8 string, then the
> content would be smaller. The computation of the signature would be
> unchanged in that it would still be computed over the base64url encoded
> string. I believe that the conversion from the UTF8 string to the
> base64url encoded UTF8 string is a deterministic encoding and thus would
> not generate any problems from that point.
>
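The claim that the signature input can be rebuilt deterministically from the UTF-8 string can be sketched in Python. The sketch assumes a signing input of the form JWS uses, `BASE64URL(UTF8(protected)) || '.' || BASE64URL(payload)`, uses HMAC-SHA256 from the standard library as a stand-in signature, and uses a hypothetical key, header, and payload; `b64url` and `signing_input` are hypothetical helper names. The sender signs over the base64url form but transmits the protected header as a plain JSON string, and the verifier re-encodes the parsed string to reach the same signing input.

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # Unpadded base64url, as used throughout JOSE.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def signing_input(protected: str, payload_b64: str) -> bytes:
    # ASCII(BASE64URL(UTF8(protected)) || "." || BASE64URL(payload))
    return (b64url(protected.encode("utf-8")) + "." + payload_b64).encode("ascii")

key = b"hypothetical-shared-key"
protected = '{"alg":"HS256"}'
payload_b64 = b64url(b'{"iss":"joe"}')

# Sender: sign over the base64url form, but carry `protected` as a JSON string.
sig = hmac.new(key, signing_input(protected, payload_b64), hashlib.sha256).digest()
message = json.dumps({"protected": protected, "payload": payload_b64,
                      "signature": b64url(sig)})

# Verifier: parse, then re-derive the base64url form from the parsed string.
received = json.loads(message)
expected = hmac.new(key, signing_input(received["protected"], received["payload"]),
                    hashlib.sha256).digest()
assert b64url(expected) == received["signature"]
```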
> At this point I am trying to figure out if I missed anything that would
> preclude this from working. I am not worried about how hard or easy it
> would be to do, just if it is even possible.
>
> Jim
>
> _______________________________________________
> jose mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/jose