Hi Ted,

In reviewing the JOSE drafts in preparation for them being approved, I was 
looking at 
https://datatracker.ietf.org/doc/draft-ietf-jose-json-web-key/ballot/#ted-lemon 
and saw that you'd filed a NO OBJECTION ballot (with COMMENT) that for some 
reason wasn't delivered to my e-mail.  Since I hadn't seen it until today, I 
hadn't previously responded.  My apologies!  Your comment was:
Comment (2014-10-02 for -33)
I'm not sure whether I need to complain about this, but the following seems
underspecified:

   UTF8(STRING) denotes the octets of the UTF-8 [RFC3629] representation
   of STRING.

   ASCII(STRING) denotes the octets of the ASCII [USASCII]
   representation of STRING.

The issue is that we don't know what STRING is.   Is it 32-bit unicode?   Is it
ASCII?   What does it mean to have ASCII(unicode string)?   Is ASCII(STRING) an
assertion that STRING is representable as ASCII?

These are fair questions.  The STRING in this notation is always a sequence of 
characters with an unspecified representation.  The notations UTF8(STRING) and 
ASCII(STRING) are used to represent the character string as an octet sequence 
with a particular character encoding.

You're right that ASCII(Unicode string) isn't meaningful in the general case; 
it's only used when STRING is constrained to contain only ASCII characters.  
I suppose you could think of ASCII(STRING) as an assertion that STRING is 
representable in ASCII, but it means more than that; it specifies the 
particular octet sequence that represents those characters.
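
To make the "assertion plus octet sequence" reading concrete, here is a 
minimal Python sketch.  The function names ascii_octets and utf8_octets are 
hypothetical stand-ins for the ASCII() and UTF8() notations; they are not 
defined in the drafts, and treating a Python str as STRING is an assumption.

```python
def ascii_octets(string: str) -> bytes:
    """Hypothetical stand-in for ASCII(STRING).

    Returns the octets of the ASCII representation. Python raises
    UnicodeEncodeError if any character is outside the ASCII range,
    which plays the role of the implied "representable in ASCII" check.
    """
    return string.encode("ascii")


def utf8_octets(string: str) -> bytes:
    """Hypothetical stand-in for UTF8(STRING): octets of the UTF-8 form."""
    return string.encode("utf-8")


if __name__ == "__main__":
    print(list(ascii_octets("Abc")))
    print(list(utf8_octets("Abc")))
    try:
        ascii_octets("caf\u00e9")  # contains a non-ASCII character
    except UnicodeEncodeError as exc:
        print("not representable in ASCII:", exc.reason)
```

In this sketch the two functions agree on any all-ASCII input and differ only 
in whether a non-ASCII character is rejected or encoded.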

For instance, while both ASCII("Abc") and UTF8("Abc") result in the octet 
sequence [65, 98, 99], if we were to have a related UTF16BigEndian() function 
(which we don't), UTF16BigEndian("Abc") would represent the octet sequence [0, 
65, 0, 98, 0, 99] and EBCDIC("Abc") would represent the octet sequence [193, 
130, 131].  But now I'm off into esoterica... ;-)
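
These octet sequences can be reproduced with Python's built-in codecs.  This 
is only an illustrative sketch: the choice of the "cp037" codec as the EBCDIC 
variant is an assumption (EBCDIC has many code pages), and none of the 
variable names come from the drafts.

```python
text = "Abc"

# The same character string under four different character encodings.
ascii_seq = list(text.encode("ascii"))
utf8_seq = list(text.encode("utf-8"))
utf16be_seq = list(text.encode("utf-16-be"))  # big-endian, no byte order mark
ebcdic_seq = list(text.encode("cp037"))       # assumed EBCDIC code page

print(ascii_seq)    # [65, 98, 99]
print(utf8_seq)     # [65, 98, 99]
print(utf16be_seq)  # [0, 65, 0, 98, 0, 99]
print(ebcdic_seq)   # [193, 130, 131]
```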

Back to the topic at hand, the notation UTF8(STRING) was adopted to replace the 
much more verbose notation "the octets of the UTF-8 representation of STRING", 
which used to appear repeatedly throughout the drafts.  In particular, the 
notation BASE64URL(UTF8(STRING)) replaces the previously very common notation 
"the Base64url encoding of the octets of the UTF-8 representation of STRING".  
This was an improvement suggested by Jim Schaad in one of his review comments.
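
The composed notation can be sketched in Python as follows.  The names 
base64url and base64url_utf8 are hypothetical, and the sketch assumes the 
URL-safe Base64 alphabet with trailing "=" padding removed, as the JOSE 
drafts describe for BASE64URL.

```python
import base64


def base64url(octets: bytes) -> str:
    """Hypothetical stand-in for BASE64URL(OCTETS).

    Uses the URL-safe alphabet and strips trailing '=' padding.
    """
    return base64.urlsafe_b64encode(octets).rstrip(b"=").decode("ascii")


def base64url_utf8(string: str) -> str:
    """Hypothetical stand-in for BASE64URL(UTF8(STRING))."""
    return base64url(string.encode("utf-8"))


if __name__ == "__main__":
    print(base64url_utf8("Abc"))  # QWJj
```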

If you think that the current notation is unclear, we should sort out how to 
clarify it.  The best I've come up with is to add the phrase ", where STRING is 
a sequence of zero or more Unicode characters" to these definitions.  (The 
language "sequence of zero or more Unicode characters" comes from the 
introduction to RFC 7159.)  Do you think that would address your questions, or 
do you have an alternate suggestion?

Sorry again that you didn't receive a reply to this until now!

                                                            Best wishes,
                                                            -- Mike

_______________________________________________
jose mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/jose
