----- Original Message -----
From: "Sam Hartman" <[EMAIL PROTECTED]>
To: "tom.petch" <[EMAIL PROTECTED]>
Cc: "David Harrington" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, February 05, 2007 10:44 PM
Subject: Re: [Syslog] An early last call comment on protocol-19

> What part of 4646 allows non-ASCII characters? How is encoding an
> issue?

Sam

Section 3.1, "Format of the IANA Language Subtag Registry", says

  "Characters from outside the US-ASCII [ISO646] repertoire, as well as the AMPERSAND character ("&", %x26) when it occurs in a field-body, are represented by a "Numeric Character Reference" using hexadecimal notation in the style used by [XML10]"

which suggests to me that characters outside the US-ASCII repertoire may occur in a language subtag.

This section does define the encoding within the IANA Language Subtag Registry, but I do not see that as necessarily defining the encoding to be used elsewhere, and I see benefits in using UTF-8 in -protocol should an encoding be needed.

I am conscious that section 2.1 of RFC4646 says

  "Note that although [RFC4234] refers to octets, the language tags described in this document are sequences of characters from the US-ASCII [ISO646] repertoire. Language tags MAY be used in documents and applications that use other encodings, so long as these encompass the US-ASCII repertoire."

which supports my view that language tags are characters, not an encoding thereof.

I cannot reconcile the reference in 2.1 to the US-ASCII repertoire with 3.1 and its reference to encoding characters that fall outside the US-ASCII repertoire.

I note that section 4.4, "Canonicalization of Language Tags", refers to

  "Case folding of ASCII letters in certain locales, unless carefully handled, sometimes produces non-ASCII character values."

with the delightful example of "the letter 'i' (U+0069) in Turkish and Azerbaijani is uppercased to U+0130", so, on balance, I think that characters outside the US-ASCII repertoire may occur. It may be that this is considered too low a probability to worry about and that we limit the language subtags to ASCII, in which case encoding is not an issue.

I have checked draft-ietf-ltru-4646bis and the wording is unchanged there.

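To make the distinction between the registry's notation and the characters themselves concrete, here is a minimal sketch in Python. It assumes only the XML-style hexadecimal form (&#xHHHH;) that section 3.1 describes. The function names and the sample strings are mine, chosen for illustration, and are not taken from RFC4646 or from -protocol.

```python
import re

# Hypothetical helpers for illustration only.
# Matches the XML-style hexadecimal numeric character reference, e.g. &#xE5;
_NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_registry_field(body: str) -> str:
    """Replace each hexadecimal numeric character reference with its character."""
    return _NCR.sub(lambda m: chr(int(m.group(1), 16)), body)


def is_ascii_tag(tag: str) -> bool:
    """True when every character of the tag is in the US-ASCII repertoire."""
    return all(ord(ch) < 128 for ch in tag)


def ascii_lower(tag: str) -> str:
    """Lowercase only A-Z; every other character is left untouched,
    so no locale-specific mapping (such as the Turkish dotted I) applies."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in tag)


if __name__ == "__main__":
    # Illustrative field-body; the decoded text contains a non-ASCII character.
    decoded = decode_registry_field("Norwegian Bokm&#xE5;l")
    print(decoded, is_ascii_tag(decoded))
    print(ascii_lower("EN-US"), is_ascii_tag("en-US"))
```

If we do limit language subtags to ASCII, a check such as is_ascii_tag would be all a receiver needs, and the choice of encoding would not matter for this field.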
As I said to start with, I do find RFC4646 magnificently powerful, perhaps too much so, in its entirety, for some use cases.

Tom Petch

_______________________________________________
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog