----- Original Message -----
From: "Sam Hartman" <[EMAIL PROTECTED]>
To: "tom.petch" <[EMAIL PROTECTED]>
Cc: "David Harrington" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, February 05, 2007 10:44 PM
Subject: Re: [Syslog] An early last call comment on protocol-19


> What part of 4646 allows non-ASCII characters?  How is encoding an
> issue?

Sam

In section 3.1, "Format of the IANA Language Subtag Registry", it says
"Characters from outside the US-ASCII [ISO646] repertoire, as well as
   the AMPERSAND character ("&", %x26) when it occurs in a field-body,
   are represented by a "Numeric Character Reference" using hexadecimal
   notation in the style used by [XML10]"
which suggests to me that characters outside the US-ASCII repertoire may occur
in a language subtag.
This section does define the encoding used within the IANA Language Subtag
Registry, but I do not see that as necessarily defining the encoding to be used
elsewhere, and I see benefits in using UTF-8 in -protocol should an encoding be
needed.
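
To make the distinction concrete, a minimal Python sketch (the helper names
are hypothetical and come from neither RFC 4646 nor -protocol). It turns a
registry-style numeric character reference such as "&#x00CE;" back into a
character, and shows that the same character would travel as the two octets
C3 8E if -protocol used UTF-8.

```python
import re

# An XML-style hexadecimal numeric character reference, e.g. "&#x00CE;".
NCR = re.compile(r"&#x([0-9A-Fa-f]+);")


def decode_registry_field(field_body: str) -> str:
    """Hypothetical helper: replace each hex numeric character reference
    in a registry field body with the character it denotes."""
    return NCR.sub(lambda m: chr(int(m.group(1), 16)), field_body)


def encode_registry_field(text: str) -> str:
    """Hypothetical helper: escape non-ASCII characters and "&" as hex
    numeric character references, in the registry's style."""
    return "".join(
        "&#x%04X;" % ord(ch) if ord(ch) > 0x7F or ch == "&" else ch
        for ch in text
    )


registry_form = "&#x00CE;le"                  # illustrative field body
text = decode_registry_field(registry_form)   # the characters themselves
print(f"U+{ord(text[0]):04X}")                # U+00CE
print(text.encode("utf-8").hex(" "))          # c3 8e 6c 65 (UTF-8 octets)
print(encode_registry_field(text))            # &#x00CE;le
```

The point is that the registry's escaping and a wire encoding such as UTF-8
are two representations of the same characters.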

I am conscious that section 2.1 of RFC4646 says
"Note that although [RFC4234] refers to octets, the language tags
   described in this document are sequences of characters from the
   US-ASCII [ISO646] repertoire.  Language tags MAY be used in documents
   and applications that use other encodings, so long as these encompass
   the US-ASCII repertoire."
which supports my view that language tags are characters, not an encoding
thereof. I cannot reconcile the reference in section 2.1 to the US-ASCII
repertoire with section 3.1 and its reference to encoding characters from
outside the US-ASCII repertoire.

I note that section 4.4, "Canonicalization of Language Tags", refers to
"Case folding of ASCII letters in certain locales, unless carefully handled,
sometimes produces non-ASCII character values."
with the delightful example of
"the letter 'i' (U+0069) in Turkish and Azerbaijani is uppercased to U+0130"
so on balance, I think that characters outside the US-ASCII repertoire may
occur.
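
The hazard in section 4.4 goes away if case mapping for canonicalization
touches only the 26 ASCII letters and ignores locale entirely. A minimal Python
sketch under that assumption (the function names are illustrative, not
RFC 4646's):

```python
import string

# Translation tables covering only a-z and A-Z; no locale is consulted, so
# "i" (U+0069) always maps to "I" (U+0049), never to U+0130.
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_upper(s: str) -> str:
    """Uppercase ASCII letters only; every other character is left as is."""
    return s.translate(_TO_UPPER)


def ascii_lower(s: str) -> str:
    """Lowercase ASCII letters only; every other character is left as is."""
    return s.translate(_TO_LOWER)


print(ascii_upper("tr"))               # TR
print(ascii_lower("AZ-LATN"))          # az-latn
print(ascii_upper("i") == "\u0049")    # True
```

Translation tables are used rather than str.upper()/str.lower() because
Python's built-in methods, while locale-independent, apply full Unicode case
mappings to any non-ASCII character they meet.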

It may be that this case is judged too improbable to matter and that we limit
language subtags to ASCII, in which case encoding is not an issue.
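
If subtags were limited to ASCII, the check would be simple. A minimal sketch,
assuming the general shape of a tag is one or more subtags of 1 to 8 ASCII
letters or digits separated by hyphens; this covers only the character-level
restriction, not full RFC 4646 well-formedness or validity, and the function
name is hypothetical:

```python
import re

# One or more subtags of 1 to 8 ASCII letters/digits, separated by hyphens.
# The explicit ASCII ranges reject characters such as U+0130 outright.
ASCII_TAG = re.compile(r"[A-Za-z0-9]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_ascii_language_tag(tag: str) -> bool:
    """Return True if tag uses only ASCII subtag characters and hyphens."""
    return ASCII_TAG.fullmatch(tag) is not None


for candidate in ["en-US", "sr-Latn-CS", "tr-\u0130", "en--US"]:
    print(ascii(candidate), is_ascii_language_tag(candidate))
```

With tags restricted this way, the choice between UTF-8 and numeric character
references does not arise for the tag itself.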

I have checked draft-ietf-ltru-4646bis and the wording is unchanged there.

As I said to start with, I do find RFC4646 magnificently powerful, perhaps
too much so, in its entirety, for some use cases.

Tom Petch


_______________________________________________
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog
