Re: [Syslog] Unicode - was: AD Review for draft-ietf-syslog-protocol-14

Tom Petch Thu, 27 Oct 2005 06:46:06 -0700

I am not quite clear about this.

In the I-D, it isn't really English (or American) that we are restricting
SD-NAME to but, as the I-D says,
 PRINTUSASCII except = SP ] "
There are lots of other English characters, which this keyboard won't
generate, that we do not want to see in there :-)  So far so good.
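
For illustration only, here is a minimal Python sketch of that character rule.
The function name is hypothetical, and it assumes PRINTUSASCII means the
printable US-ASCII range %d33-126; it checks characters only and does not
cover any other constraint the I-D may place on SD-NAME.

```python
# Hypothetical helper, not taken from the I-D: checks the SD-NAME character rule.
# Assumption: PRINTUSASCII is the printable US-ASCII range 33..126.
EXCLUDED = set('= ]"')  # the four characters excluded from SD-NAME


def is_valid_sd_name(name: str) -> bool:
    """Return True if name is non-empty and every character is PRINTUSASCII
    other than '=', SP, ']' and '"'."""
    if not name:
        return False
    return all(33 <= ord(ch) <= 126 and ch not in EXCLUDED for ch in name)
```

A check like this constrains characters only; it cannot tell whether a name
is an English word.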

But you seem to be saying more, that SD-NAME SHOULD be an English word, as
opposed to German or French or .. as well as being limited to the character set
above.

Tom Petch

----- Original Message -----
From: "Rainer Gerhards" <[EMAIL PROTECTED]>
To: "Anton Okmianski (aokmians)" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, October 17, 2005 12:19 PM
Subject: RE: [Syslog] Unicode - was: AD Review for draft-ietf-syslog-protocol-14

Anton:

thanks for your reply. I agree that structured data can contain (and probably
does in a real use case) data that is also present in the MSG part. Because of
this, there is a need to support Unicode there, too. As you outline,
STRUCTURED-DATA is mostly machine-processed. I do not fully agree that it won't
be interpreted by a human, so there may be some exposure to visual spoofing.
This is acceptable, as the security concerns are outweighed by the required
functionality.

However, there might still be one thing that we could consider doing:
STRUCTURED-DATA consists of SD-IDs, PARAM-NAMEs and PARAM-VALUEs. Your argument
definitely shows that PARAM-VALUEs must support Unicode. But is that true for the
other two entities, too? Will we lose required functionality for the
international community if we restrict either SD-ID, PARAM-NAME or both to
US-ASCII? If the answer is "no", then we can probably restrict some entities. I
know that local characters in these identifiers might be helpful. But is it
really something we MUST have?

Let me use an example. In Germany, "Müller" (containing u with Umlaut - ü) is a
common name. As such, a user name "Müller" is something we can have. Now, if I
encode this in (hypothetical) STRUCTURED-DATA, I may end up with something like

USER="Müller"  [PARAM-NAME = PARAM-VALUE]

The "USER=" part should be locale- and language-independent, at least from my
point of view. So it is probably not a good idea for a German implementation to
encode it as

BENUTZER="Müller" ["Benutzer" is German for "User"]
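
As a rough illustration of the split discussed here (US-ASCII names, Unicode
values), the following Python sketch formats one parameter. It is not taken
from syslog-protocol: the function name is hypothetical, the name check
assumes printable US-ASCII means %d33-126, and any escaping of special
characters inside the value is deliberately left out.

```python
# Hypothetical sketch: PARAM-NAME limited to printable US-ASCII,
# PARAM-VALUE free to carry any Unicode text.
EXCLUDED = set('= ]"')  # characters not allowed in a name


def format_param(name: str, value: str) -> str:
    """Return name="value", rejecting names outside the restricted set.

    Assumption: printable US-ASCII is the range 33..126. Escaping of
    special characters in the value is intentionally omitted here.
    """
    if not name or not all(
        33 <= ord(ch) <= 126 and ch not in EXCLUDED for ch in name
    ):
        raise ValueError(f"name is not restricted US-ASCII: {name!r}")
    return f'{name}="{value}"'


# The value keeps its umlaut; on the wire it would be sent as UTF-8 bytes.
wire_bytes = format_param("USER", "M\u00fcller").encode("utf-8")
```

Note that "BENUTZER" would pass the same name check, so a character-set
restriction by itself does not make tag names English.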

While the extension mechanism for vendor and experimental extensions does not
specify any details, it probably is a good idea to use English-language tags in
order to facilitate interpretation of the tags and their universal adoption. The
extension mechanism should not be used as a translation tool. (Maybe this is
also something we should point out in syslog-protocol.) But if we intend to
facilitate universal adoption of tags, we can probably require them to be
English names. And in such a case, US-ASCII would be sufficient.

Please do not misunderstand me. I am not suggesting that using local language is
a bad thing per se. I also do not think of the use of English in this context as
the use of the local language of e.g. the US and the British. I am thinking
about the use of English as the language of IT, the language that - at least
currently - lays the foundation for international collaboration (and as such
conceptually should not be tied to any one nation). The fact that I am from a
non-English-speaking country might prove my point a little.

My concern is that if we encourage implementors to create language-specific tag
names, interoperability might become a much bigger problem than when we stick
with a single, universal (in this context!) language. I currently do not see any
examples where local-language tag names would actually be required. Maybe I am
overlooking the obvious. Maybe English is not sufficiently accepted as
the "universal language of IT", so that local policies might require tag names
to be in the local language. But at the same time, would this not also mean that we
would need to support not a single, universal, timestamp but rather a wealth of
different local formats?

Any feedback is deeply appreciated.
Rainer

> -----Original Message-----
> From: Anton Okmianski (aokmians) [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 14, 2005 5:56 PM
> To: Rainer Gerhards; [EMAIL PROTECTED]
> Subject: RE: [Syslog] Unicode - was: AD Review for
> draft-ietf-syslog-protocol-14
>
> Rainer:
>
> > So it might be useful to think about where we have an issue
> > at all. The MSG field, I think, does not count as identifying
> > information. It is meant as a human-readable message. Even
> > though it obviously carries information, I think it is not
> > subject to an easy visual spoofing attack. Ok, one can think
> > about scenarios where visual spoofing might cause confusion,
> > but the severity level of this should be fairly low. I think
> > it has the same implications as hoax mails, where
> > misinformation in the textual part is a simple fact of life
> > and not avoidable without stopping to use that service. So I
> > conclude that supporting the full set of Unicode characters
> > inside MSG is fine.
>
> Agree.
>
> > The STRUCTURED-DATA is another story. Here, it includes
> > information that might primarily be used as identifying
> > information.
>
> Identifying, but mostly used by software that can filter
> messages, which is not susceptible to visual character confusion.
>
> > Reviewing the current defined SD-IDs, I hardly
> > see any need for using Unicode encoding. As far as I recall,
> > we have selected Unicode instead of US-ASCII because we
> > thought it might be beneficial for further extensions.
> > However, given the fact of visual confusability and the need
> > to deal with it, I am questioning if it actually is a good
> > idea to encode STRUCTURED-DATA in Unicode. Wouldn't it be
> > better to use US-ASCII, which relieves us of all of these
> > issues? So far, I do not see a compelling reason for full
> > Unicode support in SD-IDs.
>
> With all due respect, I strongly disagree. Structured data
> may include anything. It is just structured. It can contain
> the same pieces of information that may be found in the message.
>
> We have a very specific use-case where a structured element
> is a username. And that username can be in Japanese.  I can
> see many other use-cases like this.  Being from Germany, I am
> sure you can easily come up with some too. :) How about any
> company name in a foreign language?  Address?  English is not
> an exclusive language of system administration anymore.
>
> > Of course, we could just go ahead and document these issues
> > in security considerations. I think, however, that we should
> > try to solve them before resorting to that. I think we have a
> > good chance of finding a workable solution.
> >
> > My suggestion to the WG is that we drop Unicode encoding for
> > STRUCTURED-DATA and use printable US-ASCII instead.
> >
> > I would appreciate feedback on the following:
> >
> > #1 Is it OK to support Unicode - without restriction - in MSG?
>
> Well, the restriction is that we require use of the most
> compact encoding as you mentioned.
>
> > #2 Is there support in the WG for changing STRUCTURED-DATA encoding
> >    from Unicode to US-ASCII?
>
> Not from me. :) In fact, I think it is very critical to
> support non-ASCII in structured data.
>
> Thanks,
> Anton.
>
> >
> > If the answer to #2 is "no", please provide reasoning as that
> > will help speed up the process.
> >
> > --
> > Rainer
> >
> > _______________________________________________
> > Syslog mailing list
> > Syslog@lists.ietf.org
> > https://www1.ietf.org/mailman/listinfo/syslog
> >
>

_______________________________________________
Syslog mailing list
Syslog@lists.ietf.org
https://www1.ietf.org/mailman/listinfo/syslog
