> I have a concern about making C-compatibility a requirement of
> -protocol. I understand the concern about the amount of work
> implementors may need to do, and its potential impact on adoption.
> However, I think this is a red herring.

I know that we do not discuss programming languages here in the IETF. I
hope I will be granted a quick exception, because otherwise it is hard
to show the importance of this point.

One primary thing we need to keep in mind is that most syslog code today
is written in C, and I guess this will remain the case for quite some
time. So it may be worth taking a closer look at it...

> All an implementor
> has to do is
> put in one piece of code that looks at the incoming message and looks
> for 0x00 octets, if they care, and handle however they choose in their
> implementation.

Actually, this is exactly the issue. It is *not* easy to do. It requires
an architectural change to the application and even means that the
normal C run time library can NOT be used for the message text.
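Just to illustrate what "the run time library can NOT be used" means in
practice, here is a tiny made-up example (the message text and names are
invented, of course):

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* a 15-octet message with an embedded 0x00 after "user" */
        char msg[] = "login: user\0 ok";
        size_t wire_len = sizeof(msg) - 1;

        /* every str*() routine stops at the first 0x00, so the
         * receiver silently loses everything after it           */
        printf("octets on the wire: %zu, strlen() sees: %zu\n",
               wire_len, strlen(msg));
        /* prints: octets on the wire: 15, strlen() sees: 11 */
        return 0;
    }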

Why is that? We are talking about Unicode. As such, all octet values are
defined; there is no spare octet value that 0x00 could be mapped to. It
can not be mapped to a multi-byte sequence either, because those, too,
are all taken up by the "normal" encodings. So in order to support this
in C, you must do one of the following:

A) extend the character size, e.g. use 24 or 40 bits per character
instead of 16 or 32. This gives you room for an extra "flag bit" to
escape the 0x00 value.
B) use (or write) a non-standard string library that handles strings
based on a byte counter (as Java does, and hopefully C# as well)

I think A) is a totally impractical approach. B) works, but requires a
complete re-design of (most) existing applications. It also forces
developers to use a non-standard (but more secure!) approach. In the C
community, there is a lot of objection to byte-counted strings. This
alone can cause some acceptance problems.
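To make B) a bit more concrete, this is roughly what such a byte-counted
type could look like (the names are made up, not taken from any existing
library, and a real one would of course need a full set of operations):

    #include <stdlib.h>
    #include <string.h>

    /* a "string" that carries an explicit octet count, so embedded
     * 0x00 octets are perfectly legal payload                       */
    struct counted_str {
        size_t         len;  /* number of octets, 0x00 included         */
        unsigned char *buf;  /* NOT NUL-terminated, never fed to str*() */
    };

    static struct counted_str *cstr_new(const unsigned char *data, size_t len)
    {
        struct counted_str *s = malloc(sizeof(*s));
        if (s == NULL)
            return NULL;
        s->buf = malloc(len ? len : 1);
        if (s->buf == NULL) {
            free(s);
            return NULL;
        }
        memcpy(s->buf, data, len);  /* copies embedded 0x00 just fine */
        s->len = len;
        return s;
    }

And every single place that today calls strlen(), strcpy() or
printf("%s") on the message would need to be converted to this kind of
interface - that is exactly the re-design I am talking about.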

Then, some other systems/tools that are written in C may also misbehave
if they have to deal with 0x00 characters. I think a number of *nix
system tools qualify as victims. I have no idea about PERL, but I have a
weak feeling that it may have problems handling 0x00 inside strings,
too. If that were the case, it would be bad as well, because a lot of
administrators use PERL to analyze their logs (just think of SWATCH).

Granted, this is a programming issue - and parts of it are not even
related to the on-the-wire protocol.

I wouldn't care if 0x00 had *any* legitimate use. But it has NONE. I can
hardly envision any legitimate use for 0x00 inside a message. For UTF-8,
it is not needed (it sits in the US-ASCII range). In US-ASCII, it is
traditionally a) the C string terminator and b) a fill character (NUL),
thrown into a string to give a slow tty time to catch up (e.g. after
receiving a CR character) - remember those devices connected at 110
baud ;). So why should it be in a syslog message? It was traditionally
never seen in syslog, nor in any other message text.

Besides that, 0x00 has a proven track record of causing security issues
(of course, all boiling down to the "smart" C string handling, which is
another issue in itself...).
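Just to show the kind of trouble I mean (again a made-up example, not
taken from any real incident): anything that inspects the message via
the str*() routines simply never sees what comes after the 0x00.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* 20 octets arrive, but the "interesting" part hides
         * behind the embedded 0x00                            */
        char msg[] = "status=ok\0; rm -rf /";
        size_t wire_len = sizeof(msg) - 1;

        if (strstr(msg, "rm -rf") == NULL)
            printf("looks harmless (%zu of %zu octets inspected)\n",
                   strlen(msg), wire_len);
        return 0;
    }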

So this is my point: we know that 0x00 potentially causes security
trouble, causes big implementation issues, and costs us acceptance - and
we can't find a legitimate use for it. On the other hand (from my point
of view), there is the argument that allowing 0x00 is cleaner and less
crippled.

If I weigh both arguments, I come to the conclusion that it may be
better to disallow 0x00: it may not be as clean, but it has some obvious
advantages...

Rainer

