On Sat, 17 Jan 2004, Markus Kuhn wrote:
> [EMAIL PROTECTED] wrote on 2004-01-11 16:53 UTC:
> > > That is not necessarily good advice in security issues.
> >
> > What harm can it be? It will not be characters that are relevant in any
> > syntactical analyses.
> >
> > Consider: parser 1 knows that a UTF-8 sequence can have
> > at most 6 bytes, and sees an illegal 5-byte sequence.
> >
> > Parser 2 knows that a UTF-8 sequence can have at most
> > 4 bytes, and sees an illegal 4-byte sequence followed by
> > an ASCII symbol.
> >
> > Difference in interpretation of a byte sequence always has
> > security implications.
>
> Your example does not work, because an ASCII byte must always
> resynchronize the decoder and be recognized as an ASCII character,
> completely independent of whether the decoder knows about the existence
> of 6-byte UTF-8 sequences or treats all bytes in the range 0xf8..0xff as
> illegal. Bytes in the range 0x00..0x7f cannot be part of a malformed
> UTF-8 sequence.
>
> I have yet to see a scenario where the difference between a 4-byte and a
> 6-byte UTF-8 decoder could lead to a plausible security risk, and I don't
> believe that one is easy to construct or likely to happen.
>
Hi, Markus,
Then I assume you would advocate that UTF-8 encoders/decoders (for example
for Linux) be written to handle sequences of up to 6 bytes, not just the
4 bytes that most of them probably handle now?
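
To make the resynchronization rule you describe concrete, here is a rough
sketch in Python. It is only an illustration under my own assumptions, not
how any existing Linux decoder is written: the names decode, lead_length and
REPLACEMENT are hypothetical, malformed input is simply mapped to U+FFFD, and
overlong forms and surrogates are deliberately not checked. The only point is
that a byte in 0x00..0x7f is never swallowed by a multi-byte sequence,
whether the decoder accepts up to 4 or up to 6 bytes.

```python
REPLACEMENT = 0xFFFD  # hypothetical stand-in for any malformed input


def lead_length(byte, max_len):
    """Sequence length announced by a lead byte, or 0 if it is not a valid lead."""
    if byte < 0x80:
        return 1
    if 0xC0 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF7:
        return 4
    if 0xF8 <= byte <= 0xFB and max_len >= 5:
        return 5
    if 0xFC <= byte <= 0xFD and max_len >= 6:
        return 6
    # Stray continuation byte, 0xFE/0xFF, or a lead byte this decoder rejects.
    return 0


def decode(data, max_len=4):
    """Decode bytes into a list of code points; malformed input becomes REPLACEMENT."""
    out = []
    i = 0
    while i < len(data):
        n = lead_length(data[i], max_len)
        if n == 1:                      # ASCII: always a character of its own
            out.append(data[i])
            i += 1
            continue
        if n == 0:                      # illegal byte: report it and move on
            out.append(REPLACEMENT)
            i += 1
            continue
        cp = data[i] & (0x7F >> n)      # payload bits of the lead byte
        j = i + 1
        # Consume continuation bytes only; stop at the first byte that is not one.
        while j < i + n and j < len(data) and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        out.append(cp if j == i + n else REPLACEMENT)
        i = j                           # resume at the byte that ended the sequence
    return out


if __name__ == "__main__":
    complete = bytes([0xF8, 0x88, 0x80, 0x80, 0x80, 0x2F])  # 5-byte form, then '/'
    truncated = bytes([0xF8, 0x88, 0x2F])                   # cut short, then '/'
    for sample in (complete, truncated):
        for limit in (4, 6):
            print(limit, [hex(cp) for cp in decode(sample, limit)])
```

The two decoders disagree about the bytes before the '/' (one code point
versus several errors), but in every case the final element is 0x2f, so the
ASCII character itself is seen identically by both.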
> Markus
>
> --
> Markus Kuhn, Computer Lab, Univ of Cambridge, GB
> http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
>
>
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/
>
>