Re: UTF-8(4) versus UTF-8(6) security issues

Edward H Trager Sat, 17 Jan 2004 12:30:15 -0800

On Sat, 17 Jan 2004, Markus Kuhn wrote:

> [EMAIL PROTECTED] wrote on 2004-01-11 16:53 UTC:
> >     > That is not necessarily good advice in security issues.
> >
> >     What harm can it be? It will not be characters that are relevant in any
> >     syntactical analyses.
> >
> > Consider: parser 1 knows that a UTF-8 sequence can have
> > at most 6 bytes, and sees an illegal 5-byte sequence.
> >
> > Parser 2 knows that a UTF-8 sequence can have at most
> > 4 bytes, and sees an illegal 4-byte sequence followed by
> > an ASCII symbol.
> >
> > Difference in interpretation of a byte sequence always has
> > security implications.
>
> Your example does not work, because an ASCII byte must always
> resynchronize the decoder and be recognized as an ASCII character,
> completely independent of whether the decoder knows about the existence
> of 6-byte UTF-8 sequences or treats all bytes in the range 0xf8..0xff as
> illegal. Bytes in the range 0x00..0x7f cannot be part of a malformed
> UTF-8 sequence.
>
> I have yet to see a scenario where the difference between a 4-byte and
> a 6-byte UTF-8 decoder could lead to a plausible security risk, and I
> don't believe that one is easy to construct or likely to happen.
>

Hi, Markus,

Then I assume you would advocate that UTF-8 encoders/decoders (for example
for Linux) be written to handle sequences of up to 6 bytes, not just the
4-byte maximum that is probably the case now?
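
To make the resynchronization point quoted above concrete, here is a minimal
sketch in Python. It is not taken from any real Linux decoder: the names
decode, lead_length and MALFORMED are hypothetical, and checks for overlong
forms, surrogates and the U+10FFFF limit are deliberately left out. It only
shows that a decoder limited to 4-byte sequences and one that accepts up to
6 bytes can disagree about the malformed part of the input, yet both hand
back a following ASCII byte as ASCII.

```python
MALFORMED = -1  # marker emitted for each malformed piece of input


def lead_length(byte, max_len):
    """Return the sequence length a lead byte announces, or 0 if rejected."""
    if byte < 0x80:
        return 1
    if 0xC0 <= byte <= 0xDF:
        return 2
    if 0xE0 <= byte <= 0xEF:
        return 3
    if 0xF0 <= byte <= 0xF7:
        return 4
    if max_len >= 6 and 0xF8 <= byte <= 0xFB:
        return 5
    if max_len >= 6 and 0xFC <= byte <= 0xFD:
        return 6
    # Continuation byte, 0xFE/0xFF, or a lead byte this decoder does not accept.
    return 0


def decode(data, max_len=4):
    """Decode bytes into a list of code points, with MALFORMED for bad input.

    Simplified on purpose: no overlong, surrogate or range checks.
    """
    tokens = []
    i = 0
    while i < len(data):
        n = lead_length(data[i], max_len)
        if n == 0:
            tokens.append(MALFORMED)
            i += 1
            continue
        # Payload bits of the lead byte (the whole byte for ASCII).
        cp = data[i] if n == 1 else data[i] & (0xFF >> (n + 1))
        j = i + 1
        # Consume continuation bytes (10xxxxxx) only; anything else ends the
        # sequence and is re-examined as the start of the next one.
        while j < i + n and j < len(data) and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        tokens.append(cp if j == i + n else MALFORMED)
        i = j
    return tokens


# A truncated 5-byte sequence followed by ASCII 'A' (0x41).
sample = bytes([0xF8, 0x88, 0x80, 0x80, 0x41])
print(decode(sample, max_len=6))  # [-1, 65]
print(decode(sample, max_len=4))  # [-1, -1, -1, -1, 65]
```

The two decoders report a different number of malformed pieces, but in both
cases the final token is 65, the letter 'A'. Whether a real decoder emits one
error marker or several for the bad bytes is a design choice this sketch does
not try to settle.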

> Markus
>
> --
> Markus Kuhn, Computer Lab, Univ of Cambridge, GB
> http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
>
>
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/
>
>
