Henry Spencer wrote on 2000-07-28 04:45 UTC:
> On 21 Jul 2000, H. Peter Anvin wrote:
> > The user of the decoder is the user that gets bitten by these security
> > holes...
>
> Um, no, I think you've missed my point. The user of a decoder is *not*
> going to get bitten by these security holes, because he's *decoding*. The
> act of decoding transforms the input into a form where these holes do not
> exist. The potential for security holes comes when you attempt to use the
> raw input, *without* decoding it. It is the *non-decoding* users who are
> vulnerable.

That still doesn't quite get the point right:

You only get bitten if you *combine*, in one processing pipeline,
software that decodes with software that trusts the ASCII compatibility
of UTF-8 and does not decode. Attackers could in theory exploit the
difference in semantics between the two. So arguing about whether the
user of a decoder or the non-user of a decoder gets bitten misses the
point again: it is the users who combine both kinds of software who
could get bitten.
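
A minimal sketch of such a pipeline, in Python, may make this concrete.
Everything here is hypothetical: byte_level_check stands in for any stage
that inspects the raw bytes and trusts ASCII compatibility, and lax_decode
stands in for a careless decoder that does not reject overlong sequences
(it only handles one- and two-byte forms, to keep the example short).

```python
def byte_level_check(raw: bytes) -> bool:
    """Non-decoding stage: accept input only if it has no ASCII '/' byte."""
    return b"/" not in raw


def lax_decode(raw: bytes) -> str:
    """Hypothetical careless decoder: 1- and 2-byte forms, no overlong check."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(raw) and 0x80 <= raw[i + 1] <= 0xBF:
            # Two-byte form; a correct decoder would refuse lead bytes
            # 0xC0 and 0xC1 here, because the result would be below U+0080.
            out.append(chr(((b & 0x1F) << 6) | (raw[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"unsupported or malformed byte at offset {i}")
    return "".join(out)


# 0xC0 0xAF is an overlong two-byte encoding of '/' (U+002F).
payload = b"etc" + bytes([0xC0, 0xAF]) + b"passwd"

passed_check = byte_level_check(payload)  # the raw bytes contain no 0x2F
decoded = lax_decode(payload)             # but the decoded text contains '/'
```

Neither stage is exploitable on its own. The hole only opens because the
check runs on the raw bytes and a later stage acts on the decoded string.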

Remember that UTF-8 came with a promise: ASCII compatibility.

UTF-8 certainly implements what I call ASCII compatibility of the first
kind (also known as file-system safety):

  ASCII bytes represent only ASCII characters (and are not parts of
  other characters).

But many decoder authors fail to provide what UTF-8 should also provide,
namely what I like to call ASCII compatibility of the second kind:

  ASCII characters can only be represented by ASCII bytes (and not by
  combinations of other bytes).

A decoder breaks the second kind whenever it accepts an overlong
sequence, for example the two bytes 0xC0 0xAF as an encoding of "/".
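
Both properties can be checked in a few lines of Python. This is only an
illustration using Python's built-in "utf-8" codec, which rejects overlong
sequences; the helper name strict_accepts is made up for this example.

```python
# First kind: in encoded UTF-8, bytes below 0x80 only ever stand for
# ASCII characters. Encode e-acute, the euro sign and '/', then collect
# every ASCII-range byte in the result.
encoded = "\u00e9\u20ac/".encode("utf-8")
ascii_bytes = [b for b in encoded if b < 0x80]  # only the '/' byte remains


def strict_accepts(raw: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Second kind: '/' may only be represented by the single byte 0x2F.
plain_ok = strict_accepts(b"/")
overlong2_ok = strict_accepts(bytes([0xC0, 0xAF]))        # 2-byte overlong '/'
overlong3_ok = strict_accepts(bytes([0xE0, 0x80, 0xAF]))  # 3-byte overlong '/'
```

A decoder that behaves like this one cannot be used to smuggle an ASCII
character past a byte-level check.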
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/