Re: UTF-8 and security

Markus Kuhn Fri, 28 Jul 2000 01:31:15 -0700
Henry Spencer wrote on 2000-07-28 04:45 UTC:
> On 21 Jul 2000, H. Peter Anvin wrote:
> > The user of the decoder is the user that gets bitten by these security
> > holes...
> 
> Um, no, I think you've missed my point.  The user of a decoder is *not*
> going to get bitten by these security holes, because he's *decoding*.  The
> act of decoding transforms the input into a form where these holes do not
> exist.  The potential for security holes comes when you attempt to use the
> raw input, *without* decoding it.  It is the *non-decoding* users who are
> vulnerable. 

That still doesn't get the point quite right:

You only get bitten if you *combine*, in one processing pipeline,
software with decoders and software that trusts the ASCII compatibility
of UTF-8 and does not decode. Attackers could, in principle, exploit the
difference in semantics between the two. So a discussion of whether the
user or the non-user of a decoder gets bitten misses the point again,
because it is the users who combine both kinds of software who could get
bitten.
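To make the combination concrete, here is a minimal Python sketch of such a pipeline. Both functions are hypothetical illustrations, not taken from any real product: `naive_path_filter` checks raw bytes for `../` without decoding, and `lax_decode` is a decoder that checks bit patterns but does not reject overlong sequences. Each stage looks reasonable on its own; only together do they let `../` through.

```python
def naive_path_filter(raw: bytes) -> bool:
    """Hypothetical non-decoding filter: assumes "." and "/" can only
    appear as the ASCII bytes 0x2E and 0x2F, and rejects any "../"."""
    return b"../" not in raw


def lax_decode(data: bytes) -> str:
    """Hypothetical lax UTF-8 decoder: checks lead and continuation bit
    patterns but NOT the shortest-form rule, so an overlong sequence
    such as C0 AF is silently accepted (here as "/")."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b
        elif 0xC0 <= b < 0xE0:
            n, cp = 1, b & 0x1F
        elif 0xE0 <= b < 0xF0:
            n, cp = 2, b & 0x0F
        elif 0xF0 <= b < 0xF8:
            n, cp = 3, b & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        if i + n >= len(data):
            raise ValueError("truncated sequence")
        for k in range(1, n + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp))  # missing: check that cp really needed n + 1 bytes
        i += n + 1
    return "".join(out)


# Pipeline: stage 1 filters raw bytes, stage 2 decodes whatever passed.
request = b"..\xc0\xaf..\xc0\xafetc\xc0\xafpasswd"
if naive_path_filter(request):    # passes: no literal b"../" in the bytes
    print(lax_decode(request))    # prints ../../etc/passwd
```

A strict decoder closes the gap: CPython's built-in codec, for example, raises `UnicodeDecodeError` on `request.decode("utf-8")` because 0xC0 can never start a valid sequence.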

Remember that UTF-8 came with a promise: ASCII compatibility.

UTF-8 certainly provides what I call ASCII compatibility of the first
kind (also known as file-system safety):

  ASCII bytes represent only ASCII characters (and are not parts of other
  characters).
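The first kind can be checked mechanically. The sketch below, an illustration assuming the modern code space (up to U+10FFFF, surrogates skipped) rather than the 31-bit range of the 2000-era specification, encodes every non-ASCII scalar value with Python's built-in codec and confirms that no resulting byte falls in 0x00-0x7F. The helper names `first_kind_holds` and `split_agrees` are hypothetical. The practical consequence is that byte-level code splitting on "/" never cuts a character in half.

```python
def first_kind_holds(limit: int = 0x110000) -> bool:
    """True if no code point in [0x80, limit) encodes to a UTF-8
    sequence containing an ASCII byte (0x00-0x7F)."""
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable scalar values
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return False
    return True


def split_agrees(text: str, sep: str = "/") -> bool:
    """Splitting the encoded bytes on an ASCII separator yields the same
    pieces as splitting the text first and encoding each piece."""
    raw = text.encode("utf-8")
    pieces = [p.encode("utf-8") for p in text.split(sep)]
    return raw.split(sep.encode("ascii")) == pieces


print(first_kind_holds())
print(split_agrees("r\u00e9pertoire/\u65e5\u672c\u8a9e/\u0444\u0430\u0439\u043b"))
```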

But many decoder authors fail to provide what UTF-8 should also provide,
namely what I like to call ASCII compatibility of the second kind:

  ASCII characters can only be represented by ASCII bytes (and not by
  combinations of other bytes, such as the overlong sequence 0xC0 0xAF
  for "/").
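Providing the second kind means rejecting any sequence that is longer than necessary for its code point. A minimal Python sketch of such a strict decoder follows; `strict_decode` and `MIN_CP` are hypothetical names, the decoder handles only sequences of up to four bytes, and the surrogate and U+10FFFF limits it enforces follow the later RFC 3629 rules (an assumption beyond this discussion). CPython's built-in "utf-8" codec already behaves strictly in this respect.

```python
# Smallest code point that legitimately needs n continuation bytes;
# anything below that in an n-continuation sequence is overlong.
MIN_CP = {1: 0x80, 2: 0x800, 3: 0x10000}


def strict_decode(data: bytes) -> str:
    """Hypothetical strict UTF-8 decoder that rejects overlong forms,
    so ASCII characters can only come from single ASCII bytes."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
            continue
        if 0xC0 <= b < 0xE0:
            n, cp = 1, b & 0x1F
        elif 0xE0 <= b < 0xF0:
            n, cp = 2, b & 0x0F
        elif 0xF0 <= b < 0xF8:
            n, cp = 3, b & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        if i + n >= len(data):
            raise ValueError("truncated sequence")
        for k in range(1, n + 1):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < MIN_CP[n]:
            raise ValueError(f"overlong encoding of U+{cp:04X} at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:  # RFC 3629 limits
            raise ValueError(f"invalid code point U+{cp:X} at offset {i}")
        out.append(chr(cp))
        i += n + 1
    return "".join(out)
```

With this check in place, `strict_decode(b"..\xc0\xaf")` raises `ValueError` instead of returning `"../"`, so a byte-level filter and the decoder agree on what the input contains.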

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/