Re: charsets in debian/control

Bart Schuller Sun, 05 Dec 2004 11:52:54 -0600

On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
> On that note, how likely is it to hit a UTF-8 character encoding that
> contains a '\n'? Any non UTF-8 aware parser would assume a new line
> has started and get parse errors.


0% likely, guaranteed.

UTF-8 is *designed* to be backward compatible with plain ASCII. Every
valid ASCII character has the same encoding and meaning in UTF-8, and no
UTF-8 byte sequence for a non-ASCII character contains *any* ASCII bytes.

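This is achieved by making sure that everything above plain ASCII has
the high bit set, not just for the first byte, but for all of them
(lead bytes look like 11xxxxxx, continuation bytes like 10xxxxxx).
Since '\n' is 0x0A, with the high bit clear, it can only ever be a
real newline.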
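
For anyone who wants to check this rather than take my word for it,
here is a rough Python sketch. It is only an illustration: the function
name is made up for this mail, and it relies on Python's built-in UTF-8
encoder rather than on any particular parser.

```python
def non_ascii_code_points_with_ascii_bytes():
    """Return every non-ASCII code point whose UTF-8 encoding contains
    a byte below 0x80 (i.e. a byte that could be mistaken for ASCII).

    Hypothetical helper for illustration; the expectation is that the
    returned list is empty.
    """
    offenders = []
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable in UTF-8
        encoded = chr(cp).encode("utf-8")
        # Every byte of a multi-byte sequence should have the high bit set.
        if any(b < 0x80 for b in encoded):
            offenders.append(cp)
    return offenders


if __name__ == "__main__":
    # Exhaustive check over all of Unicode; takes a moment to run.
    print(non_ascii_code_points_with_ascii_bytes())
```

So a parser that knows nothing about UTF-8 and just splits on the byte
0x0A will still split only at real newlines.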

-- 
Bart.
