On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
> On that note, how likely is it to hit a UTF-8 character encoding that
> contains a '\n'? Any non UTF-8 aware parser would assume a new line
> has started and get parse errors.

0% likely, guaranteed. UTF-8 is *designed* to be backward compatible with plain ASCII. Every valid ASCII character is encoded as the same single byte and has the same meaning in UTF-8. No byte sequence that encodes a non-ASCII character contains *any* ASCII byte, so '\n' (0x0A) can never show up inside one. This is achieved by making sure that every byte of a multi-byte sequence has the high bit set: not just the first byte, but all of them.

-- Bart.
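
If you want to check the high-bit property for yourself, here is a minimal Python sketch. The function name `utf8_is_ascii_safe` and the sample string are made up for illustration; it only relies on the standard `str.encode("utf-8")`.

```python
def utf8_is_ascii_safe(text: str) -> bool:
    """Check the two properties described above for every character in text.

    Hypothetical helper for illustration, not part of any library.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            # ASCII characters must encode to the identical single byte.
            if encoded != bytes([ord(ch)]):
                return False
        else:
            # Every byte of a non-ASCII character must have the high bit set.
            if any(b < 0x80 for b in encoded):
                return False
    return True


# Sample text with one real newline and several non-ASCII characters.
sample = "naïve\n日本語 €"

print(utf8_is_ascii_safe(sample))           # True
print(sample.encode("utf-8").count(b"\n"))  # 1, only the real newline
```

So a byte-oriented parser that splits on 0x0A will only ever split at real newlines, even if it knows nothing about UTF-8.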