Bart Schuller <[EMAIL PROTECTED]> writes:

> On Sun, Dec 05, 2004 at 06:40:52PM +0100, Goswin von Brederlow wrote:
>> On that note, how likely is it to hit a UTF-8 character encoding that
>> contains a '\n'? Any non UTF-8 aware parser would assume a new line
>> has started and get parse errors.
>
> 0% likely, guaranteed.
>
> UTF-8 is *designed* to be upwards compatible with plain ASCII. Every
> valid ASCII character has the same meaning in UTF-8. Every UTF-8 byte
> sequence for a non-ASCII character will not contain *any* ASCII characters.
>
> This is achieved by making sure that everything above plain ASCII has
> the high bit set, not just for the first byte, but for all of them.
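The quoted claim can be checked mechanically. The following is only a minimal sketch, assuming Python 3 and its built-in UTF-8 codec; the function names are made up for illustration. It encodes every Unicode scalar value above ASCII and confirms that no byte of the result falls below 0x80, so a '\n' (0x0A) can never appear inside a multi-byte sequence.

```python
def non_ascii_scalar_values():
    """Yield every Unicode scalar value above the ASCII range."""
    for cp in range(0x80, 0x110000):
        # Surrogates (U+D800..U+DFFF) are not scalar values and
        # cannot be encoded as UTF-8, so skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield cp


def multibyte_sequences_avoid_ascii(code_points):
    """Return True if no given code point encodes to a byte below 0x80."""
    for cp in code_points:
        encoded = chr(cp).encode("utf-8")
        # A byte below 0x80 would be indistinguishable from plain ASCII.
        if any(byte < 0x80 for byte in encoded):
            return False
    return True


if __name__ == "__main__":
    # Exhaustive check over all non-ASCII scalar values.
    print(multibyte_sequences_avoid_ascii(non_ascii_scalar_values()))

    # A byte-oriented parser splitting on 0x0A sees exactly one newline here.
    line = "Grüße\n".encode("utf-8")
    print(line.count(b"\n"))
```

Since every byte of a multi-byte sequence is 0x80 or above, splitting the raw byte stream on 0x0A never cuts a character in half.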

Ok, so no problems there. Any parser that accepts 8-bit non-ASCII
bytes will then accept UTF-8 as well. What remains is just getting the
UTF-8 characters to display correctly.

Regards,
        Goswin