On Thu, 6 Feb 2003, David Olofson wrote:

> Would be interesting to know which ASCII values are valid inside
> multibyte characters, BTW. Is there a risk you'll see false slashes,
> colons and things like that in paths, if you don't parse the UTF-8
> properly? (There isn't IIRC, but I'll have to read up on this.)

No. All bytes inside a multibyte character have their highest bit set,
so none of them can be mistaken for an ASCII character such as a slash
or a colon. That is one good thing about UTF-8: even structured
documents can be parsed without precise knowledge of the encoding, as
long as the encoding is backwards compatible with ASCII.

UTF-8 is pretty easy to split, too: any byte that does not mark the
start of a new character has the bit pattern 10xxxxxx.

-- 
Sami Perttu                       "Flower chase the sunshine"
[EMAIL PROTECTED]                 http://www.cs.helsinki.fi/u/perttu
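
For illustration, here is a minimal sketch in C of the two points made
in the reply. It is not taken from any particular library: the function
names utf8_is_continuation, utf8_count_chars and path_basename are made
up for this example, and the input is assumed to be valid,
NUL-terminated UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if b is a continuation byte, i.e. has the bit pattern
   10xxxxxx. (Hypothetical helper name.) */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Count characters by counting every byte that is NOT a continuation
   byte. Assumes s is valid UTF-8; no validation is done here. */
static size_t utf8_count_chars(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (!utf8_is_continuation((unsigned char)*s))
            ++n;
    }
    return n;
}

/* Return the part of path after the last '/', using a plain byte
   search. This is safe on UTF-8 because every byte of a multibyte
   character has its highest bit set, so it can never equal '/'. */
static const char *path_basename(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

With these helpers, utf8_count_chars("caf\xC3\xA9") returns 4 for a
five-byte string, and path_basename finds the right slash even when the
path components contain non-ASCII characters. Splitting a buffer at a
character boundary works the same way: step backwards while
utf8_is_continuation is true.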
