On Thu, Mar 01, 2007 at 09:41:44AM +0100, Marcel Ruff wrote:
>
> >>Are you thinking of Java's _modified_ version of UTF-8
> >>(http://en.wikipedia.org/wiki/UTF-8#Java)?
> >>
> >
> >Uhg, disgusting...
> >
> Yes - this is an open & serious issue for my approach!
>
> Has anybody some practical advice on this?

Just treat the byte sequence C0 80 as the spec requires: as an invalid
sequence. Neither it (because it is illegal UTF-8) nor a real NUL
(because NUL is illegal in text) should ever appear in your data.

If your problem is more specific and there is a real reason you need to
handle such data differently, please describe what you are doing so we
can offer better advice.

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
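
For illustration only, here is a minimal sketch of the rule above in
Python. The function name check_text is hypothetical and not something
from this thread. It assumes the input is a complete byte string, and it
relies on Python's built-in strict UTF-8 decoder, which rejects overlong
forms such as C0 80.

```python
def check_text(data: bytes) -> str:
    """Decode strict UTF-8 text, rejecting NUL and invalid sequences.

    Hypothetical helper for illustration: it refuses a real NUL byte
    (not allowed in text) and lets the strict decoder refuse C0 80
    (Java's modified-UTF-8 spelling of NUL, which is not valid UTF-8).
    """
    if b"\x00" in data:
        raise ValueError("NUL byte is not allowed in text")
    # The strict decoder raises UnicodeDecodeError (a ValueError
    # subclass) on any ill-formed sequence, including C0 80.
    return data.decode("utf-8", errors="strict")
```

Both b"a\xc0\x80b" and b"a\x00b" are rejected with a ValueError, so a
caller can treat either case as bad input rather than silently mapping
it to something else.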
