"H. Peter Anvin" <[EMAIL PROTECTED]> writes: > The alternate spelling > > 11000001 10001011 > > ... is not the character K <U+004B> but INVALID SEQUENCE. One > possible thing to do in a decoder is to emit U+FFFD SUBSTITUTION > CHARACTER on encountering illegal sequences. Is there any consensus whether to use one or two U+FFFD characters in such situations? For example, what do Perl, Tcl and Java here? - Linux-UTF8: i18n of Linux on all levels Archive: http://mail.nl.linux.org/lists/
