Markus Kuhn wrote:
> I think Juliusz has already understood that naively using iconv() alone
> might not be well suited for luit, because it doesn't resynchronize all
> encodings cleverly. You need a bit of additional logic. If you press ^C
> in an application that spits out BIG5 at an unfortunate moment, or
> truncate a string by counting bytes, then you will lose BIG5
> synchronization, and the terminal has to skip characters in the input
> stream until it finds two G0 characters in a row to be sure again where
> the next character starts. BIG5 is an example of a rather messy encoding,
> not only in that respect.
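The skip-until-two-G0-bytes heuristic quoted above can be sketched in C as follows. This is an illustration only, assuming "G0 characters" means bytes in the ISO 2022 G0 graphic range 0x21 to 0x7E; `big5_resync_point` and `is_g0` are hypothetical names, not code from luit. The idea relies on Big5 lead bytes always being 0x81 or higher, so a G0 byte is either plain ASCII or the trail byte of a two-byte character, and in both cases it ends a character.

```c
#include <stddef.h>

/* Assumed G0 graphic range: 0x21..0x7E. */
static int is_g0(unsigned char c)
{
    return c >= 0x21 && c <= 0x7E;
}

/* Return the index of the second byte of the first pair of consecutive
   G0 bytes in buf, or len if there is no such pair.
   The first byte of the pair may be a Big5 trail byte (0x40..0x7E), but
   it can never be a lead byte, so the byte after it starts a new
   character. Being G0 as well, that byte is a complete one-byte (ASCII)
   character, which is a safe place to resume decoding. */
size_t big5_resync_point(const unsigned char *buf, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (is_g0(buf[i - 1]) && is_g0(buf[i]))
            return i;
    return len;
}
```

For the bytes `0x40 0xA4 0x40 0x41 0x42` (a stray trail byte, then the Big5 pair A4 40, then "AB"), the first G0 pair is `0x40 0x41`, so decoding would resume at index 3, the "A".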
iconv() itself doesn't resynchronize, but it is easy to resynchronize
using iconv(). It needs fewer than 10 lines of code. Both the GNU
Compiler for Java and a new gettext PO file lexer that I wrote last
week are based on iconv() and do support resynchronization. The
resynchronization is simple: whenever iconv() returns (size_t)-1 with
errno set to EILSEQ, skip 1 byte and continue converting.
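A minimal sketch of that skip-one-byte loop in C follows. It is not the code from GCJ or gettext; `convert_resync` is a hypothetical helper, and it assumes a glibc-style iconv() whose input argument is `char **`. It also makes one extra choice the text above leaves open: an incomplete sequence at the very end of the input (EINVAL) is simply dropped, whereas a real stream filter such as luit would keep those bytes for the next read.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert inlen bytes of `in` from encoding `from` to encoding `to`,
   writing at most outsize bytes to `out`. On EILSEQ, skip one input
   byte and resume. *skipped receives the number of dropped input bytes.
   Returns the number of output bytes, or (size_t)-1 if the converter
   cannot be opened or the output buffer is too small. */
size_t convert_resync(const char *from, const char *to,
                      const char *in, size_t inlen,
                      char *out, size_t outsize, size_t *skipped)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* iconv() does not modify the input */
    char *outp = out;
    size_t inleft = inlen, outleft = outsize;
    *skipped = 0;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;               /* all input consumed */
        if (errno == EILSEQ) {
            /* Invalid sequence: drop one byte and try again. */
            inp++;
            inleft--;
            (*skipped)++;
        } else if (errno == EINVAL) {
            /* Truncated sequence at end of input: drop the rest. */
            *skipped += inleft;
            break;
        } else {                 /* E2BIG: output buffer full */
            iconv_close(cd);
            return (size_t)-1;
        }
    }

    /* Emit any shift sequence needed to return to the initial state
       (matters only for stateful target encodings). */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        return (size_t)-1;
    }
    iconv_close(cd);
    return (size_t)(outp - out);
}
```

For example, converting the five bytes `"ab\xFF" "cd"` from UTF-8 to UTF-8 should yield `abcd` (4 bytes) with one byte skipped. The string is split into two literals because `"\xFFcd"` would otherwise be parsed as the single hex escape `\xFFCD`.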
> ISO 2022 is far worse.
Yes. How do you want to "resynchronize" when an Escape sequence was
dropped during transmission? You can only try an arbitrary ISO 2022
state and hope it's the correct one.
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/