On Sun, 9 Sep 2001, Bruno Haible wrote:
> Carl W. Brown writes:
>
> > ICU has an invalid character callback handler. I use it for example to
> > convert characters that are not in the code page to HTML/XML escape
> > sequences.
>
> You can do that with iconv() as well. With iconv(), the processing
> simply stops at an invalid/unconvertible character, and the programmer
> can do any kind of error handling before restarting the conversion.
It might be nice to extend iconv(1) (the command-line tool, not the
C library function iconv(3)) with a couple of options that control how
characters not directly representable in the target encoding are
handled. Needless to say, the default behavior should stay as it is
now.

  --xml            : represent chars not in the target encoding/codeset
                     as XML numeric character references (NCRs)
  --ucv            : represent chars not in the target encoding/codeset
                     as Unicode scalar values in the format 'U+hhhh[hh]'
  --ignore_invalid : just skip over invalid characters instead of
                     stopping at them
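
To make the first two options concrete, here is a rough sketch of how they
could sit on top of the stop-and-restart behavior of iconv(3) that Bruno
describes. It is not taken from any existing iconv(1). The names
convert_with_fallback, utf8_decode and enum fallback are made up for
illustration. It assumes UTF-8 input, a stateless ASCII-compatible target,
and an iconv() that fails with EILSEQ on an unconvertible character while
leaving the input pointer at that character (as glibc and GNU libiconv do;
other implementations may substitute a replacement character instead).

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fallback styles mirroring the proposed --xml and --ucv. */
enum fallback { FALLBACK_XML, FALLBACK_UCV };

/* Minimal UTF-8 decoder: returns the code point and stores the sequence
   length in *len, or returns -1 if the bytes are malformed.
   It does not reject overlong forms or surrogates. */
static long utf8_decode(const unsigned char *s, size_t avail, size_t *len)
{
    long cp;
    size_t n;

    if (avail == 0) return -1;
    if (s[0] < 0x80)                { cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 4; }
    else return -1;
    if (n > avail) return -1;
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *len = n;
    return cp;
}

/* Convert UTF-8 text to charset `to`, writing a NUL-terminated result to
   `out`. Unconvertible characters become "&#xHHHH;" or "U+HHHH".
   Returns 0 on success, -1 on malformed input, full buffer or bad charset. */
int convert_with_fallback(const char *to, const char *in, size_t inlen,
                          char *out, size_t outsize, enum fallback style)
{
    if (outsize == 0) return -1;
    iconv_t cd = iconv_open(to, "UTF-8");
    if (cd == (iconv_t)-1) return -1;

    char *ip = (char *)in;          /* iconv() wants a non-const pointer */
    char *op = out;
    size_t inleft = inlen;
    size_t outleft = outsize - 1;   /* keep one byte for the final NUL */
    int rc = 0;

    while (inleft > 0) {
        if (iconv(cd, &ip, &inleft, &op, &outleft) != (size_t)-1)
            break;                  /* everything was converted */
        if (errno != EILSEQ) { rc = -1; break; }   /* E2BIG or EINVAL */

        /* iconv() stopped at the offending character: decode it by hand. */
        size_t len;
        long cp = utf8_decode((const unsigned char *)ip, inleft, &len);
        if (cp < 0) { rc = -1; break; }            /* truly invalid input */

        char esc[16];
        int n = snprintf(esc, sizeof esc,
                         style == FALLBACK_XML ? "&#x%lX;" : "U+%04lX",
                         (unsigned long)cp);
        if ((size_t)n > outleft) { rc = -1; break; }
        memcpy(op, esc, (size_t)n);
        op += n;  outleft -= (size_t)n;
        ip += len; inleft -= len;   /* skip the character and restart */
    }
    *op = '\0';
    iconv_close(cd);
    return rc;
}
```

Called with "ASCII" as the target, a EURO SIGN (U+20AC) in the input would
come out as "&#x20AC;" in the XML style and as "U+20AC" in the other. A real
--ignore_invalid would take the same path but write nothing for the skipped
character.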
Jungshik Shin
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/