On Sun, 9 Sep 2001, Bruno Haible wrote:
> Carl W. Brown writes:
>
> > ICU has an invalid character callback handler.  I use it for example to
> > convert characters that are not in the code page to HTML/XML escape
> > sequences.
>
> You can do that with iconv() as well. With iconv(), the processing
> simply stops at an invalid/unconvertible character, and the programmer
> can do any kind of error handling before restarting the conversion.
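Bruno's point about iconv(3) can be sketched in C. Convert in a loop, and whenever iconv() stops with EILSEQ, write an XML numeric character reference for the offending character, then restart just past it. This is a minimal sketch that assumes UTF-8 input and an ASCII-compatible, stateless target encoding. It also assumes an implementation such as glibc or GNU libiconv, which reports unconvertible characters (not only malformed input) with EILSEQ and leaves the input pointer at the offending sequence. POSIX leaves the handling of valid but unconvertible characters implementation-defined. `convert_with_ncr` and `utf8_decode` are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence from s (n bytes available). Stores the code point
   in *cp and returns the sequence length, or returns 0 if the bytes are not
   well-formed UTF-8 (bad lead/continuation byte, overlong form, surrogate,
   or value above U+10FFFF). */
static size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    static const unsigned long min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    unsigned long c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;
    if (n < len)
        return 0;
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_value[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return len;
}

/* Convert NUL-terminated UTF-8 text to tocode, writing a NUL-terminated
   result into out. Characters iconv() stops at (EILSEQ) are written as XML
   NCRs (&#xHHHH;); bytes that are not valid UTF-8 are written as '?'.
   Returns 0 on success, -1 if iconv_open fails, the output is too small,
   or the input ends in a truncated sequence. */
int convert_with_ncr(const char *utf8, const char *tocode,
                     char *out, size_t outsize)
{
    iconv_t cd;
    char *in = (char *)utf8;   /* iconv() takes char **, but does not write input */
    size_t inleft = strlen(utf8);
    char *o = out;
    size_t oleft;
    int rc = 0;

    assert(outsize > 0);
    cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;
    oleft = outsize - 1;       /* keep one byte for the terminating NUL */

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &o, &oleft) != (size_t)-1)
            break;             /* everything converted */
        if (errno != EILSEQ) {
            rc = -1;           /* E2BIG: out is full; EINVAL: truncated input */
            break;
        }
        /* iconv() left `in` at the offending sequence: escape it, skip it,
           and restart the conversion just past it. */
        {
            unsigned long cp;
            char esc[16];
            size_t len = utf8_decode((const unsigned char *)in, inleft, &cp);
            int n;

            if (len > 0) {
                n = snprintf(esc, sizeof esc, "&#x%lX;", cp);
            } else {
                n = snprintf(esc, sizeof esc, "?");
                len = 1;       /* skip a single malformed byte */
            }
            if ((size_t)n > oleft) {
                rc = -1;
                break;
            }
            memcpy(o, esc, (size_t)n);
            o += n;
            oleft -= (size_t)n;
            in += len;
            inleft -= len;
        }
    }
    /* Flush any pending shift state (a no-op for stateless targets). */
    if (rc == 0 && iconv(cd, NULL, NULL, &o, &oleft) == (size_t)-1)
        rc = -1;
    *o = '\0';
    iconv_close(cd);
    return rc;
}
```

With glibc, for example, converting the UTF-8 text "café €" to "ASCII" this way should give "caf&#xE9; &#x20AC;". Writing the escape bytes directly into the output is only safe because the target is assumed to be ASCII-compatible and stateless. For a stateful encoding such as ISO-2022-JP, the escape would itself have to be passed through iconv().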

  It might be nice to extend iconv(1) (the command-line tool, not the C
library function iconv(3)) with a couple of options controlling how
characters that are not directly representable in the target encoding
are handled. Needless to say, the default behavior should remain as it
is now.

  --xml : represent characters not in the target encoding/codeset
          as XML numeric character references (NCRs)
  --ucv : represent characters not in the target encoding/codeset
          by their Unicode scalar values, in the form 'U+hhhh[hh]'
  --ignore_invalid : skip over invalid characters instead of stopping
                     at them
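The three proposed options amount to three choices of replacement text at the point where iconv(3) stops. Below is a minimal sketch of that choice. The names `enum unrep_policy` and `append_replacement` are hypothetical and are not part of any existing iconv(1) or iconv(3) interface. Treating --ignore_invalid as "emit nothing" through the same hook is an assumption. The function appends at an output cursor, advancing the pointer and decrementing the remaining byte count the same way iconv() updates its outbuf and outbytesleft arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policies mirroring the proposed iconv(1) options. */
enum unrep_policy {
    POLICY_XML,     /* --xml: XML numeric character reference, &#xHHHH; */
    POLICY_UCV,     /* --ucv: U+hhhh, U+hhhhh or U+hhhhhh */
    POLICY_IGNORE   /* --ignore_invalid: write nothing */
};

/* Append the replacement for code point cp at *out, advancing *out and
   decrementing *outleft as iconv(3) does. Returns 0, or -1 if the output
   buffer is too small (in which case nothing is written). */
int append_replacement(enum unrep_policy policy, unsigned long cp,
                       char **out, size_t *outleft)
{
    char tmp[16];
    int n = 0;

    assert(cp <= 0x10FFFF);             /* caller passes a Unicode scalar value */
    if (policy == POLICY_XML)
        n = snprintf(tmp, sizeof tmp, "&#x%lX;", cp);
    else if (policy == POLICY_UCV)
        n = snprintf(tmp, sizeof tmp, "U+%04lX", cp);  /* 4 to 6 hex digits */
    /* POLICY_IGNORE: n stays 0, so nothing is appended. */

    if ((size_t)n > *outleft)
        return -1;
    memcpy(*out, tmp, (size_t)n);
    *out += n;
    *outleft -= (size_t)n;
    return 0;
}
```

It would be called from the EILSEQ branch of an iconv() conversion loop, after decoding the offending character's code point. Under every policy, the caller still has to advance the input pointer past the skipped bytes before calling iconv() again.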

  Jungshik Shin

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
