https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112652

Bruno Haible <bruno at clisp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bruno at clisp dot org

--- Comment #15 from Bruno Haible <bruno at clisp dot org> ---
(In reply to Jakub Jelinek from comment #3)
> On Linux I get:
> echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
> iconv: illegal input sequence at position 0
> iconv: illegal input sequence at position 0
> while on Solaris
> echo á | iconv -f UTF-8 -t ASCII -; echo 😁 | iconv -f UTF-8 -t ISO-8859-1 -
> ?
> ?
> If it maps all characters which do not have representation in the
> destination character set into ?, then it is useless for the test in
> question.

Yes, while mapping unrepresentable characters to '?' (in the case of Solaris 11
and NetBSD) or to '*' (in the case of musl libc) is compliant with the iconv()
specification in POSIX, it leads to suboptimal results in some practical cases.

It is therefore commonplace to test against these substitutions where needed.
For example, here is the corresponding code from gnulib/lib/unicodeio.c:
==============================================================================
      /* Convert the character from UTF-8 to the locale's charset.  */
      size_t res = iconv (utf8_to_local,
                          (ICONV_CONST char **)&inptr, &inbytesleft,
                          &outptr, &outbytesleft);
      /* Analyze what iconv() actually did and distinguish replacements
         that are OK (no need to invoke the FAILURE callback), such as
           - replacing GREEK SMALL LETTER MU with MICRO SIGN, or
           - replacing FULLWIDTH COLON with ':', or
           - replacing a Unicode TAG character (U+E00xx) with an empty string,
         from replacements that are worse than the FAILURE callback, such as
           - replacing 'ç' with '?' (NetBSD, Solaris 11) or '*' (musl).  */
      if (inbytesleft > 0 || res == (size_t)(-1)
          /* FreeBSD iconv(), NetBSD iconv(), and Solaris 11 iconv() insert
             a '?' if they cannot convert.  */
# if !defined _LIBICONV_VERSION || (_LIBICONV_VERSION == 0x10b && defined __APPLE__)
          || (res > 0 && outptr - outbuf == 1 && *outbuf == '?')
# endif
          /* musl libc iconv() inserts a '*' if it cannot convert.  */
# if !defined _LIBICONV_VERSION && MUSL_LIBC
          || (res > 0 && outptr - outbuf == 1 && *outbuf == '*')
# endif
         )
        return failure (code, NULL, callback_arg);
==============================================================================
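
For readers who want to try the same idea outside of gnulib, here is a minimal
standalone sketch of the check. It is not the gnulib code: the helper names
looks_like_substitution and convert_char_strict are hypothetical, it assumes an
iconv() prototype that takes 'char **' for the input pointer (so there is no
ICONV_CONST handling), and it applies both the '?' and the '*' test
unconditionally instead of selecting them per platform with the preprocessor.

```c
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if an iconv() call that returned RES and wrote
   the bytes [outbuf, outptr) looks like a substitution of a single
   unconvertible character by MARKER ('?' or '*').  RES > 0 means that
   iconv() reported at least one nonreversible conversion, so a literal '?'
   in the input (RES == 0) is not flagged.  */
static bool
looks_like_substitution (size_t res, const char *outbuf, const char *outptr,
                         char marker)
{
  return res != (size_t)(-1) && res > 0
         && outptr - outbuf == 1 && *outbuf == marker;
}

/* Hypothetical helper: convert one UTF-8 encoded character to TOCODE.
   Returns the number of bytes written to OUT, or -1 if the character is
   not representable, including the case where iconv() silently inserted
   a '?' or '*' instead of failing.  */
static int
convert_char_strict (const char *tocode, const char *utf8, char *out,
                     size_t outsize)
{
  iconv_t cd = iconv_open (tocode, "UTF-8");
  if (cd == (iconv_t)(-1))
    return -1;

  /* Assumption: iconv() takes 'char **' here, as on glibc and musl.  */
  char *inptr = (char *) utf8;
  size_t inbytesleft = strlen (utf8);
  char *outptr = out;
  size_t outbytesleft = outsize;

  size_t res = iconv (cd, &inptr, &inbytesleft, &outptr, &outbytesleft);
  iconv_close (cd);

  if (res == (size_t)(-1) || inbytesleft > 0
      || looks_like_substitution (res, out, outptr, '?')
      || looks_like_substitution (res, out, outptr, '*'))
    return -1;
  return (int) (outptr - out);
}
```

With this sketch, convert_char_strict ("ASCII", "\xc3\xa1", buf, sizeof buf) is
meant to report failure both on implementations that return an error for 'á'
and on those that substitute a marker character. Applying both marker tests
everywhere is a simplification; the gnulib excerpt above restricts each test to
the platforms where the substitution is known to happen.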
