A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1635
======================================================================
Reported By:                steffen
Assigned To:
======================================================================
Project:                    1003.1(2016/18)/Issue7+TC2
Issue ID:                   1635
Category:                   Base Definitions and Headers
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       steffen
Organization:
User Reference:
Section:                    iconv
Page Number:                1123
Line Number:                38014
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2023-02-21 00:14 UTC
Last Modified:              2024-06-11 23:42 UTC
======================================================================
Summary:                    iconv: please be more explicit in input-not-convertible case
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0001007 iconv function not allowed to fail to c...
======================================================================
----------------------------------------------------------------------
 (0006812) bhaible (reporter) - 2024-06-11 23:42
 https://austingroupbugs.net/view.php?id=1635#c6812
----------------------------------------------------------------------
Regarding the case "when a valid sequence in the source codeset cannot be
represented in the destination codeset", here is how the various
implementations behave:

* GNU libc, GNU libiconv, and win-iconv
  (https://github.com/win-iconv/win-iconv):
  - They fail the conversion with EILSEQ if the to_codeset has neither a
    //TRANSLIT nor an //IGNORE suffix.
  - If the to_codeset has an //IGNORE suffix, the character is discarded,
    i.e. it produces 0 bytes in the output.
  - If the to_codeset has a //TRANSLIT suffix, a transliteration is
    attempted. It may do substitutions such as ½ → 1/2 or å → aa.
    Transliterations between scripts (e.g. from Cyrillic to Latin script)
    are generally not done.

* musl libc produces a '*' character in the output.

* FreeBSD and NetBSD produce a '?' character in the output.

* Solaris attempts a transliteration if enabled; otherwise it produces a
  '?' character in the output.

* IRIX produces a NUL character in the output.

* macOS 14 iconv always does transliteration,
  - regardless of whether a //TRANSLIT suffix was present in to_codeset,
  - regardless of whether an //IGNORE suffix was present in to_codeset,
  - regardless of whether iconvctl ICONV_SET_TRANSLITERATE was done on the
    conversion descriptor,
  - regardless of whether iconvctl ICONV_SET_DISCARD_ILSEQ was done on the
    conversion descriptor,
  - regardless of whether iconvctl ICONV_SET_ILSEQ_INVALID was done on the
    conversion descriptor.
  The transliteration result depends on the input character. In some
  cases, the result is merely a '?' character. The return value (the
  count of "non-identical conversions") is always 0.

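The differences listed above can be observed with a small probe. The following is only a sketch, not taken from any of the implementations mentioned: the names probe_conversion and enum probe_result are hypothetical, and it assumes the platform accepts "ASCII" and "UTF-8" as codeset names and declares iconv() with the POSIX char ** signature. It performs one conversion and reports whether iconv() failed with EILSEQ or succeeded with some output.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not part of any iconv API. */
enum probe_result {
    PROBE_OPEN_FAILED,  /* iconv_open() rejected the codeset pair */
    PROBE_EILSEQ,       /* iconv() failed with EILSEQ */
    PROBE_OTHER_ERROR,  /* iconv() failed with some other errno */
    PROBE_CONVERTED     /* iconv() succeeded; out[] holds what it produced */
};

/*
 * Convert the NUL-terminated string `input` from codeset `from` to codeset
 * `to`, writing at most `out_cap` bytes to `out`. *out_len receives the
 * number of bytes written. On PROBE_CONVERTED, *nonident receives iconv()'s
 * return value (the count of non-identical conversions); otherwise 0.
 */
enum probe_result probe_conversion(const char *to, const char *from,
                                   const char *input,
                                   char *out, size_t out_cap,
                                   size_t *out_len, size_t *nonident)
{
    assert(out_len != NULL && nonident != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return PROBE_OPEN_FAILED;

    /* iconv() takes char **, but it does not modify the input bytes. */
    char *in_ptr = (char *)input;
    size_t in_left = strlen(input);
    char *out_ptr = out;
    size_t out_left = out_cap;

    errno = 0;
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    int saved_errno = errno;   /* iconv_close() may overwrite errno */
    iconv_close(cd);

    *out_len = out_cap - out_left;
    if (rc == (size_t)-1) {
        *nonident = 0;
        return saved_errno == EILSEQ ? PROBE_EILSEQ : PROBE_OTHER_ERROR;
    }
    *nonident = rc;
    return PROBE_CONVERTED;
}
```

Calling probe_conversion("ASCII", "UTF-8", "\xC2\xBD", ...) feeds it the UTF-8 encoding of ½. Going by the list above, one would expect PROBE_EILSEQ on GNU libc, and PROBE_CONVERTED with a '*' or '?' in the output buffer on musl or FreeBSD respectively; the sketch makes no claim beyond what that list reports.
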
Issue History
Date Modified    Username       Field                    Change
======================================================================
2023-02-21 00:14 steffen        New Issue
2023-02-21 00:14 steffen        Name                      => steffen
2023-02-21 00:14 steffen        Section                   => iconv
2023-02-21 00:14 steffen        Page Number               => 1123
2023-02-21 00:14 steffen        Line Number               => 38014
2023-02-21 18:20 steffen        Note Added: 0006164
2023-03-06 16:35 nick           Relationship added       related to 0001007
2024-06-11 23:42 bhaible        Note Added: 0006812
======================================================================
