Package: libc-bin
Version: 2.37-7
Severity: normal
File: /usr/bin/iconv
Tags: patch
X-Debbugs-Cc: bugs.debian....@wongs.net

Dear Maintainer,

The iconv program, following POSIX, allows charmap files to be used
directly for conversion without having to be compiled into a gconv
module. For example,

    iconv -f ./palimpsest.charmap

This is a very handy feature, as it allows end users to quickly make
custom mappings without needing to compile a gconv module.

Unfortunately, due to a simple bug (using the wrong hash table), iconv
scrambles the conversion when the char hash table is realloc'd.

Changing `char_table` to `byte_table` in iconv/iconv_charmap.c:339
will fix this. (Patch attached.)

An example file, palimpsest.charmap, that exercises this bug is also
attached.

Current version of iconv:

    $ echo 0123456789 | iconv -f ./palimpsest.charmap
    ෦꩑꧒꧓४꘥꧖෭෮෯

Patched version of iconv:

    $ echo 0123456789 | iconv -f ./palimpsest.charmap
    0123456789

-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.4.0-2-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc-bin depends on:
ii  libc6  2.37-7

Versions of packages libc-bin recommends:
ii  manpages  6.03-2

libc-bin suggests no packages.

-- no debconf information

--- iconv/iconv_charmap.c.orig	2023-01-31 19:27:45.000000000 -0800
+++ iconv/iconv_charmap.c	2023-08-26 06:46:31.704552956 -0700
@@ -336,7 +336,7 @@
 
   rettbl = allocate_table ();
 
-  while (iterate_table (&from_charmap->char_table, &ptr, &key, &keylen, &data)
+  while (iterate_table (&from_charmap->byte_table, &ptr, &key, &keylen, &data)
 	 >= 0)
     {
       struct charseq *in = data;

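The effect of swapping the iterated table can be illustrated with a toy model. This is only a sketch, not glibc's code: plain Python dicts stand in for the two hash tables, the function names are hypothetical, a reversed list stands in for whatever order the char table yields after it is resized, and "first visited entry wins" is an assumption about how the reverse map is filled.

```python
# Toy model only: plain dicts stand in for glibc's hash tables.
# Three charmap entries that all claim byte 0x30, in file order.
entries = [
    ("<U0030>", 0x30),  # DIGIT ZERO, listed first in the charmap
    ("<UAA50>", 0x30),  # CHAM DIGIT ZERO, a later duplicate
    ("<U0DE6>", 0x30),  # SINHALA LITH DIGIT ZERO, another duplicate
]

char_table = {}  # character name -> byte value (every entry is kept)
byte_table = {}  # byte value -> character name (first entry wins)
for name, byte in entries:
    char_table[name] = byte
    byte_table.setdefault(byte, name)


def reverse_map_from_char_table(order):
    """Build byte -> name by walking char_table in the given order.

    Assumption: the first entry visited for a byte value wins.
    """
    result = {}
    for name in order:
        result.setdefault(char_table[name], name)
    return result


def reverse_map_from_byte_table():
    """Build byte -> name from byte_table, which has one entry per byte."""
    return dict(byte_table)


insertion_order = [name for name, _ in entries]
# Hypothetical stand-in for the iteration order after a resize.
rehashed_order = list(reversed(insertion_order))

before = reverse_map_from_char_table(insertion_order)
after = reverse_map_from_char_table(rehashed_order)
fixed = reverse_map_from_byte_table()

print(before[0x30], after[0x30], fixed[0x30])
```

In this model the char-table walk gives the right answer only while its iteration order happens to match the file order, whereas the byte-table walk does not depend on iteration order at all.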
<code_set_name> ATAVISTIC-PALIMPSEST
<comment_char> %
<escape_char> /
% alias PALIMPSEST

% Test the iconv charmap file (bug present in glibc 2023-08-26).

% iconv uses two hash tables: char (to byte) mapping and byte (to char).
% The following charmap exercises both hash tables by forcing each of
% them to realloc memory, which occurs at 75% of their initial size (257).
% When the 193rd entry is added, a new hash table of twice the size is
% created and the old one copied in.

% Usage:
%     echo 0123456789 | iconv -f ./palimpsest.charmap
% Correct output:
%     0123456789

CHARMAP
% Force char_table to realloc
<U0000>..<U007F>   /x00   Total: 128 UCS characters have been mapped.
<UAA50>..<UAA59>   /x30   138. Cham digits
<UA9F0>..<UA9F9>   /x30   148. Myanmar Tai Laing digits
<UA9D0>..<UA9D9>   /x30   158. Javanese digits
<UA620>..<UA629>   /x30   168. Vai digits
<U0F20>..<U0F29>   /x30   178. Tibetan digits
<U0DE6>..<U0DEF>   /x30   188. Sinhala Lith digits
<U0966>..<U096F>   /x30   198. Devanagari digits

% Force byte_table to realloc     Total: 128 Byte sequences have been mapped
<U07C0>..<U07C9>   /d128  138. Nko digits
<U09E6>..<U09EF>   /d138  148. Bengali digits
<U0A66>..<U0A6F>   /d148  158. Gurmukhi digits
<U0AE6>..<U0AEF>   /d158  168. Gujarati digits
<U0B66>..<U0B6F>   /d168  178. Oriya digits
<UA900>..<UA909>   /d178  188. Kayah Li digits
<U104A0>..<U104A9> /d188  198. Osmanya digits
END CHARMAP

% Verbose explanation.

% Multiple UCS characters are allowed to map to one particular byte
% encoding, but when mapping *from* the character set, only the first
% entry is supposed to be used to find the corresponding UCS character.

% Before the 193rd character is added, iconv correctly maps
% bytes from 0x30 to 0x39 as the digits 0 to 9:
%
%     $ echo 0123456789 | iconv -f ./palimpsest.charmap
%     0123456789

% In the buggy version of iconv, after the 193rd character is added,
% the result is garbled:
%
%     $ echo 0123456789 | iconv -f ./palimpsest.charmap
%     ෦꩑꧒꧓४꘥꧖෭෮෯

% To trigger this error the same byte sequence has to be used more
% than once. As mentioned above, duplicate byte sequences are supposed
% to be hidden in the reverse direction. After the 193rd char entry,
% the buggy version of iconv acts as if some layers have been scraped
% off, revealing those underlying maps:

%     0x30  ෦  U+0DE6  SINHALA LITH DIGIT ZERO
%     0x31  ꩑  U+AA51  CHAM DIGIT ONE
%     0x32  ꧒  U+A9D2  JAVANESE DIGIT TWO
%     0x33  ꧓  U+A9D3  JAVANESE DIGIT THREE
%     0x34  ४  U+096A  DEVANAGARI DIGIT FOUR
%     0x35  ꘥  U+A625  VAI DIGIT FIVE
%     0x36  ꧖  U+A9D6  JAVANESE DIGIT SIX
%     0x37  ෭  U+0DED  SINHALA LITH DIGIT SEVEN
%     0x38  ෮  U+0DEE  SINHALA LITH DIGIT EIGHT
%     0x39  ෯  U+0DEF  SINHALA LITH DIGIT NINE

% Analysis: when the hashtable is 75% full, memory is reallocated.
% Initial hashtable size is 257 (first prime after 256) and 75% of
% that is 192.75. So, realloc is triggered on the 193rd character.

% This bug wasn't caused by the memory reallocation, only made
% visible. iconv seemed to work previously because the iteration order
% of the char_table hash just happened to match the insertion order
% from the file.

% The problem was triggered when the char_table, which maps from a UCS
% character to the byte sequence, is resized, but the bug occurred in
% the reverse direction. That pointed to the solution:
% iconv_charmap.c:use_from_charmap() should call iterate_table() on
% byte_table, not char_table.
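
The threshold arithmetic in the analysis can be checked with a short sketch. It takes the initial size (257) and the 75% fill limit from the text above and assumes entries are counted in the order the charmap lists them; the names `ranges`, `trigger` and `entry_code_point` are illustrative only and do not come from glibc.

```python
INITIAL_SIZE = 257  # per the analysis: first prime after 256
FILL_LIMIT = 0.75   # per the analysis: realloc when 75% full

# (first code point, number of entries) for the char_table section,
# in the order the charmap lists them.
ranges = [
    (0x0000, 128),  # <U0000>..<U007F>
    (0xAA50, 10),   # Cham digits
    (0xA9F0, 10),   # Myanmar Tai Laing digits
    (0xA9D0, 10),   # Javanese digits
    (0xA620, 10),   # Vai digits
    (0x0F20, 10),   # Tibetan digits
    (0x0DE6, 10),   # Sinhala Lith digits
    (0x0966, 10),   # Devanagari digits
]

limit = INITIAL_SIZE * FILL_LIMIT  # 192.75
trigger = int(limit) + 1           # first entry count above the limit


def entry_code_point(n):
    """Return the code point of the n-th char entry (1-based)."""
    seen = 0
    for first, count in ranges:
        if n <= seen + count:
            return first + (n - seen - 1)
        seen += count
    raise IndexError(n)


total = sum(count for _, count in ranges)
trigger_cp = entry_code_point(trigger)
print(limit, trigger, total, f"U+{trigger_cp:04X}")
# prints: 192.75 193 198 U+096A
```

Under these assumptions the 193rd char entry falls inside the final (Devanagari) range, so the charmap's 198 char entries are enough to cross the limit.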