Package: libc-bin
Version: 2.37-7
Severity: normal
File: /usr/bin/iconv
Tags: patch
X-Debbugs-Cc: bugs.debian....@wongs.net

Dear Maintainer,

The iconv program, following POSIX, allows charmap files to be used
directly for conversion without having to be compiled into a gconv
module. For example,

    iconv -f ./palimpsest.charmap

This is a very handy feature as it allows end users to quickly make
custom mappings without needing to compile a gconv module.
Unfortunately, due to a simple bug (using the wrong hash table), iconv
scrambles the conversion when the char hash table is realloc'd.

Changing `char_table` to `byte_table` in iconv/iconv_charmap.c:339
will fix this. (Patch attached.)
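
To illustrate why the choice of table matters, here is a toy Python
model. It is a sketch, not glibc code: plain lists and dicts stand in
for the real hash tables, only the names char_table and byte_table come
from the patch, and the "first entry visited wins" rule is an
assumption of the model.

```python
# Toy model only: plain Python containers stand in for glibc's hash tables.
# Charmap entries in file order: (UCS code point, byte value).
entries = [
    (0x0030, 0x30),  # DIGIT ZERO, defined first
    (0xAA50, 0x30),  # CHAM DIGIT ZERO, same byte, defined later
    (0x0DE6, 0x30),  # SINHALA LITH DIGIT ZERO, same byte, defined later
]

def build_byte_table(entries):
    """byte -> code point, keeping only the first definition per byte."""
    table = {}
    for ucs, byte in entries:
        table.setdefault(byte, ucs)
    return table

def reverse_from_char_table(iteration_order):
    """Buggy approach: derive byte -> code point by walking every
    character entry. The winner depends on the walk order (the first
    entry visited wins, which is an assumption of this model)."""
    table = {}
    for ucs, byte in iteration_order:
        table.setdefault(byte, ucs)
    return table

def reverse_from_byte_table(byte_table):
    """Fixed approach: walk the byte table, which already holds exactly
    one entry per byte sequence."""
    return dict(byte_table)

byte_table = build_byte_table(entries)

# While the walk order matches file order, the buggy approach looks fine.
print(hex(reverse_from_char_table(entries)[0x30]))      # 0x30

# After a rehash the walk order is arbitrary; model that by reversing it.
rehashed = list(reversed(entries))
print(hex(reverse_from_char_table(rehashed)[0x30]))     # 0xde6
print(hex(reverse_from_byte_table(byte_table)[0x30]))   # 0x30
```

In this model the byte-table walk gives the same answer regardless of
how the character entries happen to be ordered.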

An example file, palimpsest.charmap, that exercises this bug is also
attached.

Current version of iconv:

        $ echo 0123456789 | iconv -f ./palimpsest.charmap
        ෦꩑꧒꧓४꘥꧖෭෮෯

Patched version of iconv:

        $ echo 0123456789 | iconv -f ./palimpsest.charmap
        0123456789



-- System Information:
Debian Release: trixie/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.4.0-2-amd64 (SMP w/8 CPU threads; PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libc-bin depends on:
ii  libc6  2.37-7

Versions of packages libc-bin recommends:
ii  manpages  6.03-2

libc-bin suggests no packages.

-- no debconf information
--- iconv/iconv_charmap.c.orig  2023-01-31 19:27:45.000000000 -0800
+++ iconv/iconv_charmap.c       2023-08-26 06:46:31.704552956 -0700
@@ -336,7 +336,7 @@
 
   rettbl = allocate_table ();
 
-  while (iterate_table (&from_charmap->char_table, &ptr, &key, &keylen, &data)
+  while (iterate_table (&from_charmap->byte_table, &ptr, &key, &keylen, &data)
         >= 0)
     {
       struct charseq *in = data;
<code_set_name> ATAVISTIC-PALIMPSEST
<comment_char> %
<escape_char> /
% alias PALIMPSEST

% Test the iconv charmap file (bug present in glibc as of 2023-08-26).

% iconv uses two hash tables: char (to byte) mapping and byte (to char).
% The following charmap exercises both hash tables by forcing each of
% them to realloc memory, which occurs at 75% of their initial size (257).
% When the 193rd entry is added, a new hash table of twice the size is
% created and the old one copied in.

% Usage: 
%       echo 0123456789 | iconv -f ./palimpsest.charmap

% Correct output:
%       0123456789

CHARMAP

% Force char_table to realloc
<U0000>..<U007F>        /x00    Total: 128 UCS characters have been mapped.
<UAA50>..<UAA59>        /x30           138. Cham digits
<UA9F0>..<UA9F9>        /x30           148. Myanmar Tai Laing digits
<UA9D0>..<UA9D9>        /x30           158. Javanese digits
<UA620>..<UA629>        /x30           168. Vai digits
<U0F20>..<U0F29>        /x30           178. Tibetan digits
<U0DE6>..<U0DEF>        /x30           188. Sinhala Lith digits
<U0966>..<U096F>        /x30           198. Devanagari digits

% Force byte_table to realloc   Total: 128 Byte sequences have been mapped
<U07C0>..<U07C9>        /d128          138. Nko digits
<U09E6>..<U09EF>        /d138          148. Bengali digits
<U0A66>..<U0A6F>        /d148          158. Gurmukhi digits
<U0AE6>..<U0AEF>        /d158          168. Gujarati digits
<U0B66>..<U0B6F>        /d168          178. Oriya digits
<UA900>..<UA909>        /d178          188. Kayah Li digits
<U104A0>..<U104A9>      /d188          198. Osmanya digits

END CHARMAP



% Verbose explanation.

% Multiple UCS characters are allowed to map to one particular byte
% encoding, but when mapping *from* the character set, only the first
% entry is supposed to be used to find the corresponding UCS character.
 
% Before the 193rd character is added, iconv correctly maps
% bytes from 0x30 to 0x39 as the digits 0 to 9:
%
%       $ echo 0123456789 | iconv -f ./palimpsest.charmap
%       0123456789

% In the buggy version of iconv, after the 193rd character is added,
% the result is garbled:
%
%       $ echo 0123456789 | iconv -f ./palimpsest.charmap
%       ෦꩑꧒꧓४꘥꧖෭෮෯

% To trigger this error the same byte sequence has to be used more
% than once. As mentioned above, duplicate byte sequences are supposed
% to be hidden in the reverse direction. After the 193rd char entry,
% the buggy version of iconv acts as if some layers have been scraped
% off, revealing those underlying maps:

%   0x30        ෦       U+0DE6  SINHALA LITH DIGIT ZERO
%   0x31        ꩑       U+AA51  CHAM DIGIT ONE
%   0x32        ꧒       U+A9D2  JAVANESE DIGIT TWO
%   0x33        ꧓       U+A9D3  JAVANESE DIGIT THREE
%   0x34        ४       U+096A  DEVANAGARI DIGIT FOUR
%   0x35        ꘥       U+A625  VAI DIGIT FIVE
%   0x36        ꧖       U+A9D6  JAVANESE DIGIT SIX
%   0x37        ෭       U+0DED  SINHALA LITH DIGIT SEVEN
%   0x38        ෮       U+0DEE  SINHALA LITH DIGIT EIGHT
%   0x39        ෯       U+0DEF  SINHALA LITH DIGIT NINE


% Analysis: when the hashtable is 75% full, memory is reallocated.
% Initial hashtable size is 257 (first prime after 256) and 75% of
% that is 192.75. So, realloc is triggered on the 193rd character.

% The memory reallocation did not cause this bug; it only made the bug
% visible. iconv seemed to work previously because the iteration order
% of the char_table hash just happened to match the insertion order
% from the file.

% The problem was triggered when the char_table, which maps from a UCS
% character to the byte sequence, was resized, but the bug occurred in
% the reverse direction. That pointed to the solution:
% iconv_charmap.c:use_from_charmap() should call iterate_table() on
% byte_table, not char_table.
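
The resize arithmetic in the analysis above can be checked with a short
Python sketch. The constants 257 and 75% are taken from the charmap
comments; the variable names are made up for illustration and are not
glibc identifiers.

```python
import math

INITIAL_SIZE = 257   # initial hash table size quoted in the analysis
LOAD_LIMIT = 0.75    # fill ratio at which the table is said to grow

threshold = INITIAL_SIZE * LOAD_LIMIT
# First whole number of entries that exceeds the threshold.
first_resizing_entry = math.floor(threshold) + 1

# Entries defined by each section of the charmap above.
ascii_entries = 0x7F - 0x00 + 1   # <U0000>..<U007F>
digit_entries = 7 * 10            # seven ranges of ten digits each
section_total = ascii_entries + digit_entries

print(threshold)                               # 192.75
print(first_resizing_entry)                    # 193
print(section_total)                           # 198
print(section_total >= first_resizing_entry)   # True
```

With 198 entries per section, each section is large enough to pass the
193-entry mark described in the analysis.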
