Package: coreutils
Version: 5.97-5.3

The POSIX documentation of the expand utility is fairly clear that it
should operate in terms of "column positions", not byte counts, and
since the value of LC_CTYPE is used "for the determination of the
width in column positions each character would occupy" wide characters
are obviously supposed to be supported.

GNU coreutils expand doesn't appear to do this, as the transcript
below indicates. The test file contains a line with two Japanese
(double width) characters, then a tab, then the string 'hello',
followed by a line with two single width characters, a tab, and
the string 'hello'.

When the file is sent raw to the console the tabs are interpreted
correctly and the two 'hello's line up. expand, however, expands the
tab in the first line into just two spaces, because it counts the two
Japanese characters as six columns (they are six bytes in UTF-8)
rather than four columns (two double width characters).
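
The arithmetic can be sketched in a few lines of Python. This is only
an illustration of the two ways of counting, not code from coreutils;
the helper name pad_to_next_stop is made up for the example, and the
tab interval of 8 is expand's default.

```python
TAB = 8  # default tab stop interval used by expand


def pad_to_next_stop(column, tab=TAB):
    """Number of spaces needed to reach the next tab stop from `column`."""
    return tab - (column % tab)


line_prefix = "あま"

# What a byte-at-a-time loop counts: six UTF-8 bytes.
byte_columns = len(line_prefix.encode("utf-8"))

# What the terminal displays: two characters, each two columns wide.
# (The factor of 2 is hard-coded here for these two characters only.)
display_columns = 2 * len(line_prefix)

print(pad_to_next_stop(byte_columns))     # padding when counting bytes
print(pad_to_next_stop(display_columns))  # padding when counting columns
print(pad_to_next_stop(len("am")))        # padding for the ASCII line
```

Counting bytes gives 2 spaces of padding, which puts 'hello' at display
column 6; counting columns gives 4 spaces, which puts it at column 8,
the same place as on the 'am' line (2 columns plus 6 spaces).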

mnementh$ echo $LC_CTYPE
ja_JP.UTF-8
mnementh$ cat /tmp/zz9
あま    hello
am      hello
mnementh$ od -tx1 /tmp/zz9
0000000 e3 81 82 e3 81 be 09 68 65 6c 6c 6f 0a 61 6d 09
0000020 68 65 6c 6c 6f 0a
0000026
mnementh$ expand /tmp/zz9
あま  hello
am      hello
mnementh$ expand /tmp/zz9 | od -tx1
0000000 e3 81 82 e3 81 be 20 20 68 65 6c 6c 6f 0a 61 6d
0000020 20 20 20 20 20 20 68 65 6c 6c 6f 0a
0000034
mnementh$

The expected behaviour is that the output of 'expand' lines up the
same way as the file does when it is dumped raw to the console.

Looking at the coreutils source, the main loop of expand simply
calls getc() and has no wide character support at all.
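
For comparison, here is a rough sketch of what a width-aware loop
would do, written in Python rather than C for brevity. It is not a
patch and not how coreutils is structured: expand_line and char_width
are hypothetical names, and the width rule (East Asian Wide/Fullwidth
characters take two columns, combining characters take none) is an
approximation of what wcwidth() would report in a UTF-8 locale. A real
fix in C would presumably decode with something like mbrtowc() and ask
wcwidth() for the width.

```python
import unicodedata


def char_width(ch):
    """Approximate display width of one character (assumed rule, not wcwidth())."""
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def expand_line(line, tab=8):
    """Replace tabs in `line` with spaces, tracking display columns, not bytes."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Pad up to the next multiple of `tab` columns.
            spaces = tab - (column % tab)
            out.append(" " * spaces)
            column += spaces
        else:
            out.append(ch)
            column += char_width(ch)
    return "".join(out)


for text in ("あま\thello", "am\thello"):
    print(expand_line(text))
```

With this counting the first test line gets four spaces after the two
Japanese characters and the second gets six after 'am', so both
'hello's start at column 8.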

(FWIW, Debian lenny's coreutils 6.10 has the same issue, so I
don't think it's been fixed upstream yet.)

-- PMM


