Package: coreutils
Version: 5.97-5.3

The POSIX documentation of the expand utility is fairly clear that it should operate in terms of "column positions", not byte counts, and since the value of LC_CTYPE is used "for the determination of the width in column positions each character would occupy", wide characters are obviously supposed to be supported.
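To make the distinction between column positions and byte counts concrete, here is a rough Python sketch. It is not the coreutils code. The function names are invented for this illustration, and unicodedata.east_asian_width() is only an approximation of the width that wcwidth() would report under the current LC_CTYPE. It also ignores backspace and other control characters.

```python
import unicodedata

TAB_WIDTH = 8  # expand's default tab stop interval


def char_columns(ch):
    """Approximate how many terminal columns one character occupies.

    Assumption: East Asian Wide/Fullwidth characters take 2 columns,
    combining marks take 0, everything else takes 1.
    """
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def expand_by_columns(line, tab_width=TAB_WIDTH):
    """Expand tabs while tracking the display column (the POSIX reading)."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - col % tab_width  # spaces up to the next tab stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += char_columns(ch)
    return "".join(out)


def expand_by_bytes(line, tab_width=TAB_WIDTH, encoding="utf-8"):
    """Expand tabs while counting encoded bytes, as a getc() loop would."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            pad = tab_width - col % tab_width
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += len(ch.encode(encoding))  # 3 per kana in UTF-8
    return "".join(out)


if __name__ == "__main__":
    for text in ("あま\thello", "am\thello"):
        print(repr(expand_by_columns(text)), repr(expand_by_bytes(text)))
```

For the line with two double width characters, the column-based version pads the tab with four spaces and the byte-based version with two. For a line of single width ASCII characters the two agree.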
GNU coreutils expand doesn't appear to do this, as the transcript below indicates. The test file contains a line with two Japanese (double width) characters, then a tab, then the string 'hello', followed by a line with two single width characters, a tab, and the string 'hello'. When the output is sent raw to the console the tabs are interpreted correctly and the two 'hello's line up. expand expands the tab in the first line into just two spaces because it treats the two Japanese characters as six columns (because they are six bytes in UTF-8) rather than four columns (two double width characters).

mnementh$ echo $LC_CTYPE
ja_JP.UTF-8
mnementh$ cat /tmp/zz9
あま    hello
am      hello
mnementh$ od -tx1 /tmp/zz9
0000000 e3 81 82 e3 81 be 09 68 65 6c 6c 6f 0a 61 6d 09
0000020 68 65 6c 6c 6f 0a
0000026
mnementh$ expand /tmp/zz9
あま  hello
am      hello
mnementh$ expand /tmp/zz9 | od -tx1
0000000 e3 81 82 e3 81 be 20 20 68 65 6c 6c 6f 0a 61 6d
0000020 20 20 20 20 20 20 68 65 6c 6c 6f 0a
0000034
mnementh$

The expected output is that the output of 'expand' should match the output of dumping the file raw to the console.

Looking at the coreutils source the main loop of expand is simply calling getc() and has no wide character support at all. (FWIW, Ubuntu lenny's coreutils 6.10 has the same issue, so I don't think it's been fixed upstream yet.)

-- PMM