Vincent Lefevre wrote:
> > Therefore: can you also show wrong behaviour when you set
> > LC_ALL=en_US.UTF-8 ?
>
> Yes:
>
> prunille:~/blah> export LC_ALL=en_US.UTF-8
> prunille:~/blah> locale
> LANG="POSIX"
> LC_COLLATE="en_US.UTF-8"
> LC_CTYPE="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_ALL="en_US.UTF-8"
> prunille:~/blah> ls
> É y123456789012345678901234567890
> x123456789012345678901234567890 z123456789012345678901234567890

On MacOS X 10.3.9 I can reproduce this. Let's look at the hexdump (*) of
ls' output:
  1) In an Apple Terminal
  2) In an xterm, launched with "LC_ALL=en_US.UTF-8 xterm"
  3) In an xterm running on Linux, with an ssh to MacOS X

In all three cases the output of ls is the same:

$ LC_ALL=en_US.UTF-8 ls -C | hd
000000 45 CC 81 09 09 09 09 20 79 31 32 33 34 35 36 37  E...... y1234567
000010 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000020 34 35 36 37 38 39 30 0A 78 31 32 33 34 35 36 37  4567890.x1234567
000030 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33  8901234567890123
000040 34 35 36 37 38 39 30 20 20 7A 31 32 33 34 35 36  4567890  z123456
000050 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32  7890123456789012
000060 33 34 35 36 37 38 39 30 0A                       34567890.

You see, it starts with E and the combining accent (on MacOS X, filenames
are represented in decomposed Unicode form), followed by 4 tabs and a
space. So the second column of filenames should start in screen column 33
(where the leftmost is screen column 0).

But the output in the terminal looks like this:

1) In an Apple Terminal

É y123456789012345678901234567890
x123456789012345678901234567890 z123456789012345678901234567890

2), 3)

É y123456789012345678901234567890
x123456789012345678901234567890 z123456789012345678901234567890

So what you see is that Apple Terminal has problems knowing the width of
combining characters like accents when it expands tabs.

If you tell 'ls' to emit spaces instead of tabs, like this:

  ls -C -T0
or
  TABSIZE=0 ls -C

then the output looks the same in all kinds of terminals.

Conclusion: What you see is not an ls bug, but an Apple Terminal bug with
tabs.

But there is an ls bug:

$ ls -C -T0
É y123456789012345678901234567890
x123456789012345678901234567890 z123456789012345678901234567890

$ ls -C -T0 | hd
000000 45 CC 81 20 20 20 20 20 20 20 20 20 20 20 20 20  E..
000010 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
000020 20 20 79 31 32 33 34 35 36 37 38 39 30 31 32 33    y1234567890123
000030 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000040 30 0A 78 31 32 33 34 35 36 37 38 39 30 31 32 33  0.x1234567890123
000050 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39  4567890123456789
000060 30 20 20 7A 31 32 33 34 35 36 37 38 39 30 31 32  0  z123456789012
000070 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38  3456789012345678
000080 39 30 0A                                         90.

What 'ls' outputs here is an E, a combining accent and 31 spaces: text
that moves the cursor to column 32, not 33.

When I set a breakpoint in wcwidth, I see that the first call to wcwidth()
gives: wcwidth(0x0301) = 1. U+0301 is COMBINING ACUTE ACCENT. It should
return 0, since a combining character occupies no column of its own.

So here is the problem: MacOS' wcwidth is buggy for combining characters
like accents.

Bruno


(*) 'hd' is a shell script:
#!/bin/sh
hexdump -e '"%06.6_ax " 16/1 "%02X "' -e '" " 16/1 "%_p" "\n"' "$@"
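
The column arithmetic above can be checked with a small sketch. It is
written in Python and is not the coreutils code: the helper names
(cell_width, padding, landing_column) are made up for illustration, tab
stops are assumed to be every 8 columns, and every non-combining character
is assumed to be one column wide. The combining_width parameter stands in
for what wcwidth() reports for a combining character: 1 imitates the value
seen in the debugger, 0 is the expected value.

```python
import unicodedata

TAB = 8  # assumed tab stop interval


def cell_width(ch, combining_width=0):
    """Columns taken by one character; combining marks take combining_width."""
    if unicodedata.combining(ch):
        return combining_width
    return 1  # simplification: ignores wide and control characters


def text_width(s, combining_width=0):
    """Width of s as seen by a formatter using the given combining width."""
    return sum(cell_width(ch, combining_width) for ch in s)


def padding(name, target_col, combining_width, tabsize):
    """Filler a column formatter would emit after name to reach target_col,
    believing each combining mark occupies combining_width columns.
    tabsize=0 means spaces only (like -T0)."""
    pos = text_width(name, combining_width)
    out = []
    while pos < target_col:
        next_stop = (pos // tabsize + 1) * tabsize if tabsize else None
        if next_stop is not None and next_stop <= target_col:
            out.append("\t")
            pos = next_stop
        else:
            out.append(" ")
            pos += 1
    return "".join(out)


def landing_column(s, tabsize=TAB):
    """Column reached by a terminal that gives combining marks zero width."""
    col = 0
    for ch in s:
        if ch == "\t":
            col = (col // tabsize + 1) * tabsize
        else:
            col += cell_width(ch, 0)
    return col


name = "E\u0301"  # E followed by COMBINING ACUTE ACCENT (decomposed form)

# Formatter believes the accent is 1 column wide, tabs enabled:
print(repr(padding(name, 33, 1, TAB)))                  # 4 tabs and a space
print(landing_column(name + padding(name, 33, 1, TAB)))  # tabs hide the error

# Same belief, spaces only: the one-column error becomes visible.
print(landing_column(name + padding(name, 33, 1, 0)))

# Accent counted as 0 columns, spaces only:
print(landing_column(name + padding(name, 33, 0, 0)))
```

With tabs, the filler is 4 tabs and a space either way, because the tab
stops absorb the miscounted column; with spaces only, a width of 1 for the
accent yields 31 spaces and the next name lands one column short.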
