On 10/08/16 16:15, Peter Ludikovsky wrote:
> 
> 
> On 10.08.2016 at 16:51, Pádraig Brady wrote:
>> On 10/08/16 15:21, Peter Ludikovsky wrote:
>>> Package: coreutils
>>> Version: 8.23-4
>>> Severity: normal
>>>
>>> Dear Maintainer,
>>>
>>> This came up due to a posting on debian-user-german [1]. Apparently
>>> certain Unicode characters, at least LEFT-TO-RIGHT EMBEDDING [2] and
>>> RIGHT-TO-LEFT EMBEDDING [3] do not trigger the escape code display for
>>> ls with the -b option.
>>>
>>> An example script is attached, output:
>>>
>>>     $ bash unicode_bidir_test.sh 
>>>     + touch LTR
>>>     + touch RTL
>>>     + /bin/ls -l
>>>     total 4
>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>>     -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>     + /bin/ls -lb
>>>     total 4
>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>>     -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>>     -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>>     + /bin/ls -lb LTR
>>>     /bin/ls: cannot access LTR: No such file or directory
>>>     + /bin/ls -lb LTR
>>>     -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>>     + /bin/ls -lb RTL
>>>     /bin/ls: cannot access RTL: No such file or directory
>>>     + /bin/ls -lb RTL
>>>     -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>
>>> The expected output would be that those characters be shown, as they are
>>> relevant when accessing a file on the command line.
>>>
>>> [1] https://lists.debian.org/debian-user-german/2016/08/msg00049.html
>>> [2] http://www.fileformat.info/info/unicode/char/202a/index.htm
>>> [3] http://www.fileformat.info/info/unicode/char/202b/index.htm
>>>
>>> -- System Information:
>>> Debian Release: 8.5
>>>   APT prefers stable-updates
>>>   APT policy: (500, 'stable-updates'), (500, 'stable')
>>> Architecture: amd64 (x86_64)
>>>
>>> Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
>>> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
>>> Shell: /bin/sh linked to /bin/dash
>>> Init: systemd (via /run/systemd/system)
>>>
>>> Versions of packages coreutils depends on:
>>> ii  libacl1      2.2.52-2
>>> ii  libattr1     1:2.4.47-2
>>> ii  libc6        2.19-18+deb8u4
>>> ii  libselinux1  2.3-2
>>>
>>> coreutils recommends no packages.
>>>
>>> coreutils suggests no packages.
>>>
>>> -- no debconf information
>>
>> Is your locale really "C" ?
>> With mine set to "C" I get:
>>
>> $ LANG=C ls -l
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???LTR
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???RTL
>>
>> $ LANG=C ls -lb
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\252LTR
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\253RTL
>>
>>
>> With the new quoting in version 8.25 you'll get a directly
>> copy and pasteable representation like:
>>
>> $ LANG=C ~/git/coreutils/src/ls -l
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\252''LTR'
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\253''RTL'
>>
>>
>> I'll need to experiment a bit with non "C" locale handling,
>> and with various terminals, to see how best to handle this case.
>>
>> thanks,
>> Pádraig
>>
> 
> Not really, I haven't set any locale on my servers intentionally. Or
> rather, left it at the "POSIX"(?) default during d-i.
>     $ localectl status
>        System Locale: n/a
> 
>            VC Keymap: n/a
>           X11 Layout: de
>            X11 Model: pc105
>          X11 Variant: nodeadkeys
>     $ cat /etc/default/locale
>     #LANG="C"
>     $ env | grep LANG
>     $ env | grep LC_
>     $
> 
> With both LC_ALL=C and LANG=C it shows at least some indication that
> there are other characters. But why not when no explicit locale has been
> set?

Maybe because it's UTF-8 based?
I also noticed that in gnome-terminal you can copy/paste the hidden chars
by also selecting the leading space on the file name (though that's
certainly not obvious).
xterm gives a visual indication of an extra char, and allows selecting it.
So there is an overlap here with terminal handling of the RTL chars.
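
For anyone wanting to reproduce the "C" locale behaviour quoted above, here
is a rough sketch. It assumes GNU coreutils ls and a POSIX sh; the variable
names (demo_dir, listing) are only illustrative and not from the original
test script. It creates the two files with the embedding characters written
as explicit UTF-8 bytes, then lists them with LC_ALL=C so that -b escapes
every byte of the hidden characters.

```shell
#!/bin/sh
# Sketch only: recreate the two files from the report in a scratch directory.
# \342\200\252 and \342\200\253 are the UTF-8 encodings of U+202A
# (LEFT-TO-RIGHT EMBEDDING) and U+202B (RIGHT-TO-LEFT EMBEDDING).
demo_dir=$(mktemp -d)
touch "$demo_dir/$(printf '\342\200\252LTR')"
touch "$demo_dir/$(printf '\342\200\253RTL')"

# LC_ALL takes precedence over LANG and the individual LC_* variables,
# so ls sees the bytes as non-printable and -b writes them as octal escapes.
listing=$(LC_ALL=C ls -b "$demo_dir")
printf '%s\n' "$listing"

# Independent check that does not rely on ls quoting:
# dump each file name as octal bytes.
for f in "$demo_dir"/*; do
    printf '%s' "${f##*/}" | od -An -to1
done

rm -rf "$demo_dir"
```

Forcing LC_ALL rather than LANG avoids surprises when some LC_* variable is
already set in the environment, since LC_ALL overrides all of them.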
