On 10/08/16 16:15, Peter Ludikovsky wrote:
>
>
> On 10.08.2016 at 16:51, Pádraig Brady wrote:
>> On 10/08/16 15:21, Peter Ludikovsky wrote:
>>> Package: coreutils
>>> Version: 8.23-4
>>> Severity: normal
>>>
>>> Dear Maintainer,
>>>
>>> This came up due to a posting on debian-user-german [1]. Apparently
>>> certain Unicode characters, at least LEFT-TO-RIGHT EMBEDDING [2] and
>>> RIGHT-TO-LEFT EMBEDDING [3] do not trigger the escape code display for
>>> ls with the -b option.
>>>
>>> An example script is attached, output:
>>>
>>> $ bash unicode_bidir_test.sh
>>> + touch LTR
>>> + touch RTL
>>> + /bin/ls -l
>>> total 4
>>> -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>> -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>> -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>> + /bin/ls -lb
>>> total 4
>>> -rw-r--r--. 1 peter peter   0 Aug 10 14:00 LTR
>>> -rw-r--r--. 1 peter peter   0 Aug 10 14:00 RTL
>>> -rw-r--r--. 1 peter peter 148 Aug 10 14:00 unicode_bidir_test.sh
>>> + /bin/ls -lb LTR
>>> /bin/ls: cannot access LTR: No such file or directory
>>> + /bin/ls -lb LTR
>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 LTR
>>> + /bin/ls -lb RTL
>>> /bin/ls: cannot access RTL: No such file or directory
>>> + /bin/ls -lb RTL
>>> -rw-r--r--. 1 peter peter 0 Aug 10 14:00 RTL
>>>
>>> The expected output would be that those characters be shown, as they are
>>> relevant when accessing a file on the command line.
>>>
>>> [1] https://lists.debian.org/debian-user-german/2016/08/msg00049.html
>>> [2] http://www.fileformat.info/info/unicode/char/202a/index.htm
>>> [3] http://www.fileformat.info/info/unicode/char/202b/index.htm
>>>
>>> -- System Information:
>>> Debian Release: 8.5
>>> APT prefers stable-updates
>>> APT policy: (500, 'stable-updates'), (500, 'stable')
>>> Architecture: amd64 (x86_64)
>>>
>>> Kernel: Linux 3.16.0-4-amd64 (SMP w/1 CPU core)
>>> Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
>>> Shell: /bin/sh linked to /bin/dash
>>> Init: systemd (via /run/systemd/system)
>>>
>>> Versions of packages coreutils depends on:
>>> ii libacl1 2.2.52-2
>>> ii libattr1 1:2.4.47-2
>>> ii libc6 2.19-18+deb8u4
>>> ii libselinux1 2.3-2
>>>
>>> coreutils recommends no packages.
>>>
>>> coreutils suggests no packages.
>>>
>>> -- no debconf information
>>
>> Is your locale really "C" ?
>> With mine set to "C" I get:
>>
>> $ LANG=C ls -l
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???LTR
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ???RTL
>>
>> $ LANG=C ls -lb
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\252LTR
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 \342\200\253RTL
>>
>>
>> With the new quoting in version 8.25 you'll get a directly
>> copy and pasteable representation like:
>>
>> $ LANG=C ~/git/coreutils/src/ls -l
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\252''LTR'
>> -rw-rw-r--. 1 padraig padraig 0 Aug 10 15:43 ''$'\342\200\253''RTL'
>>
>>
>> I'll need to experiment a bit with non "C" locale handling,
>> and with various terminals, to see how best to handle this case.
>>
>> thanks,
>> Pádraig
>>
>
> Not really, I haven't set any locale on my servers intentionally. Or
> rather, left it at the "POSIX"(?) default during d-i.
> $ localectl status
> System Locale: n/a
>
> VC Keymap: n/a
> X11 Layout: de
> X11 Model: pc105
> X11 Variant: nodeadkeys
> $ cat /etc/default/locale
> #LANG="C"
> $ env | grep LANG
> $ env | grep LC_
> $
>
> With both LC_ALL=C and LANG=C it shows at least some indication that
> there are other characters. But why not when no explicit locale has been
> set?
Maybe because it's UTF-8 based?

I also noticed that in gnome-terminal you can copy and paste the hidden
characters by also selecting the leading space before the file name
(though that's certainly not obvious). xterm gives a visual indication
of an extra character and allows selecting it. So there is an overlap
here with terminal handling of the RTL characters.
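
In the meantime, a rough sketch of how such names could be spotted
without relying on the terminal or on the active locale. This is not
part of coreutils: the helper name has_bidi_embedding and the scratch
directory are made up for illustration, and only the two characters
from this report (U+202A and U+202B) are checked. The final listing
uses the same "LC_ALL=C ls -b" form as the byte-escaped output quoted
above.

```shell
#!/bin/sh
# U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202B (RIGHT-TO-LEFT EMBEDDING)
# as UTF-8 bytes, written in octal so this script itself contains no
# invisible characters.
LRE=$(printf '\342\200\252')
RLE=$(printf '\342\200\253')

# has_bidi_embedding NAME (hypothetical helper):
# succeeds if NAME contains U+202A or U+202B, fails otherwise.
has_bidi_embedding() {
    case $1 in
        *"$LRE"*|*"$RLE"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Demo in a scratch directory: create two affected names and one plain
# name, then list only the affected ones with byte-wise escaping.
demo_dir=$(mktemp -d)
touch "$demo_dir/${LRE}LTR" "$demo_dir/${RLE}RTL" "$demo_dir/plain"
(
    cd "$demo_dir" || exit 1
    for name in *; do
        if has_bidi_embedding "$name"; then
            LC_ALL=C ls -db -- "$name"
        fi
    done
)
rm -rf "$demo_dir"
```

Matching on the raw UTF-8 byte sequences keeps the check independent
of LANG and LC_ALL, which seems to be the variable part in this report.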

