Chet Ramey <chet.ra...@case.edu> wrote, on 16 May 2017:
>
> On 5/16/17 6:33 AM, Robert Elz wrote:
>
> > If we start having shell parsing differently depending on what locale the
> > user happens to be using, we may as well all give up now, and go find
> > something else to do.
>
> Too late:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
>
> 7. If the current character is an unquoted <blank>, any token containing
> the previous character is delimited and the current character shall be
> discarded.

That would appear to be a bug in the standard, as it doesn't match
existing practice in any of the shells I tried (with a UTF-8 locale):

$ printf 'echo\u00a0foo\n' | grep '[[:blank:]]'
echo foo
$ printf 'echo\u00a0foo\n' | sh
sh: echo�: not found
$ printf 'echo\u00a0foo\n' | ksh
ksh[1]: echo foo: not found [No such file or directory]
$ printf 'echo\u00a0foo\n' | bash
bash: line 1: echo foo: command not found
$ printf 'echo\u00a0foo\n' | POSIXLY_CORRECT=1 bash
bash: line 1: echo foo: command not found

(This was on Solaris 11: "sh" is ksh88 and "ksh" is ksh93.)

Judging by the ksh88 error message it looks like it treated the a0 byte
of the Unicode NO-BREAK SPACE as a delimiter, so it might use all <blank>
characters as delimiters in a single-byte locale, but not doing it for
multibyte characters means it doesn't behave as described in the standard.

We should change "<blank>" to "<space> or <tab>" in that text.

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
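
For anyone wanting to check the byte-level reading of the ksh88 message, here is a minimal sketch. It assumes only POSIX printf, od, tr, sed and head. The function names (nbsp, hex_bytes, first_field_bytes) are made up for illustration, and the byte-wise split is a guess at the behaviour that is consistent with the error message, not a description of what ksh88 actually does internally. Octal escapes are used because \u in printf is an extension.

```shell
# Emit U+00A0 (NO-BREAK SPACE) as its two UTF-8 bytes, using octal
# escapes so the result does not depend on printf supporting \u.
nbsp() {
    printf '\302\240'
}

# Print the bytes read from standard input as space-separated hex pairs.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# ASSUMPTION: model a byte-wise scanner that treats the single byte a0
# as a delimiter. Split "echo<NBSP>foo" at that byte and show the bytes
# of the first field, i.e. what would be taken as the command name.
first_field_bytes() {
    printf 'echo\302\240foo' |
        LC_ALL=C tr '\240' '\n' |
        head -n 1 |
        tr -d '\n' |
        hex_bytes
}

nbsp | hex_bytes
first_field_bytes
```

The first command shows the encoding of the character as c2 a0. The second shows that splitting at the a0 byte leaves "echo" followed by a lone c2 byte, which is not valid UTF-8 on its own and would be displayed as a replacement character, much like the "echo�" in the ksh88 output above.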