locale-dependent shell parsing (was: sh(1): is roundtripping of the positional parameter stack possible?)

Geoff Clare Wed, 17 May 2017 01:38:19 -0700

Chet Ramey <chet.ra...@case.edu> wrote, on 16 May 2017:
>
> On 5/16/17 6:33 AM, Robert Elz wrote:
> 
> > If we start having shell parsing differently depending on what locale the
> > user happens to be using, we may as well all give up now, and go find
> > something else to do.
> 
> Too late:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_03
> 
> 7. If the current character is an unquoted <blank>, any token containing
> the previous character is delimited and the current character shall be
> discarded.


That would appear to be a bug in the standard, as it doesn't match
existing practice in any of the shells I tried (with a UTF-8 locale):

$ printf 'echo\u00a0foo\n' | grep '[[:blank:]]'
echo foo
$ printf 'echo\u00a0foo\n' | sh
sh: echo�:  not found
$ printf 'echo\u00a0foo\n' | ksh
ksh[1]: echo foo: not found [No such file or directory]
$ printf 'echo\u00a0foo\n' | bash
bash: line 1: echo foo: command not found
$ printf 'echo\u00a0foo\n' | POSIXLY_CORRECT=1 bash
bash: line 1: echo foo: command not found

(This was on Solaris 11: "sh" is ksh88 and "ksh" is ksh93.)

Judging by the ksh88 error message, it treated the a0 byte of the
Unicode NO-BREAK SPACE (encoded in UTF-8 as the two bytes c2 a0) as a
delimiter. So ksh88 might use all <blank> characters as delimiters in
a single-byte locale, but since it does not do so for multibyte
characters, it doesn't behave as described in the standard.
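
For reference, the byte sequence in question can be inspected with a
short sketch like the one below. It uses octal escapes because \u is
not a portable printf escape; the variable name nbsp_bytes is only
illustrative.

```shell
# U+00A0 NO-BREAK SPACE is encoded in UTF-8 as the two bytes c2 a0.
# Emit those bytes with octal escapes and dump them in hex.
nbsp_bytes=$(printf '\302\240' | od -An -tx1 | tr -d ' \n')
echo "$nbsp_bytes"
```

A shell that splits on the trailing a0 byte alone would leave the c2
byte attached to "echo", which would be consistent with the invalid
character shown in the ksh88 message above.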

We should change "<blank>" to "<space> or <tab>" in that text.
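
As a rough illustration of what that wording would mean in practice,
the sketch below asks a shell how many words it sees when tokenizing a
string. The helper name count_tokens is hypothetical, and the input
must be trusted text, since it is spliced into a script passed to
sh -c. This is only a probe of whatever "sh" is installed, not a
conformance test.

```shell
# count_tokens TEXT: have sh tokenize "set -- TEXT" and print the
# resulting number of positional parameters.
count_tokens() {
    sh -c "set -- $1; echo \$#"
}

# <space> and <tab> delimit tokens, so this gives three words.
sp_tab=$(count_tokens "$(printf 'a b\tc')")

# UTF-8 NO-BREAK SPACE (bytes c2 a0) between "a" and "b". Under the
# proposed "<space> or <tab>" wording this stays a single word.
nbsp=$(count_tokens "$(printf 'a\302\240b')")

echo "space/tab: $sp_tab  nbsp: $nbsp"
```

A shell that delimits only on <space> and <tab> would be expected to
report 3 and 1 here; one that splits on the a0 byte, as ksh88 appeared
to do above, would presumably report 2 for the second case.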

-- 
Geoff Clare <g.cl...@opengroup.org>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
