2017-05-17 09:37:31 +0100, Geoff Clare:
[...]
> That would appear to be a bug in the standard, as it doesn't match
> existing practice in any of the shells I tried (with a UTF-8 locale):
> 
> $ printf 'echo\u00a0foo\n' | grep '[[:blank:]]'
> echo foo
> $ printf 'echo\u00a0foo\n' | sh                
> sh: echo�:  not found
> $ printf 'echo\u00a0foo\n' | ksh
> ksh[1]: echo foo: not found [No such file or directory]
> $ printf 'echo\u00a0foo\n' | bash
> bash: line 1: echo foo: command not found
> $ printf 'echo\u00a0foo\n' | POSIXLY_CORRECT=1 bash
> bash: line 1: echo foo: command not found
> 
> (This was on Solaris 11: "sh" is ksh88 and "ksh" is ksh93.)
> 
> Judging by the ksh88 error message it looks like it treated the a0
> byte of the Unicode NO-BREAK SPACE as a delimiter, so it might use
> all <blank> characters as delimiters in a single-byte locale, but
> not doing it for multibyte characters means it doesn't behave as
> described in the standard.
[...]

bash has a similar issue: it treats [:blank:] characters as
delimiters only in locales with single-byte charsets. See
https://lists.gnu.org/archive/html/bug-bash/2014-10/msg00098.html
and the whole discussion there.

I wasn't aware that ksh88 also honoured the locale when
tokenising shell syntax.

Note that the issue is not only about the shell language. See
also awk (syntax, field parsing, string numerification...), bc,
xargs, m4...

BTW, U+00A0 should really not be a [:blank:] or [:space:].
That's the whole point of that "non-breaking space" character.
Its being classified as a blank is a known oddity of Solaris
(that makes it the only single-byte blank I'm aware of other
than space and tab, though of course one may always construct
a rogue locale that has more).
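
To check how a given system classifies U+00A0, something like the
following rough sketch may help. It uses the shell's own pattern
matching rather than grep. The is_blank function name is made up for
illustration, and the result depends on both the locale in effect and
how the shell implements character classes, so treat it as a probe
rather than a definitive test.

```shell
# is_blank CHAR: hypothetical helper, succeeds if CHAR is a single
# character that the shell's pattern matching puts in [:blank:]
# for the current locale.
is_blank() {
  case $1 in
    [[:blank:]]) return 0 ;;
    *) return 1 ;;
  esac
}

# UTF-8 encoding of U+00A0, written as octal bytes because
# printf '\u00a0' is not portable. In a single-byte locale such as
# ISO 8859-1 the character would be the lone byte \240 instead.
nbsp=$(printf '\302\240')

if is_blank "$nbsp"; then
  echo "U+00A0 is a blank here"
else
  echo "U+00A0 is not a blank here"
fi
```

Running it under different LC_ALL settings and different shells should
show whether the classification varies on that system.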

-- 
Stephane
