2019-06-26 23:56:06 +0100, Harald van Dijk:
[...]
> You are proposing a fundamental change to the design of pattern matching,
> not a clarification as you previously called it, and you are now discussing
> how to allow the behaviour of one specific shell that does not behave the
> way you like, but not the other shells that also do not behave the way you
> like, when those other shells were not only changed intentionally to get
> more consistent behaviour, at least in my case as the result of a user
> request, but also because that more consistent behaviour is required by the
> current version of POSIX, solely because of theoretical problems with file
> names specifically crafted to break scripts, file names that are not
> actually used in the wild.
[...]

I'm not a shell implementer. I'm on the side of the application
writer: I want to be able to write portable shell scripts, and
POSIX (*Portable* Operating System *Interface*) is meant to work
for me. It's meant to tell me what I can and cannot write in my
script and the behaviour to expect. It's meant to help you, the
implementer, write your shell so that it can interpret my
portable script the way it's meant to.

Today, by your reading of the spec (and I agree it can be seen as
a valid reading), the spec is telling me that:

1.

a='\.'
printf '%s\n' $a

is a portable script that is meant to output "."

2.

a='\**'
printf '%s\n' $a

is a portable script that is meant to list the filenames that
start with "*" in the current directory

3.

pattern='*;*'
case $var in ($pattern) echo yes; esac

is a non-standard, non-portable script with unspecified
behaviour because shell implementations are free to use that ";"
as an extended glob operator.

4.

string='@(foo)'
echo $string

is a non-standard, non-portable script which is not guaranteed
to output @(foo).

5.

string='@(foo)'
case $string in $string) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".

6.

pattern='@(*)'
case "@(foo)" in $pattern) echo yes; esac

is a non-standard, non-portable script which is not guaranteed
to output "yes".


1 and 2 are the reason I raised bug 1234. 1 couldn't be further
from the truth. Only bash5 exhibits that behaviour and it's
evident it's a bad idea. It's evident that it was not the
intention of the spec as no shell at the time it was written did
it. Even if POSIX made it very explicit that 1 is required to
behave as described above, I could probably not call it a
portable script in a million years, as I'd expect shell
implementations would rather keep their backward
compatibility than implement that unreasonable requirement
(which IMO doesn't help at all with consistency). So the spec is
wrong and needs to be fixed.

2 is slightly more portable, but even in those shells where it
does that, that's not because they implement \ processing the
way POSIX seems to specify it, and they all do it a different
way. I'm not opposed to POSIX *allowing* a \ in an unquoted word
expansion to have a special meaning when it precedes *, ? or
[, as that's what several implementations do and it doesn't
break that many common shell usages.
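
To see what a given sh does with cases 1 and 2, a probe along
these lines can be run in a scratch directory. This is only a
sketch: the function name probe_backslash_glob and the file names
are made up for illustration, mktemp -d is assumed to be
available (it is not specified by POSIX), and the output is
deliberately not stated here since it differs between shells.

```shell
# Hypothetical helper, not part of any shell or of POSIX.
# Runs in a subshell so the caller's working directory is untouched.
probe_backslash_glob() (
  dir=$(mktemp -d) || exit 1     # assumption: mktemp -d exists
  cd "$dir" || exit 1
  : > '*star'                    # a file whose name starts with "*"
  : > 'plain'                    # a file that should never be listed

  a='\.'
  printf 'case 1: %s\n' $a       # unquoted on purpose: the case under test

  a='\**'
  printf 'case 2: %s\n' $a       # unquoted on purpose: the case under test

  cd / && rm -rf "$dir"
)
```

Calling probe_backslash_glob under each shell of interest and
comparing the "case 1:" and "case 2:" lines shows which reading
of the backslash that shell implements.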

3 is portable in practice, and I should be able to rely on it.
I'd rather POSIX didn't open the door for a shell (or
fnmatch()...) to choose ; as a new glob operator. I would
rather the sh glob operators stay ?, [] and * (and \ now added
because of those shells that treat it specially), so I know
which to escape (with quoting, with \ in fnmatch(), or with
[...] in word expansions) or to look out for. Several shells
have some of those operators, but they are not enabled in
posix/sh mode, so they interpret sh scripts like sh is meant to.
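
The escaping options mentioned above (quoting, or [...] in a
word expansion) can be sketched as follows. The function names
contains_literal and starts_with_star are hypothetical. The
sketch assumes a POSIX sh, where a quoted expansion in a case
pattern is matched literally and the result of an unquoted
expansion is treated as a pattern.

```shell
# Succeeds if string $1 contains $2 taken literally.
# Quoting "$2" removes any special meaning from its characters,
# whichever glob operators the shell happens to support.
contains_literal() {
  case $1 in
    (*"$2"*) return 0 ;;
  esac
  return 1
}

# Succeeds if $1 starts with a literal "*".
# The pattern comes from an unquoted expansion, so the "*" that is
# meant literally is protected with a bracket expression.
starts_with_star() {
  pattern='[*]*'
  case $1 in
    ($pattern) return 0 ;;
  esac
  return 1
}
```

For instance, contains_literal "$var" ';' tests for a ";" in
$var without depending on whether ";" is a glob operator.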

4 is portable in practice. 5 as well but only because of the
buggy fallback string comparison in ksh93.

6 is the only one that is true. Yes, there is *one* shell (a
shell generally considered "experimental" and not in wide use)
where that won't work as expected (won't output yes) as that's
one case where ksh93's extended glob operator is conflicting
with sh compatibility. It's not consistent with 4 there. Geoff's
proposing to fix that inconsistency to allow that operator to be
used for pathname expansion, but I believe it would be more
reasonable to fix it by not allowing it for "case" (make 6 a
portable script again) to make the standard consistent and
clear. Then ksh93 could enable those extended operators wherever
it likes when called as ksh, but not when called as sh (at least
not in the result of word expansions; basically reverting to
ksh88 behaviour).

I could be convinced that it makes sense for the ksh93 X(...)
operators to be allowed if there was one non-anecdotal
implementation of fnmatch() that implemented it, but I don't
think there is. find implementations usually have a -regex
predicate to do things that basic globs can't do instead.

I also like the idea of opening up a way for shell wildcards to
be extended in the future, but it's a dangerous business. Today
in practice, scripts doing things like "find . -name
'*([0-9]).mp3'" to match on "foo(2).mp3" are likely to exist and
would be broken if "find" (fnmatch()) started to implement that
*(...) ksh operator (worse for the # or (..|..) or ~ or ^
extended zsh operators (which are not enabled in sh mode)).

Despite that pax example in the POSIX rationale that you quoted
earlier, I don't expect many people are aware that POSIX
currently leaves the behaviour unspecified and requires them to
write

find . -name '*\([0-9]\).mp3' or
find . -name '*\ *.mp3' and even
find . -name '*[\ ]*.mp3'

In practice, they don't need to escape those " " and "(", even
though they would need to quote them when used in a shell glob.
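
A quick way to check this on a given system is sketched below.
The function name find_paren_demo and the file names are made up
for illustration, and mktemp -d is assumed to be available. The
expectation that both forms find the same file holds only for
find implementations whose -name matching does not treat *(...)
as an operator.

```shell
# Hypothetical helper comparing the unescaped and escaped patterns.
# Runs in a subshell so the caller's working directory is untouched.
find_paren_demo() (
  dir=$(mktemp -d) || exit 1     # assumption: mktemp -d exists
  cd "$dir" || exit 1
  : > 'foo(2).mp3'               # the name the pattern is meant to match
  : > 'foo2.mp3'                 # no parentheses: should not match

  # "(" and ")" left unescaped, as commonly written in scripts
  printf 'unescaped: %s\n' "$(find . -name '*([0-9]).mp3')"

  # "(" and ")" escaped with a backslash
  printf 'escaped: %s\n' "$(find . -name '*\([0-9]\).mp3')"

  cd / && rm -rf "$dir"
)
```

If a find ever started treating *(...) as the ksh operator, the
"unescaped:" line would stop listing ./foo(2).mp3 while the
"escaped:" line would be unaffected.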

-- 
Stephane
