2019-06-26 23:56:06 +0100, Harald van Dijk:
[...]
> You are proposing a fundamental change to the design of pattern matching,
> not a clarification as you previously called it, and you are now discussing
> how to allow the behaviour of one specific shell that does not behave the
> way you like, but not the other shells that also do not behave the way you
> like, when those other shells were not only changed intentionally to get
> more consistent behaviour, at least in my case as the result of a user
> request, but also because that more consistent behaviour is required by the
> current version of POSIX, solely because of theoretical problems with file
> names specifically crafted to break scripts, file names that are not
> actually used in the wild.
[...]

I'm not a shell implementer. I'm on the side of the application
writer, I want to be able to write portable shell scripts, and POSIX
(*Portable* Operating System *Interface*) is meant to work for me.
It's meant to tell me what I can and cannot write in my script and
the behaviour to expect. It's meant to help you the implementer write
your shell so that it can interpret my portable script the way it's
meant to.

Today, by your reading of the spec, and I agree it can be seen as a
valid reading, the spec is telling me that:

1.
   a='\.'
   printf '%s\n' $a

   is a portable script that is meant to output "."

2.
   a='\**'
   printf '%s\n' $a

   is a portable script that is meant to list the filenames that
   start with "*" in the current directory

3.
   pattern='*;*'
   case $var in ($pattern) echo yes; esac

   is a non-standard, non-portable script with unspecified behaviour
   because shell implementations are free to use that ";" as an
   extended glob operator.

4.
   string='@(foo)'
   echo $string

   is a non-standard, non-portable script which is not guaranteed to
   output @(foo).

5.
   string='@(foo)'
   case $string in $string) echo yes; esac

   is a non-standard, non-portable script which is not guaranteed to
   output "yes".

6.
   pattern='@(*)'
   case "@(foo)" in $pattern) echo yes; esac

   is a non-standard, non-portable script which is not guaranteed to
   output "yes".

1 and 2 are the reason I raised bug 1234.

1 couldn't be further from the truth. Only bash5 exhibits that
behaviour and it's evident it's a bad idea. It's evident that it was
not the intention of the spec, as no shell did it at the time the
spec was written. Even if POSIX made it very explicit that 1 is
required to behave as described above, I could probably not call it
a portable script in a million years, as I'd expect shell
implementations would rather keep their backward compatibility than
implement that unreasonable requirement (which IMO doesn't help at
all with consistency). So the spec is wrong and needs to be fixed.
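
For anyone who wants to see what their own sh does with those six
cases, here is a minimal sketch. The check_cases name, the scratch
directory, the two file names and the value given to $var are my own
illustrative assumptions, and mktemp -d is not specified by POSIX, so
adapt as needed. It only reports what the invoking shell does; it says
nothing about what the standard requires.

```shell
#!/bin/sh
# Sketch only: run the six snippets in a scratch directory and print
# what the invoking shell actually does with each of them.
# check_cases and the file names are hypothetical, for illustration.
check_cases() {
  dir=$(mktemp -d) || return 1      # assumption: mktemp -d is available
  (
    cd "$dir" || exit 1
    : > '*star'                     # a name that starts with "*"
    : > plain                       # a name that does not

    a='\.';  printf '1: %s\n' $a    # unquoted on purpose
    a='\**'; printf '2: %s\n' $a    # unquoted on purpose

    var='a;b'; pattern='*;*'        # $var value chosen for the demo
    case $var in ($pattern) echo '3: yes';; (*) echo '3: no';; esac

    string='@(foo)'
    echo 4: $string                 # unquoted on purpose
    case $string in ($string) echo '5: yes';; (*) echo '5: no';; esac

    pattern='@(*)'
    case "@(foo)" in ($pattern) echo '6: yes';; (*) echo '6: no';; esac
  )
  rm -rf "$dir"
}

check_cases
```

Run it under each shell in turn (for instance as sh ./check.sh, where
the script name is again just an example) and compare the numbered
lines. The results for 1, 2 and 5 in particular can be expected to
differ between shells and versions.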

2 is slightly more portable, but even in those shells where it does
that, that's not because they implement \ processing the way POSIX
seems to specify it, and they all do it in a different way. I'm not
opposed to POSIX *allowing* a \ in an unquoted word expansion to have
a special meaning when it precedes *, ? and [, as that's what several
implementations do and it doesn't break that many common shell
usages.

3 is portable in practice. And I should be able to rely on it. I'd
rather POSIX didn't open the door for a shell (or fnmatch()...) to
choose ; to be a new glob operator. I would rather the sh glob
operators stay ?, [] and * (and \ now added because of those shells
that treat it specially), so I know which to escape (with quoting
(or \ in fnmatch()) or [...] when in word expansions) or to look out
for. Several shells have some of those extended operators, but they
are not enabled in posix/sh mode, so they interpret sh scripts like
sh is meant to.

4 is portable in practice.

5 as well, but only because of the buggy fallback string comparison
in ksh93.

6 is the only one that is true. Yes, there is *one* shell (a shell
generally considered "experimental" and not in wide use) where that
won't work as expected (won't output yes), as that's one case where
ksh93's extended glob operator conflicts with sh compatibility. It's
not consistent with 4 there.

Geoff is proposing to fix that inconsistency by allowing that
operator to be used for pathname expansion, but I believe it would
be more reasonable to fix it by not allowing it for "case" (make 6 a
portable script again) to make the standard consistent and clear.
Then ksh93 could enable those extended operators wherever it likes
when called as ksh, but not when called as sh (at least not in the
result of word expansions; basically reverting to ksh88 behaviour).

I could be convinced that it makes sense for the ksh93 X(...)
operators to be allowed if there was one non-anecdotal implementation
of fnmatch() that implemented it, but I don't think there is. find
implementations usually have a -regex predicate instead, for things
that basic globs can't do.

I also like the idea of opening up a way for shell wildcards to be
extended in the future, but it's a dangerous business.

Today in practice, scripts doing things like
"find . -name '*([0-9]).mp3'" to match on "foo(2).mp3" are likely to
exist and would be broken if "find" (fnmatch()) started to implement
that *(...) ksh operator (worse for the # or (..|..) or ~ or ^
extended zsh operators (which are not enabled in sh mode)).

Despite that pax example in the POSIX rationale that you quoted
earlier, I don't expect many people are aware that POSIX currently
leaves the behaviour unspecified and requires them to write

   find . -name '*\([0-9]\).mp3'

or

   find . -name '*\ *.mp3'

and even

   find . -name '*[\ ]*.mp3'

In practice, they don't need to escape those " ", "(" even though
they would need to quote them when used in a shell glob.

-- 
Stephane