2019-06-15 06:49:27 +0700, Robert Elz:
[...]
>   | And again, there is no such thing as a "pattern quoting char" in
>   | most sh implementations including ksh88 (and all its
>   | predecessors,
>
> You're right, there wasn't, but (perhaps by oversight, perhaps
> deliberately) it was added to the standard sometime in the past
> (I presume when all the glob style matching specification was
> merged into one piece of text).
>
> What's more, it is a good thing, as it makes sh pattern matching
> more regular, with less surprises.
[...]
>   | Again the portable way for a pattern stored in a variable to
>   | match on something that starts with \ is:
>   |
>   | pattern='[\\]*'
>
> Today, that's probably true for script writers, but there's no
> good excuse for shells not to DTRT here, even if scripts cannot
> currently count upon it.
[...]

You're entitled to your opinion. In mine, it is much closer to
"Plain Wrong" than to "The Right Thing".

Again, the shell has its quoting operators, fnmatch has its
quoting operators (inspired by the shell's); bringing fnmatch
quoting back to the shell as an extra layer, separate from the
shell quoting, is just plain wrong to me.

The points I'm going to make below I've made several times here,
but I'll make them again (and even expand on them), hopefully
more clearly, as none of the proponents of that extra \ handling
have replied to them yet IIRC. They'll show that if POSIX really
wants to mandate such a behaviour, it still has a lot of work to
do and will probably need several pages of specification.

First, I think we all agree on this: \, like '...' and "..."
(and soon $'...'), when literal in the shell code, are quoting
operators. As such, you can only have an unquoted \ when it's in
the result of an unquoted parameter expansion or command
substitution.

In:

   echo $(echo "'")

The ' is passed quoted to the inner echo, and unquoted to the
outer one.

The suggestion (the "Right Thing"/"Plain Wrong" thing we're
arguing about) is for \ (not ', not ") to then be treated as a
"wildcard quoting operator" *in that case*. So we're introducing
a "corner case feature".

That is, in:

   printf '%s\n' $(printf %s '\x')

(here using printf instead of echo as echo does its own
backslash processing), the inner \ is quoted so passed literally
to the inner printf, but because the command substitution is not
quoted, the \ is unquoted and is then treated as a wildcard
quoting operator.

Now, I'm going to ask several questions, and attempt to answer
some of them based on the behaviour I observe in the few shells
that implement that extra processing.

I even went as far as setting up a new system with the latest
version of NetBSD (8.1) to try and find out what your idea of
"The Right Thing" was. I'll call that shell kresh for short.
I'll compare with bash (5.0.7), ksh93 (u+), dash (0.5.8) and zsh
(5.7.1) in sh emulation (which I'll call zshsh here).

So, is the unquoted-backslash "wildcard quoting operator" meant
to be a wildcard operator in its own right, at the same level as
?, *, [, that triggers globbing/pattern-matching wherever it is
found in the same contexts as ?, *, [, or is it to be treated
specially?

Well, it's complicated by the fact that in most of those shells,
the behaviour differs when the pattern is used for pattern
matching (in case, ${var#$pattern}, etc) and in globbing [1]
(I'm numbering the exceptions and exceptions to exceptions to
that corner case feature, so we can remember that we need to
specify them all).

Maybe I should have asked that question first. Is it considered
a wildcard operator for both, or only for pattern matching [2]?

When it comes to pattern matching (like in "case" or
${var#$pattern}), it looks like most of those shells recognise
it as a wildcard quoting operator most of the time.

   a='\\' b='\' sh -c 'case $b in $a) echo x; esac'
   a='\*' b='*' sh -c 'case $b in $a) echo y; esac'

outputs x and y in all of them, and

   a='\\' sh -c 'case $a in $a) echo x; esac'

outputs x in none of them (even ksh93 where even

   a='[a]' ksh -c 'case $a in $a) echo x; esac'

outputs x otherwise. [\a] matches on [a] though [3/bug]).

For

   a='\a' b='\' sh -c 'case $b in $a) echo x; esac'

however, zshsh doesn't output x [4]: there, \ is only used to
quote wildcard operators (including ^ and - even outside of
bracket expressions [5]) and itself. So it's different from
fnmatch, but at least it's not breaking backward compatibility
with the Bourne shell as much.

There are exceptions [6] in ksh93 as well:

   a='[\d]' b='d' sh -c 'case $b in $a) echo x; esac'

outputs nothing. (\d matches on digits instead and there are
dozens more like that).

All in all, if we look at pattern matching alone, it's not so
bad and is actually usable. Or it wouldn't be so bad if it was
documented.
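
To make the pattern-matching side concrete, here is a minimal
sketch of what a script can rely on whichever variant a shell
implements. It assumes only standard "case" semantics; the two
function names are made up for illustration. Quoting the
expansion gives a literal comparison, and the [\\] bracket form
from the quoted text above matches a backslash whether or not
the shell treats an unquoted \ from an expansion as a quoting
operator.

```shell
# Hypothetical helper names, for illustration only.

# Literal comparison: quoting the expansion removes any special
# meaning from *, ?, [ and \ in the stored string.
matches_literally() {
  case $1 in
    "$2") echo yes ;;
    *)    echo no ;;
  esac
}

# Pattern comparison: the expansion is left unquoted, so the
# stored string is used as a pattern.
matches_pattern() {
  case $1 in
    $2) echo yes ;;
    *)  echo no ;;
  esac
}

matches_literally '\*' '\*'      # yes
matches_literally 'x' '*'        # no
matches_pattern '\foo' '[\\]*'   # yes
matches_pattern '*' '[*]'        # yes
matches_pattern 'a' '[*]'        # no
```

Neither helper relies on an unquoted \ outside a bracket
expression, which is where the shells above disagree.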

I've looked at the manuals of all those shells; not a single one
documents that extra backslash processing.

For the past few decades, I've been trying to educate people to
quote their expansions because of the dangerous effect when they
contain IFS characters, ?, * or [. Recently, I learned that {,}
was also a problem in ksh93 and pdksh. \ is new to me (even
though it's been a wildcard operator probably since as far back
as 1993 in ksh93 and 1996 in bash).

Where it gets ugly though is when we get to globbing.

So if \ is a wildcard operator, then

   file='\a'
   ls -d -- $file

should list the "a" file if it exists, like

   find . -name "$file"

would.

Well, that seems to only happen in bash (and it seems to have
changed since bash4, a change that also breaks backward
compatibility). [7]

In the others, that doesn't happen. First, we can rule out ksh93
and dash where \ is not a wildcard quoting operator when used in
globbing (the answer to one of the above questions) [8].

But even for the others (zshsh and kresh), the presence of an
unquoted backslash is not enough to trigger filename generation
[9]. You need to also have another (?, *, [) unquoted glob
operator.

But it's nowhere near the end of the story.

In

   files='\*x' sh -c 'ls -ld -- $files'

we have an unquoted * in that unquoted $files. But here, if it
was going to be used as a pattern (by the fnmatch() that would
be applied to the list of files in the current directory), the *
would end up being quoted by that unquoted \, so the filename
generation is not done, and a literal \*x is passed to ls.

IOW, that \ has had a quoting effect, but it ends up not being
used as a wildcard operator as there's no pattern matching being
done at all.

If you wanted filename generation and the glob to only match on
a file called *x, you'd use files='\*[x]' or simply files='[*]x'
without the need for that \ operator.

It gets worse.
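
For reference, the [*]x form mentioned above can be tried in a
scratch directory. This is only a sketch: expand_glob is a
made-up helper name, and mktemp -d is assumed to be available
(it is not specified by POSIX).

```shell
# Hypothetical demo; the scratch directory is left behind, so
# remove it by hand when done.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch -- '*x' ax bx    # three files, one of them literally named *x

# expand_glob PATTERN: print, one per line, the words that an
# unquoted expansion of PATTERN produces in the current directory.
expand_glob() {
  printf '%s\n' $1
}

# [*] is a bracket expression containing only *, so filename
# generation runs and matches just the file named *x.
expand_glob '[*]x'

# For comparison, a bare * matches all three files.
expand_glob '*x'
```

The first call prints only *x; the second prints all three
names, in whatever order the locale's collation gives.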

As we saw, in kresh and zshsh (and bash4), that \ is a wildcard
quoting operator in globs, but it is not enough to trigger
filename generation and can even prevent it; it's only ever
effective when "combined" with other unquoted wildcard operators
which it doesn't quote either.

Now, what we mean by "combined" depends on the shell.

In

   files='\\right/*ng' sh -c 'ls -ld -- $files'

we have an unquoted * and we have an unquoted \ quoted by
another unquoted \ operator. Yet, in kresh (but not bash4 nor
zshsh) [10], that \ is not effective because it's in a different
/-separated segment from the one with *.

   $ find .
   .
   ./\\right
   ./\\right/wrong
   ./\right
   ./\right/thing

bash4 and zshsh report \right/thing, but kresh reports
\\right/wrong.

How is the user to know they should not "quote" that \ there?
Especially if you consider that it seems to work differently
([11]) in:

   $ mkdir -p '*right/thing'
   $ files='\*right/*ng' sh -c 'ls -ld -- $files'
   drwxr-xr-x  2 chazelas  users  512 Jun 16 21:05 *right/thing

To me, that really doesn't look like the "right thing"; it looks
quite ugly.

Do we really want the user to know all those subtleties before
they can use that \ wildcard quoting operator?

Note that to use the good old [x] alternative, the user doesn't
need to learn anything new about wildcards. Well, yes, they now
need to know that some shells treat unquoted \ specially in
wildcards, so they need to escape it as well (with the same good
old [x] but with [\\] in that case).

So now my question is: which variant is POSIX going to specify?
Will it allow all of them? Does it really want to mandate some
form of extra \ processing? For globbing as well? It seems the
attempts so far have barely managed to scratch the surface.

-- 
Stephane