    Date:        Thu, 12 Apr 2018 12:10:20 +0200
    From:        Joerg Schilling <joerg.schill...@fokus.fraunhofer.de>
    Message-ID:  <5acf308c.yoyva4vzwwu8t7jp%joerg.schill...@fokus.fraunhofer.de>

Jörg:

  | Since '' and "" quoting in the shell is highly complex and no longer
  | present at the time the shell pattern matching is called,

That's not correct (well, "highly complex" is reasonable), at least
according to the standard (as opposed to how things might be done
in any particular implementation).

For filenames, the order is tilde expansion, parameter expansion (and
its companions), field splitting (irrelevant for present purposes),
filename expansion, and finally quote removal.    See "The order of
word expansions" and what follows in XCU 2.6.

If it were not that way, then ls "*"* would list not just the files
whose names start with a literal asterisk, but all files.
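
This is easy to check by hand.  A minimal sketch follows (the scratch
directory and the file names '*star' and 'plain' are made up for the
illustration; echo is used instead of ls only to keep the output on
one line):

```sh
# Scratch directory holding one file whose name starts with a literal
# asterisk and one whose name does not (both names are hypothetical).
dir=$(mktemp -d)
cd "$dir"
: > '*star'
: > plain

# The first * is quoted, so it can only match a literal '*'; the
# trailing unquoted * remains a pattern character.
quoted=$(echo "*"*)

# With no quoting, the * is a pattern character and matches every name.
unquoted=$(echo *)

echo "quoted:   $quoted"      # prints "quoted:   *star"
echo "unquoted: $unquoted"    # both names; their order depends on the locale

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

If the quotes had already been removed before filename expansion, both
commands would expand the same way.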

In case patterns the old (current) standard does no quote removal on
the pattern at all - 985 tries to fix that but doesn't get it right.
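
For comparison, here is a small sketch of what shells do in practice
with a quoted character in a case pattern (the function name classify
and the test words are invented for the example; it shows observed
behaviour of common shells, not what the text of the standard requires):

```sh
# classify WORD: report which pattern WORD matches first.
classify() {
    case $1 in
        a"*"b) echo literal ;;   # quoted *: matches only a real '*'
        a*b)   echo glob ;;      # unquoted *: matches any string
        *)     echo none ;;
    esac
}

classify 'a*b'      # prints "literal"
classify 'axyzb'    # prints "glob"
classify 'abc'      # prints "none"
```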

In parameter expansions, the % and # (and %% and ## of course)
operators also happen before quote removal, so the pattern matching
they do also still has the quote characters.   Of course, for these ones
the standard says nothing at all about what "matched by pattern" means
and just assumes "you know it is a glob style match" and what that
means (and we all do it by comparing results from other shells and
hoping we haven't missed any weird cases...)
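
A short sketch of the effect of quoting inside these operators (the
variable names and the sample value are arbitrary, chosen only for the
illustration):

```sh
x='a*b*c'

# Unquoted: * is a pattern character, and the shortest prefix that
# matches "a*" is just "a".
unq=${x#a*}

# Quoted: "*" can only match a literal asterisk, so the prefix "a*"
# is what gets removed.
quo=${x#a"*"}

# The same applies to the suffix operator.
suf=${x%"*"c}

echo "$unq"    # prints "*b*c"
echo "$quo"    # prints "b*c"
echo "$suf"    # prints "a*b"
```

The quote characters have to be still present when the match is done
for the first two results to differ.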

Are there any other uses of patterns in (standard) sh?  I can't remember
any right now.

  |  it makes no sense to add '' and "" to fnmatch().

That might be true, but assuming that we want fnmatch() to produce
the same results as sh does (given the correct flags to indicate what
kind of match it should perform) we would need to be very specific
about exactly how to translate a quoted shell string into a fnmatch
pattern.

  | To understand quoting, let me explain how the Bourne Shell does it:

Once again, this is (kind of) interesting but 100% irrelevant.

What matters is what the standard says must be done, not how some
implementation chooses to implement that.   One thing the standard
never says to do is convert one form of quoting into another form
(ever, except in the 985 bug resolution, I think.)

Of course, provided the results are correct, it is fine to do that
within an implementation (ash based shells do quoting a totally
different way, but also not the posix "leave the quotes in the word")
but it is unacceptable to assume that all other implementations
must, or should, act that way - or even that their implementors
would ever consider doing it that way.

As long as posix says to leave the quotes in the word until
quote removal, and as long as quote removal happens after
pattern matching (or filename expansion for that case) the
specification of the pattern matching algorithm must handle '
and " chars in the pattern.

And if the pattern matching algorithm is just to be "call
fnmatch() with the flags..." (etc) then fnmatch needs to
handle them as well.   Alternatively, the algorithm could
be "convert quoted strings in the pattern as ..[to be
completed].. and then call fnmatch() using the modified
pattern", in which case fnmatch does not need to handle quotes.

Which is better largely depends upon just how flexible we
want the fnmatch() function to be - that is, must all callers
deal with quoting (if their context allows that) somehow,
before calling it?

What the standard specifies should, however, match what the
implementations actually do (or at least most of them.)

kre

ps: it was interesting to see that the (ancient algol68 style) code
fragment you sent in the earlier message did not handle a ']'
as the first char of a class correctly (meaning a ']' in the class
instead of being the ending delimiter).   I don't remember ever
encountering that issue back when I used that shell - of course
wanting ']' in a char class is not common, so it is perhaps not
too surprising.   And wrt that message - for present purposes,
it would be better to run tests using case pattern matching rather
than filename expansion - for filename expansion it is quite clear
that quote removal happens after the pattern matching, so the
shell is free to interpret the quote chars.  For case patterns it
is not so clear what should be done.

