Re: Mishandled backslashes in patternvar in ${var#...$patternvar...} ?

Robert Elz Sat, 09 Aug 2025 07:41:46 -0700

    Date:        Sat, 9 Aug 2025 12:56:41 +0200
    From:        Denys Vlasenko <dvlas...@redhat.com>
    Message-ID:  <e9f5e8d1-55c9-7913-4bc7-65f74ff55...@redhat.com>


  | I don't demand that.

When you send to a bug list with a bug report claiming that existing
behaviour is incorrect, that's what you're doing, even if not intentionally.

  | What I "demand" (advocate for, or trying to facilitate) is to have
  | consistent behavior across tools.

That's difficult to achieve, particularly between shells and other
tools, as shells have quoting, in addition to \ escaping, and the
difference wasn't always recognised - other tools that use glob matching
generally don't have quoting, all they have is \ escaping, and that makes
a difference to how things get interpreted.

And of course, regular expressions, as used by other tools, are a completely
different object, even if their uses are kind of similar, and there are
some common syntax elements - but comparing glob and REs is no more
rational than comparing C and C++ - different things, different rules.

Wrt glob matching and shells, in sh if a word to be expanded is

        \**

there's no question that it means a literal asterisk, followed by
anything.   But that isn't an escaped asterisk at the start, it is
a quoted asterisk, using the shell's backslash quoting mechanism.

On the other hand if you do

        X='\**'

and then expand (unquoted) $X, the 3 chars remain there, nothing quoted,
and the glob expansion does see an escaped asterisk rather than a quoted
one.   The difference is subtle, but it exists, and it affects just how
shells perform glob processing.
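
A minimal sketch of that distinction, using case patterns in POSIX sh.
The function names are invented for illustration.   The first and last
functions behave as commented in any POSIX shell; the middle one is the
escaped-asterisk case described above, and no result is claimed for it
here, since shells have differed in how they treat a backslash that
arrives via an expansion.

```shell
X='\**'

# Pattern written directly as a word: the backslash is shell quoting,
# so this matches any string that starts with a literal asterisk.
match_word() {
    case $1 in
        \**) echo yes ;;
        *)   echo no ;;
    esac
}

# Pattern from an unquoted expansion: the 3 chars of $X reach the
# pattern matcher unquoted, and the backslash acts there as an escape.
match_unquoted() {
    case $1 in
        $X) echo yes ;;
        *)  echo no ;;
    esac
}

# Pattern from a quoted expansion: every char of $X is quoted, so this
# is a plain string comparison against the 3 chars \**
match_quoted() {
    case $1 in
        "$X") echo yes ;;
        *)    echo no ;;
    esac
}

match_word '*abc'       # prints "yes"
match_word 'abc'        # prints "no"
match_quoted '\**'      # prints "yes"
match_quoted '*abc'     # prints "no"
match_unquoted '*abc'   # result depends on how the shell handles the escape
```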


  | I look at it from the POV of the users. It's a particular source
  | of PITA when supposedly compatible tools have these irritating
  | corner-case differences.

It can be, though this kind of thing very rarely affects real
life applications - there have been some notoriously broken behaviours
in some applications that have remained for years, as in practice,
nothing ever ventures far enough into the wild corners to encounter
the brokenness.

You also have to look at things from the POV of the users of a
particular tool.   They have become used to the way that tool works,
and don't want to be told "we're going to change because we were different
than this other tool, and if that breaks things for you, then sorry...".
For that reason (to avoid that), no-one likes to change historic
behaviour, even if it is in corner cases, as the implementors
just can't be sure what user code will be broken by any change.

That makes it very difficult to ever reconcile the different cases,
and is why the standards make some things unspecified - that's a warning
to the users "Don't do that, the results might not be what you expect".

  | What we (collectively in different tools) aren't doing well is:
  | * we don't always communicate well

No, most people who are implementing something new just do what they
believe is correct, and often that isn't what is actually done by
wherever they're copying from.   Then of course, different tools have
different needs, and different contexts.

  | * when we fix a bug, we don't always add a testsuite item to track it.
  |    The end result is that when we fix a bug in an obscure case,
  |    we often break another, previously fixed behavior in another obscure
  |    case.
  |
  | For example, dash does not seem to have a testsuite. It's not in
  | the git tree, at least.

We (NetBSD) do, the shell tests (including for glob matching) are fairly
extensive - I run them against other shells from time to time, but while
running them is simple, interpreting the results is not.   Some of our tests
are testing our behaviour in what are technically unspecified cases, the
tests are there to make sure that (as you said) a fix (or just a change)
to something doesn't inadvertently alter some other case.   So when our
tests claim that dash, or bash, or ... is "broken" because those don't
produce the same result we do, it needs careful examination to determine
if there is a real bug, or just a different version of what is unspecified.
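
As a rough illustration only (this is not the NetBSD test suite, just a
hypothetical helper), running one snippet under whichever shells happen
to be installed can be sketched as below.   The list of shell names and
the compare_shells function are assumptions.   It shows only that the
outputs differ, not which of them (if any) is a bug.

```shell
# Run the script text in $1 under each shell that is installed and
# print one "name: output" line per shell.
compare_shells() {
    for shell in sh dash bash yash ksh mksh zsh; do
        command -v "$shell" >/dev/null 2>&1 || continue
        printf '%s: %s\n' "$shell" "$("$shell" -c "$1" 2>&1)"
    done
}

# The pattern-in-a-variable case from this thread: X holds the 3 chars
# \** and is used unquoted inside ${v#...}.
snippet='v="*abc"; X="\\**"; printf "%s\n" "${v#$X}"'
compare_shells "$snippet"
```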

Beyond that, I am not really a fan of adding tests in general as a
response to fixing bugs - rather the tests should find the bugs in the
first place.   If the bug exposed some previously unconsidered corner
case, which hadn't been tested for previously, then that's a bug in the
tests as well, which should be fixed.   But if the bug being fixed is just
a coding error, which won't magically just reappear, then adding a test
for something already fixed, is just wasting everyone's time.

  | I don't care what the behavior would be deemed "correct"
  | in this case, but it'll be better to have the same behavior
  | across all Bourne shells.

It would be nice, but backwards compat means that just isn't likely
to happen - whose version of what is correct would you follow?
The most popular?  (by what measure?)  The most "correct" ?
Toss a coin?

For example, in one of your tests, I noticed that almost all shells
gave the same result (not one you were advocating), but the NetBSD
shell, and yash, gave a different one (also not one ...).   (Yes,
it was one of the unspecified cases - I've forgotten which at the
minute).   If we were to:

  | Lets just choose one behavior, and add it to our testsuites
  | to "solidify" it, so that it doesn't changes unnoticed
  | in the future.

Which one should be picked?   Whose users are we going to disappoint
when their shell alters its behaviour ... and notice that almost none
of them (users) ever actually read the standard, they just use whatever
works (and perhaps is documented) for the shell they're using - though
if you read the bash list long enough, you'll encounter users demanding
that clearly broken (buggy) code (never documented as working) not be
changed, because they had decided it was a feature, and adopted it.

And since I mentioned yash (with which I have nothing to do whatever,
except it is one I build and test against) I would suggest that you
use that for many of your tests, it is perhaps the most standards
conforming of all of the shells that claim to be Bourne Shell variants.
If it does something in a particular way, there is probably a reason.
(Which isn't to say it has no bugs, it has a few).

It is certainly more conforming than the NetBSD shell (not particularly
wrt pattern matching, but in other areas) - there are some requirements
in the standard that I won't even consider implementing (the newest is
"declaration utilities" which is one of the stupidest additions ever
made - completely unnecessary (for standard syntax), and breaks things.)

kre
