On 02/08/14 20:40, Pádraig Brady wrote:
On 08/02/2014 06:03 AM, Luke Kendall wrote:
I'm hesitant to report this, but I think it's an actual bug in expr that's been
there from day one.
I believe that expr, when used to match regular expressions, should use the
success/failure of the pattern match to determine the exit code.
But instead, I believe "expr" uses the length of the matched string to
determine its exit code. So when the regexp correctly matches an empty string, expr
returns failure, despite the match. Here's a simple example:
$ expr " " : "^ *$" && echo Matched.
1
Matched.
$ expr "" : "^ *$" && echo Matched.
0
And compare that to what sed and grep do:
$ echo "" | sed -n 's/^,*$/& - yep/p'
- yep
$ printf "a\n\n" | grep '^$' && echo "A match."
A match.
I'd like to suggest that expr be changed to use the success/fail of the pattern
match to determine the exit status, as all the other unix tools do.
I don't think this alteration of semantics would break many existing scripts,
for two reasons:
1) It must be unusual to use regexps that can match an empty string, because
expr does not report a match for that corner case, so to correctly handle it,
the user must have had to add an explicit test for the input string being
empty: and this will still work (it's just that with the suggested change, that
extra code becomes redundant).
2) Based on my own experience, it's unusual to use expr ":" with patterns that
can match the empty string - it's taken me over 30 years to notice this oddity!
If you think this would be a good change, but don't have time to do anything,
let me know and I'll have a go and submit a patch.
The exit status of expr is a common gotcha:
$ expr 2 - 1; echo $?
1
0
$ expr 2 - 2; echo $?
0
1
That's a good example: and it makes sense, as by definition, expr's exit
status is 'error' (I really mean 1), if the arithmetic expression yields 0.
$ expr ' ' : '^ *$'; echo $?
1
0
$ expr '' : '^ *$'; echo $?
0
1
POSIX states that exit status of 1 is used if "the expression evaluates to null or
zero".
I guess the POSIX definition really means "the null string" when it says
"null", when it defines the exit status. (That wasn't obvious to me on
1st or 2nd reading, actually.)
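To make the exit status values concrete, here is a minimal sketch that maps them to the wording POSIX uses. The helper name classify_expr is made up for illustration; only expr itself is assumed.

```shell
# classify_expr: hypothetical helper, not part of any standard tool.
# Runs expr with the given arguments and prints what its exit status means.
classify_expr() {
    status=0
    # Capture the exit status without tripping "set -e" in the caller.
    expr "$@" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "non-null and non-zero" ;;
        1) echo "null or zero" ;;
        2) echo "invalid expression" ;;
        *) echo "other error" ;;
    esac
}

classify_expr 2 - 1          # prints: non-null and non-zero
classify_expr 2 - 2          # prints: null or zero
classify_expr '' : ' *$'     # prints: null or zero (match of length 0)
```

The third call is the case under discussion: the pattern does match, but the reported match length is 0, so the status is 1.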
This definition of the exit status for the pattern match is the oddity:
it does not allow the user to distinguish a successful match of
a null string from a failed pattern match.
It seems a bad design decision to have chosen to use the matched string
being null as exit status 1, rather than the failure of the match.
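For scripts that need to tell the two cases apart with expr as it behaves today, one possible workaround is to prefix both the string and the pattern with the same literal character, so that any successful match has length at least 1. This is only a sketch: the helper name matches is hypothetical, the pattern must omit the leading ^ (expr anchors at the start anyway), and it does not help if the pattern uses \( \) to return a substring that may be empty.

```shell
# matches STRING BRE: hypothetical helper, not part of expr.
# The literal X on both sides means a successful match is never
# zero-length, so the exit status follows match success.
matches() {
    expr "X$1" : "X$2" >/dev/null
}

matches ''  ' *$' && echo "empty string: matched"
matches ' ' ' *$' && echo "one space: matched"
matches 'a' ' *$' || echo "letter a: no match"
```

All three echo commands run: the first two patterns match (lengths 1 and 2 including the X), and the third fails because the trailing "a" prevents the $ anchor from matching.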
In this case even though it is a match, the expression does evaluate to zero,
which is awkward, though conformant to POSIX (and Solaris and FreeBSD FWIW).
True, though it's counter to all other regexp evaluations to state that
a successful match returns a null or zero, since True is normally
equated with a non-null and non-zero quantity. (The actual matched
substring is usually a side value.)
I do understand that everyone has faithfully implemented the logic of
using the length of the match instead of the success of the match.
Though I'm not sure we can change that, which would essentially
be changing the handling of the '*' in the expression. Consider:
printf '%s\n' 1 2 '' 3 |
while read line; do
expr "$line" : '^[0-9]*$' >/dev/null || break # at first blank line
echo process "$line"
done
It would not be changing the definition of the '*' in the expression.
The above is using a regexp which would not do the same thing if used in
sed or grep (etc. etc.), because it is matching 0 or more repetitions of
a digit. It only happens to work this way in expr because of the expr
oddity.
You can choose the regexp which *will* work in those utilities *and*
expr, too, to get the same termination on the empty line, but this time
because the pattern match genuinely fails:
printf '%s\n' 1 2 '' 3 |
while read line; do
expr "$line" : '^[0-9][0-9]*$' >/dev/null || break # at 1st blank line
echo process "$line"
done
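The difference between the two patterns can be checked side by side against grep on an empty line. A small sketch, assuming a POSIX shell with grep and expr available; the function name compare is made up for illustration:

```shell
# compare BRE: hypothetical helper. Reports how grep and expr each
# treat an empty input for the same pattern (given without leading ^).
compare() {
    pat=$1
    if printf '\n' | grep -q "^$pat"; then g=match; else g=no-match; fi
    if expr '' : "$pat" >/dev/null; then e=success; else e=failure; fi
    echo "grep=$g expr=$e"
}

compare '[0-9]*$'         # prints: grep=match expr=failure
compare '[0-9][0-9]*$'    # prints: grep=no-match expr=failure
```

With '[0-9]*$' the two tools disagree on the empty line; with '[0-9][0-9]*$' they agree, because the match itself fails.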
BTW, using a leading ^ in the expression is redundant and non-portable.
I know, I just used it to make it clearer that the same expression in
other utilities has the natural interpretation.
AFAIK, it's only expr that uses the length of the matched pattern
instead of the success of the match as the exit status.
Frankly, to me it looks like a long-standing design error, but if that's
the definition, well, so be it I guess!
thanks,
Pádraig.
Regards,
luke