>Synopsis: ksh parenthesis sub-pattern variables not parsed; [*+?@!]($pat)
>Category: user
>Environment:
System : OpenBSD 6.8
Details : OpenBSD 6.8 (GENERIC.MP) #4: Thu Aug 5 11:02:18 MDT 2021
[email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
Architecture: OpenBSD.amd64
Machine : amd64
>Description:
When a ksh parenthesis sub-pattern is stored in a variable,
it is treated as a literal string instead of being parsed as a
pattern. This is true for case pattern matching, pattern matching
within [[...]] (RHS), and substring expansion.
>How-To-Repeat:
Set a variable with a ksh parenthesis sub-pattern using one of
the following matchers: *, +, ?, @, or !. Then use that variable
as a pattern or part of a pattern list.
In a ksh script, for example, use *( ) to match optional spaces
around the ; (semicolon) operator:
pat='cmd*( );*( )cmd'
[[ 'cmd; cmd' = $pat ]] # does not match
I discovered this flaw by passing a ksh sub-pattern argument
to a shell function to handle the match processing.
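For illustration, here is a minimal sketch of that kind of function. The name matches and its use of case are assumptions made for this example; this is not the original function. It takes a string and a pattern argument and reports whether they match. The basic glob call matches in any shell. According to the behavior described above, the sub-pattern call should fail to match in the affected ksh.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): succeed when
# string $1 matches the pattern passed in $2.
matches() {
	case $1 in
	$2) return 0 ;;		# $2 is unquoted so it is used as a pattern
	esac
	return 1
}

pat='cmd*;*cmd'			# basic glob stored in a variable
if matches 'cmd; cmd' "$pat"; then echo MATCH; else echo NOMATCH; fi

pat='cmd*( );*( )cmd'		# ksh sub-pattern stored in a variable
if matches 'cmd; cmd' "$pat"; then echo MATCH; else echo NOMATCH; fi
```

The second result depends on whether the shell parses sub-patterns that arrive through a variable, which is exactly the behavior in question.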
>Fix:
I wish I knew. I looked at the ksh source but it's complicated!
I tested this issue in the development version of mksh and
bash 5.0.18; both shells correctly interpret parenthesis sub-
patterns embedded in variables.
I found the possibly relevant revision 1.230 (eval.c) in the
mksh CVS logs on 2020-04-07 with the log message:
implement full extglob pattern matching on [[ x = $y ]] RHS.
eliminates some eval and brings us closer to ksh93
This change added support for XSUBPAT and XSUBPATMID, which
define these sub-patterns within variables.
For reference, here's the source:
$ cvs -qd [email protected]:/cvs co -PA mksh
The following is a script that further demonstrates the problem. It compares
and contrasts the shell glob pattern * and the ksh sub-pattern *(...).
Note that literal sub-pattern strings work fine, but variables containing
these patterns do not; i.e., they are not interpreted as patterns.
#!/bin/ksh
# exit status for pattern matchers
alias xs='(($? == 0)) && echo MATCH || echo NOMATCH'
echo 'We start with the literal sub-pattern.'
[[ 'cmd;cmd' = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd; cmd' = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd ;cmd' = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd ; cmd' = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd;cmd' = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd; cmd' = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd ;cmd' = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd ; cmd' = cmd*\;*cmd ]]; xs # MATCH
echo 'So far so good.'
echo 'Now put the pattern in a variable.'
pat='cmd*( );*( )cmd' # no semicolon escape
[[ 'cmd;cmd' = $pat ]]; xs # NOMATCH
[[ 'cmd; cmd' = $pat ]]; xs # NOMATCH
[[ 'cmd ;cmd' = $pat ]]; xs # NOMATCH
[[ 'cmd ; cmd' = $pat ]]; xs # NOMATCH
echo 'The ksh extended pattern did not match!'
echo -n 'But check this out, the literal pattern matches $pat: '
[[ 'cmd*( );*( )cmd' = $pat ]]; xs # MATCH
echo 'Extended sub-patterns in variables are parsed as strings.'
pat='cmd*;*cmd' # no semicolon escape
[[ 'cmd;cmd' = $pat ]]; xs # MATCH
[[ 'cmd; cmd' = $pat ]]; xs # MATCH
[[ 'cmd ;cmd' = $pat ]]; xs # MATCH
[[ 'cmd ; cmd' = $pat ]]; xs # MATCH
echo 'However the basic glob pattern matched.'
echo 'A workaround for sub-patterns using eval.'
pat='cmd*( )\;*( )cmd' # escape semicolon
eval "[[ 'cmd;cmd' = $pat ]]"; xs # MATCH
eval "[[ 'cmd; cmd' = $pat ]]"; xs # MATCH
eval "[[ 'cmd ;cmd' = $pat ]]"; xs # MATCH
eval "[[ 'cmd ; cmd' = $pat ]]"; xs # MATCH
echo 'Another demonstration of the flaw using substring expansion'
x=abc123
pattern='+([0-9])'
literal='123'
echo ${x%%+([0-9])} # abc (MATCH)
echo ${x%%$literal} # abc (MATCH)
echo ${x%%$pattern} # abc123 (NOMATCH)
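By analogy with the eval workaround for [[...]] above, the same trick should apply to substring expansion, although this was not tested in the original report. The sketch below uses a hypothetical helper, strip_suffix, which pastes the pattern into the expansion text before the shell parses it, so the pattern is seen as literal source rather than as the value of a variable. Because eval executes its argument as code, only trusted patterns should be passed this way.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not from the
# report): print $1 with the longest suffix matching pattern $2 removed.
strip_suffix() {
	_s=$1
	# The pattern text is substituted into the command before parsing,
	# e.g. this becomes: _s=${_s%%[0-9]*}
	eval "_s=\${_s%%$2}"
	printf '%s\n' "$_s"
}

strip_suffix abc123 '[0-9]*'	# basic glob
strip_suffix abc123 '+([0-9])'	# ksh sub-pattern
```

The result of the sub-pattern call depends on whether the shell supports +(...) at all, so no expected output is shown for it here.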