>Synopsis:      ksh parenthesis sub-pattern variables not parsed; [*+?@!]($pat)
>Category:      user
>Environment:
        System      : OpenBSD 6.8
        Details     : OpenBSD 6.8 (GENERIC.MP) #4: Thu Aug  5 11:02:18 MDT 2021
                         [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP

        Architecture: OpenBSD.amd64
        Machine     : amd64
>Description:
        When a ksh parenthesis sub-pattern is stored in a variable, it
        is not interpreted as a pattern but matched as a literal string.
        This is true for case pattern matching, pattern matching within
        [[...]] (RHS), and substring expansion.
>How-To-Repeat:
        Set a variable with a ksh parenthesis sub-pattern using one of
        the following matchers: *, +, ?, @, or !. Then use that variable
        as a pattern or part of a pattern list.

        In a ksh script, for example, use *( ) to match optional spaces
        around the ; (semicolon) operator:

        pat='cmd*( );*( )cmd'
        [[ 'cmd; cmd' = $pat ]]  # does not match

        I discovered this flaw by passing a ksh sub-pattern argument
        to a shell function to handle the match processing.
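
        A minimal sketch of the kind of function that exposes the
        problem. The name match_with is hypothetical and not taken from
        the original script; it uses case rather than [[...]] so it also
        runs in plain POSIX shells. The basic glob call should print
        MATCH everywhere, while the result of the sub-pattern call
        depends on the shell (NOMATCH is what this report implies for
        OpenBSD 6.8 ksh).

```shell
# Hypothetical helper (not from the original report): print MATCH if
# the string in $1 matches the pattern stored in $2, else NOMATCH.
match_with() {
    case $1 in
    $2) echo MATCH ;;
    *)  echo NOMATCH ;;
    esac
}

match_with 'cmd; cmd' 'cmd*;*cmd'        # basic glob held in a parameter
match_with 'cmd; cmd' 'cmd*( );*( )cmd'  # sub-pattern: result depends on the shell
```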
>Fix:
        I wish I knew. I looked at the ksh source but it's complicated!
        I tested this issue in the development version of mksh and
        bash 5.0.18; both shells correctly interpret parenthesis
        sub-patterns embedded in variables.

        I found the possibly relevant revision 1.230 (eval.c) in the
        mksh CVS logs on 2020-04-07 with the log message:

            implement full extglob pattern matching on [[ x = $y ]] RHS.
            eliminates some eval and brings us closer to ksh93

        This change added support for XSUBPAT and XSUBPATMID, which
        define these sub-patterns within variables.
        For reference, here's the source:
        $ cvs -qd [email protected]:/cvs co -PA mksh


The following is a script that further demonstrates the problem. It compares
and contrasts the shell glob pattern * and the ksh sub-pattern *(...).

Note that literal sub-pattern strings work fine, but variables containing
these patterns do not; i.e., they are not interpreted as patterns.

#!/bin/ksh
# exit status for pattern matchers
alias xs='(($? == 0)) && echo MATCH || echo NOMATCH'

echo 'We start with the literal sub-pattern.'

[[ 'cmd;cmd'   = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd; cmd'  = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd ;cmd'  = cmd*( )\;*( )cmd ]]; xs # MATCH
[[ 'cmd ; cmd' = cmd*( )\;*( )cmd ]]; xs # MATCH

[[ 'cmd;cmd'   = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd; cmd'  = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd ;cmd'  = cmd*\;*cmd ]]; xs # MATCH
[[ 'cmd ; cmd' = cmd*\;*cmd ]]; xs # MATCH

echo 'So far so good.'
echo 'Now put the pattern in a variable.'

pat='cmd*( );*( )cmd' # no semicolon escape
[[ 'cmd;cmd'   = $pat ]]; xs # NOMATCH
[[ 'cmd; cmd'  = $pat ]]; xs # NOMATCH
[[ 'cmd ;cmd'  = $pat ]]; xs # NOMATCH
[[ 'cmd ; cmd' = $pat ]]; xs # NOMATCH

echo 'The ksh extended pattern did not match!'
echo -n 'But check this out, the literal pattern matches $pat: '
[[ 'cmd*( );*( )cmd' = $pat ]]; xs # MATCH
echo 'Extended sub-patterns in variables are parsed as strings.'

pat='cmd*;*cmd' # no semicolon escape
[[ 'cmd;cmd'   = $pat ]]; xs # MATCH
[[ 'cmd; cmd'  = $pat ]]; xs # MATCH
[[ 'cmd ;cmd'  = $pat ]]; xs # MATCH
[[ 'cmd ; cmd' = $pat ]]; xs # MATCH

echo 'However the basic glob pattern matched.'

echo 'A workaround for sub-patterns using eval.'

pat='cmd*( )\;*( )cmd' # escape semicolon
eval "[[ 'cmd;cmd'   = $pat ]]"; xs # MATCH
eval "[[ 'cmd; cmd'  = $pat ]]"; xs # MATCH
eval "[[ 'cmd ;cmd'  = $pat ]]"; xs # MATCH
eval "[[ 'cmd ; cmd' = $pat ]]"; xs # MATCH

echo 'Another demonstration of the flaw using substring expansion'
x=abc123
pattern='+([0-9])'
literal='123'
echo ${x%%+([0-9])}     # abc    (MATCH)
echo ${x%%$literal}     # abc    (MATCH)
echo ${x%%$pattern}     # abc123 (NOMATCH)
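
Where eval is undesirable, some simple cases can be rewritten with standard
parameter expansion only. The sketch below is not part of the original report;
strip_trailing_digits is a hypothetical name, and the function is meant to
emulate ${x%%+([0-9])} without any ksh sub-pattern, assuming only POSIX
parameter expansion.

```shell
# Hypothetical helper: remove a trailing run of digits from $1 without
# using a ksh sub-pattern.
strip_trailing_digits() {
    # Drop everything up to and including the last non-digit, which
    # leaves only the trailing digits (possibly an empty string).
    # Note: this sets the global variable "tail".
    tail=${1##*[!0-9]}
    # Remove that suffix; the inner quotes make it a literal string,
    # not a pattern.
    printf '%s\n' "${1%"$tail"}"
}

x=abc123
strip_trailing_digits "$x"   # abc
```

This only covers a trailing character-class run; patterns such as
'cmd*( );*( )cmd' still need the eval workaround shown above or a shell that
interprets sub-patterns held in variables.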
