On 1 July 2012 22:56, Lionel Cons <lionelcons1...@googlemail.com> wrote:
> On 27 June 2012 19:24, Glenn Fowler <g...@research.att.com> wrote:
>>
>> On Wed, 27 Jun 2012 18:15:06 +0200 Roland Mainz wrote:
>>> On Wed, Jun 27, 2012 at 6:04 PM, Glenn Fowler <g...@research.att.com> wrote:
>>> > On Wed, 27 Jun 2012 17:43:06 +0200 Roland Mainz wrote:
>>> >> How can I quote '-' in a ~(Ex)-style pattern [...] that it exactly
>>> >> matches a literal '-'?
>>> >> I've tried the following pattern but the result is wrong (it should
>>> >> match "hello-world" and "foo-bar"):
>>> >> -- snip --
>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>> >> dummy="${s//~(Ex)([_\-[:alnum:]]+)/D}" ; print -v .sh.match'
>>> >> (
>>> >>         (
>>> >>                 hello
>>> >>                 world
>>> >>                 foo
>>> >>                 bar
>>> >>         )
>>> >>         (
>>> >>                 hello
>>> >>                 world
>>> >>                 foo
>>> >>                 bar
>>> >>         )
>>> >> )
>>> >> -- snip --
>>> >> I tried to quote the '\' with a 2nd '\' without success (i.e. we get
>>> >> the same wrong output/matches):
>>> >> -- snip --
>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>> >> dummy="${s//~(Ex)([_\\-[:alnum:]]+)/D}" ; print -v .sh.match'
>>> >> ...
>>> >> -- snip --
>>> >
>>> >> Looking via dbx/gdb at the strings passed to the regex engine it looks
>>> >> like ksh93 is either passing no '\' to |_ast_regcomp()| (in the case
>>> >> of "~(Ex)([_\-[:alnum:]]+)") or it passes two '\' to |_ast_regcomp()|
>>> >> (in the case of "~(Ex)([_\\-[:alnum:]]+)") ... it looks like a bug in
>>> >> the ksh93 quoting mechanism for ~(E) patterns... ;-(
>>> >
>>> >> The only working workaround I found is to use \x<hex> to avoid having
>>> >> to use \ to quote the '-' (the output below is IMO the expected one
>>> >> for "${s//~(Ex)([_\-[:alnum:]]+)/D}"):
>>> >> -- snip --
>>> >> $ ~/bin/ksh -c 's="hello-world foo-bar" ;
>>> >> dummy="${s//~(Ex)([_\x2d[:alnum:]]+)/D}" ; print -v .sh.match'
>>> >> (
>>> >>         (
>>> >>                 hello-world
>>> >>                 foo-bar
>>> >>         )
>>> >>         (
>>> >>                 hello-world
>>> >>                 foo-bar
>>> >>         )
>>> >> )
>>> >> -- snip --
>>> >
>>> > it's regex syntax and doesn't need a quote
>>> > see http://pubs.opengroup.org/onlinepubs/9699919799/ sect 9.3.5 item 7
>>> > from that it looks like
>>> > * if you want literal ']' use one of
>>> >        []...]
>>> >        [^]...]
>>
>>> I know...
>>
>>> > * if you want literal '-' place it last
>>> >        [...-]
>>
>>> ... I didn't know that... ;-/
>>> Thanks... :-)
>>
>>> ... but could you still check why ksh93 "swallows" the single '\' but
>>> passes two '\' as "\\" to |_ast_regcomp()|, please ? Is this intended
>>> or somehow a bug or side effect?
>>
>> it's a side effect of the conflict between ksh and regex quoting
>> if a side has to win it will be ksh in that context
>> dgk can give more detail on how tricky that part is because
>> ksh can't be expected to know all of the intricacies of each ~(...) RE syntax
>> at some point when an RE gets complex enough it will have to be placed in a var
>> then referencing it as $the_re is guaranteed to get sh and RE quoting right
>> (or at least pass whatever quoting is present down to regex)
>
> I don't think this is going to be useful. Either ksh can be expected
> to know all of the egrep syntax, or it knows nothing and passes the
> pattern through unscathed after the user has provided sufficient \
> escapes to prevent clashes with ksh syntax.
> The current situation of "guessing" which side - ksh or ere - will win
> is NOT acceptable.
>
> Try to see it from the point of view of a POSIX standardisation
> committee or a code generator which will generate ksh93 code. The
> POSIX committee won't accept a situation as fuzzy as it is right now,
> and a code generator can't be expected to go through the
> trial-and-error procedure that is currently required until a pattern
> fits the needs of ksh's guesswork.
>
> If the situation can't be improved then I'd suggest removing the
> whole ~(E) feature. While I see how useful it is, the current
> implementation is completely unacceptable.

So what will be done here? If nothing can be done I'll post a patch to
wrap ~(E) support in SHOPT_EXPERIMENTAL_PATTERN_MATCHING so we can
disable this on production machines.
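
For reference, a minimal sketch of the two suggestions quoted above ('-'
placed last in the bracket expression, and the RE kept in a variable). It
uses grep -E rather than ksh's ~(E), so only POSIX ERE syntax is involved.
The variable names are mine, and grep's -o option is an assumption (widely
available, but not required by POSIX):

```shell
# Subject string taken from the examples in this thread.
s='hello-world foo-bar'

# Keep the ERE in a single-quoted variable so the shell passes it on untouched.
# The '-' is the last character of the bracket expression, so it is taken
# literally and needs no backslash (POSIX 9.3.5 item 7).
re='[_[:alnum:]-]+'

# grep -o (assumed available; a common extension) prints each match on its own line.
matches=$(printf '%s\n' "$s" | grep -Eo -e "$re")
printf '%s\n' "$matches"
```

This should print hello-world and foo-bar on separate lines. With the '-'
last in the list there is no backslash left for the shell and the regex
engine to disagree about.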

Lionel
_______________________________________________
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
