Sorry about my Observation 2; I forgot that I had changed the
torture-test subject string midway.


> On 4 March 2025, at 14:24, Niu Danny <danny...@hotmail.com> wrote:
> 
> ...
> 
> A summary of problems/questions I have:
> 
> ----

> It is mentioned in a bug note (from @geoffclare): 
> http://www.austingroupbugs.net/view.php?id=1857#c6890
> 
>> There is certainly no intention to require 
>> the '?' modifier to act recursively, and 
>> I can't see any way to interpret my suggested 
>> wording as implying it.
> 
> Q3: How can it simultaneously:
> 
> - not act recursively,
> - match the shortest subject string when
>  it's applied to a parenthesized subexpression
>  with a greedy quantifier in it?
> 
> e.g. `([0-9]+)+?`
> 
> ----
> 
> Observation 1:
> 
> @geoffclare replying to @dannyniu 
> http://www.austingroupbugs.net/view.php?id=1857#c6883
> 
>>> if both greedy **AND** lazy quantifiers're nested ...
> 
>> That was the reason for wording it as "longest 
>> possible ... for which any minimal repetitions used ... 
>> have the shortest possible match". A minimal 
>> repetition nested inside a greedy one has precedence 
>> (if used); otherwise, each just follows its normal rule.
> 
> However, greedy ones nested inside minimal ones are 
> not discussed, and I think this should be added.
> 
> ----

These are my main points. Nesting a subexpression (I'll take the
liberty of abusing terminology for now, as it hasn't been clarified yet)
inside another with different greediness is seriously ambiguous,
particularly under length-based matching semantics (the PCRE family
simply uses a "better/worse" match rule set).

In https://www.austingroupbugs.net/view.php?id=1857#c6898,
Geoff responded to my torture-test case with a step-by-step
analysis. However, I have some doubts:

> (([0-9][a-z]+[0-9])+?)+ matches 2abc3 with 1 repetition 
> as that's the longest match for which the minimal repetition 
> has the shortest match

Why doesn't the outermost "+" quantifier include "4def5"
in its match? The subexpression `([0-9][a-z]+[0-9])` can
match both "2abc3" and "4def5", and the greedy "+"
instructs the regex engine to repeat the immediately preceding
subexpression.
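One way to check that "4def5" is reachable at all: a minimal sketch using Python's `re` module (an assumption; it is not a POSIX engine, but whether an exact substring matches the whole pattern does not depend on greedy/lazy preferences). It finds the leftmost starting position with any match and lists every substring starting there that the pattern matches in full. Both "2abc3" (one repetition) and "2abc34def5" (two) turn up, so the question is purely which candidate the standard's wording selects. The helper name `leftmost_candidates` is illustrative.

```python
import re

# Pattern and subject string from the grep -E -o experiment in this message.
PATTERN = r"(([0-9][a-z]+[0-9])+?)+"
SUBJECT = "12abc34def56"


def leftmost_candidates(pattern: str, subject: str) -> tuple[int, list[str]]:
    """Find the leftmost start at which `pattern` matches some non-empty
    substring in full; return that start with every such substring.

    Pattern.fullmatch(string, pos, endpos) only asks whether that exact
    slice matches, independent of greedy/lazy preferences."""
    compiled = re.compile(pattern)
    for start in range(len(subject)):
        found = [subject[start:end]
                 for end in range(start + 1, len(subject) + 1)
                 if compiled.fullmatch(subject, start, end)]
        if found:
            return start, found
    return -1, []


start, found = leftmost_candidates(PATTERN, SUBJECT)
print(start, found)  # 1 ['2abc3', '2abc34def5']
```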

Here's my terminal interaction:

```
$ echo 12abc34def56 | grep -E -o '(([0-9][a-z]+[0-9])+?)+'
2abc34def5
```
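For comparison, a sketch of the same run in Python's `re`. Python's engine is Perl-style backtracking, so agreement with grep here is suggestive, not evidence of what POSIX requires. The hypothetical helper `grep_o` is only a rough analogue of `grep -E -o`: it returns every non-overlapping match together with the last captures of the outer group (1) and the inner group (2). The captures show the outer greedy `+` iterating twice, each time with the inner lazy `+?` taking a single unit, which yields the same "2abc34def5" that grep printed.

```python
import re

PATTERN = r"(([0-9][a-z]+[0-9])+?)+"
SUBJECT = "12abc34def56"


def grep_o(pattern: str, subject: str) -> list[tuple[str, str, str]]:
    """Rough analogue of `grep -E -o`: every non-overlapping match, plus
    the last captures of the outer group (1) and the inner group (2)."""
    return [(m.group(0), m.group(1), m.group(2))
            for m in re.finditer(pattern, subject)]


for whole, outer_last, inner_last in grep_o(PATTERN, SUBJECT):
    # The whole match spans two units and the outer group last captured
    # "4def5", so the outer "+" ran twice with one unit per iteration.
    print(whole, outer_last, inner_last)  # 2abc34def5 4def5 4def5
```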

This brings me to the following notes:

http://www.austingroupbugs.net/view.php?id=1857#c6979
http://www.austingroupbugs.net/view.php?id=1857#c6982

These all look like bugs, less in any particular implementation
and more in the standard text itself.

What's your take on this?

Thanks, DannyNiu/NJF.
