Sorry about my observation 2; I forgot that I had changed the torture-test subject string midway.
> On 4 March 2025, at 14:24, Niu Danny <danny...@hotmail.com> wrote:
>
> ...
>
> A summary of problems/questions I have:
>
> ----
> It is mentioned in a bug note (from @geoffclare):
> http://www.austingroupbugs.net/view.php?id=1857#c6890
>
>> There is certainly no intention to require
>> the '?' modifier to act recursively, and
>> I can't see any way to interpret my suggested
>> wording as implying it.
>
> Q3: How can it simultaneously:
>
> - not act recursively,
> - match the shortest subject string when
>   it's applied to a parenthesized subexpression
>   with a greedy quantifier in it?
>
> e.g. `([0-9]+)+?`
>
> ----
>
> Observation 1:
>
> @geoffclare replying to @dannyniu
> http://www.austingroupbugs.net/view.php?id=1857#c6883
>
>>> if both greedy **AND** lazy quantifiers're nested ...
>
>> That was the reason for wording it as "longest
>> possible ... for which any minimal repetitions used ...
>> have the shortest possible match". A minimal
>> repetition nested inside a greedy one has precedence
>> (if used); otherwise, each just follows its normal rule.
>
> However, greedy ones nested inside minimal ones are
> not discussed, and I think this should be added.
>
> ----

These are my main points. Nesting a subexpression (I'll take the liberty of abusing terminology for now, as it hasn't been clarified yet) inside another with differing greediness is seriously ambiguous, particularly for length-based matching semantics (the PCRE family simply uses the "better/worse" match rule-set).

In https://www.austingroupbugs.net/view.php?id=1857#c6898 , Geoff responded to my torture-test case with a step-by-step analysis. However, I have some doubts:

> (([0-9][a-z]+[0-9])+?)+ matches 2abc3 with 1 repetition
> as that's the longest match for which the minimal repetition
> has the shortest match

Why doesn't the outermost "+" quantifier include "4def5" in its match?
The subexpression `([0-9][a-z]+[0-9])` can match both "2abc3" and "4def5", and the greedy "+" instructs the regex engine to repeat the immediately preceding match. Here's my terminal interaction:

```
// Portable Home on External Drive / $ echo 12abc34def56 | grep -E -o '(([0-9][a-z]+[0-9])+?)+'
2abc34def5
// Portable Home on External Drive / $
```

Which brings me to the following notes:

http://www.austingroupbugs.net/view.php?id=1857#c6979
http://www.austingroupbugs.net/view.php?id=1857#c6982

These all look like a bug, less in any particular implementation and more in the standard text itself.

What's your take on this?

Thanks,
DannyNiu/NJF.
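
For comparison, here is a minimal sketch of how the two patterns discussed above behave under a backtracking engine versus a purely length-based reading. It assumes Python's `re` module as a stand-in for the PCRE family; it is not a POSIX engine and says nothing about what the standard requires. The helper names (`backtracking_match`, `shortest_match_at`, `longest_match_at`) are hypothetical, and "shortest/longest substring the whole pattern can match from a given start" is only one possible interpretation of the length-based wording, not the wording from the bug notes.

```python
import re


def backtracking_match(pattern, subject):
    """First match found by Python's backtracking engine (PCRE-like priority rules)."""
    m = re.search(pattern, subject)
    return m.group(0) if m else None


def shortest_match_at(pattern, subject, start):
    """Shortest substring beginning at `start` that the whole pattern can match.

    This is an assumed, purely length-based reading of "shortest possible match".
    """
    compiled = re.compile(pattern)
    for end in range(start, len(subject) + 1):
        if compiled.fullmatch(subject, start, end):
            return subject[start:end]
    return None


def longest_match_at(pattern, subject, start):
    """Longest substring beginning at `start` that the whole pattern can match."""
    compiled = re.compile(pattern)
    for end in range(len(subject), start - 1, -1):
        if compiled.fullmatch(subject, start, end):
            return subject[start:end]
    return None


# Q3: a greedy quantifier nested inside a minimal one.
q3 = r'([0-9]+)+?'
print(backtracking_match(q3, '12345'))
print(shortest_match_at(q3, '12345', 0))

# Torture test: a minimal quantifier nested inside a greedy one.
torture = r'(([0-9][a-z]+[0-9])+?)+'
subject = '12abc34def56'
print(backtracking_match(torture, subject))
print(shortest_match_at(torture, subject, 1))
print(longest_match_at(torture, subject, 1))
```

Under these assumptions, the backtracking engine reports `12345` for the Q3 pattern while the length-based "shortest" reading gives `1`, which is exactly the tension Q3 asks about. For the torture test, the backtracking result agrees with the `grep` output above (`2abc34def5`), whereas the shortest overall match from the same start is `2abc3`.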