Geoff Clare wrote, on 26 Sep 2024:
>
> However, if perl is the origin of the non-greedy modifier then that
> would point to perl as the origin of the shortest vs. least repetitions
> issue.  And that does indeed seem to be the case.  Looking at
> https://perldoc.perl.org/perlre it says:
>
>     By default, a quantified subpattern is "greedy", that is, it will
>     match as many times as possible (given a particular starting
>     location) while still allowing the rest of the pattern to match.
>     If you want it to match the minimum number of times possible,
>     follow the quantifier with a "?".
>
> This, of course, states incorrectly how greedy subpatterns work.  They
> don't match "as many times as possible", they give the longest
> possible match.  The code doesn't match the documentation.
>
> There are two conventions for greedy/non-greedy that make sense:
>
> 1. Greedy is longest, non-greedy is shortest.
>
> 2. Greedy is as many times as possible, non-greedy as few times as possible.
>
> Convention 1 is used for greedy everywhere (as far as I know).  By
> mixing up the conventions when implementing non-greedy REs, perl has a
> design flaw that others have copied, but tre has not copied and instead
> done it right.

I retract this statement.  As per the email I just sent in another part
of the thread, perl tries alternatives in order, which means that in perl
there is no difference between the two conventions.  POSIX
implementations choose between alternatives based on which gives the
longest match (with greedy repetitions), and so for them the two
conventions do differ.

It is imperative that we do not mix up the two conventions in POSIX, and
we should therefore continue to specify the macOS/tre behaviour of
shortest match for non-greedy repetitions.

--
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
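
To illustrate the point about ordered alternatives, here is a minimal
sketch in Python.  It assumes Python's re module as a stand-in for a
perl-style engine that tries alternatives in order; it is not perl itself
and not a POSIX implementation.  The helper names ordered_match and
shortest_match are hypothetical, and shortest_match is a brute-force
search over prefixes, suitable only for simple patterns without
lookaround.

```python
import re


def ordered_match(pattern, text):
    """Match at position 0 using the engine's own ordered-alternative rules."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


def shortest_match(pattern, text):
    """Brute force: the shortest prefix of text that the pattern can match.

    This ignores alternative order and greediness, and models the
    "non-greedy is shortest" convention for a match starting at position 0.
    """
    for end in range(len(text) + 1):
        if re.fullmatch(pattern, text[:end]):
            return text[:end]
    return None


if __name__ == "__main__":
    # With ordered alternatives, one repetition of the first alternative
    # that works is taken, so the result depends on how the RE is written.
    print(ordered_match(r"(?:aaa|a)+?", "aaa"))
    print(ordered_match(r"(?:a|aaa)+?", "aaa"))
    # The shortest-match convention gives the same answer for both spellings.
    print(shortest_match(r"(?:aaa|a)+?", "aaa"))
    print(shortest_match(r"(?:a|aaa)+?", "aaa"))
```

In the ordered engine both results use exactly one repetition, but only
one of them is the shortest possible match.  An engine that does not
order its alternatives has to pick one of the two conventions explicitly.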
