While many ISO standards are specified in terms of performance and quality metrics, IT standards like ours are for promoting interoperability, so the interaction between applications and implementations needs to be considered.
To better understand the problem I have, and that we may be facing, let's consider a regular expression implementation where:

1. the first match is found in the same manner as the Perl/PHP/Python/PCRE semantics, where quantifiers start with their "best" (Perl terminology) value;
2. the implementation adjusts the quantifiers from right to left to discover potentially longer matches, where minimal quantifiers would otherwise have shorter matches;
3. step 2 is repeated until all combinations are exhausted, then the best (in the POSIX sense, i.e. longest) match is returned.

A summary of problems/questions I have:

----

With the current resolution of Bug 1857
<http://www.austingroupbugs.net/view.php?id=1857#c6881>, we have:

> Consistent with the match for the entire
> regular expression being the leftmost and
> longest for which any minimal repetitions
> used in the match have the shortest possible
> match,

Q1: does the "for which" clause imply that if there are any minimal quantifiers, the overall match may *Not Necessarily* be the longest?

The examples from the previous paragraph seem to confirm this:

> However, the ERE "(aaa??)*" matches only
> the first four characters of the string "aaaaa",
> not all five, because in order to match all five,
> "a??" would match with length one instead of zero;
> the ERE "(aaa??)*|(aaa?)*" matches all five because
> the longest match is one which does not use
> any minimal repetitions.

In which case, I think the length of the overall match is ambiguous.

----

> each BRE or ERE in a concatenated set,
> from left to right, shall match the longest
> possible string for which any minimal repetitions
> used in the match for that BRE or ERE have
> the shortest possible match.

Q2: are the said BRE and ERE parenthesized?
It is mentioned in a bug note (from @geoffclare):
<http://www.austingroupbugs.net/view.php?id=1857#c6890>

----

> There is certainly no intention to require
> the '?' modifier to act recursively, and
> I can't see any way to interpret my suggested
> wording as implying it.

Q3: How can it simultaneously:
- not act recursively,
- match the shortest subject string when it's applied to a parenthesized subexpression with a greedy quantifier in it? e.g. `([0-9]+)+?`

----

Observation 1:

@geoffclare replying to @dannyniu
<http://www.austingroupbugs.net/view.php?id=1857#c6883>

>> if both greedy **AND** lazy quantifiers're nested ...
> That was the reason for wording it as "longest
> possible ... for which any minimal repetitions used ...
> have the shortest possible match". A minimal
> repetition nested inside a greedy one has precedence
> (if used); otherwise, each just follows its normal rule.

However, greedy ones nested inside minimal ones are not discussed, and I think this should be added.

----

Observation 2:

@steffen experimented with PCRE and TRE, and the results seem to conflict with Geoff's interpretation of Danny's torture-test regular expression and subject string.

Steffen's note: https://www.austingroupbugs.net/view.php?id=1857#c6888
Geoff's note: https://www.austingroupbugs.net/view.php?id=1857#c6898
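
----

To make Q1 and Q3 concrete, here is a rough sketch in Python. It rests on several assumptions of mine: Python's `re` module stands in for the Perl/PHP/Python/PCRE first-match semantics, and a brute-force check over every prefix of the subject stands in for "exhaust all quantifier combinations". The helper names (`first_match_len`, `longest_possible_len`, `shortest_possible_len`) are made up for illustration. This is not how a real implementation would search, and it only works for patterns without anchors or lookaround.

```python
import re


def first_match_len(pattern, subject):
    """Length of the match a backtracking (Perl-style) engine reports
    at offset 0, i.e. with each quantifier starting at its "best" value."""
    m = re.match(pattern, subject)
    return None if m is None else m.end()


def longest_possible_len(pattern, subject):
    """Longest prefix the pattern can match under *any* combination of
    repetition counts. fullmatch() backtracks through every combination,
    so trying prefixes from long to short is a brute-force stand-in for
    the "adjust and repeat, then keep the longest" idea."""
    for n in range(len(subject), -1, -1):
        if re.fullmatch(pattern, subject[:n]):
            return n
    return None


def shortest_possible_len(pattern, subject):
    """Shortest prefix the pattern can match under any combination."""
    for n in range(len(subject) + 1):
        if re.fullmatch(pattern, subject[:n]):
            return n
    return None


# (pattern, subject) pairs taken from the bug notes quoted in this mail.
cases = [
    (r"(aaa??)*", "aaaaa"),
    (r"(aaa??)*|(aaa?)*", "aaaaa"),
    (r"([0-9]+)+?", "12345"),
]

for pattern, subject in cases:
    print(
        pattern,
        subject,
        "first:", first_match_len(pattern, subject),
        "longest:", longest_possible_len(pattern, subject),
        "shortest:", shortest_possible_len(pattern, subject),
    )
```

With CPython's `re` I would expect the first two patterns to report a first-match length of 4 and a longest-possible length of 5, which is the gap Q1 is asking about. For `([0-9]+)+?` I would expect the first match to have length 5, while the shortest possible match has length 1, which is the tension behind Q3.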