Niu Danny wrote, on 23 Sep 2024:
>
> Just for clarification,
>
> Do you agree that the behavior I wrote down matches that from the
> implementation you use?
What you wrote does not match macOS behaviour.

> Do you disagree that most/least- repetition should replace longest/shortest
> as terminology when used in the standard?

Yes I disagree. The standard should continue to say longest/shortest.

Regards,
Geoff.

> ________________________________
> From: [email protected] <[email protected]> on behalf
> of Austin Group Bug Tracker via austin-group-l at The Open Group
> <[email protected]>
> Sent: Monday, September 23, 2024 4:56:40 PM
> To: [email protected] <[email protected]>
> Subject: [1003.1(2024)/Issue8 0001857]: Several problems with the new "lazy"
> regex quantifier.
>
> A NOTE has been added to this issue.
> ======================================================================
> https://austingroupbugs.net/view.php?id=1857
> ======================================================================
> Reported By: dannyniu
> Assigned To:
> ======================================================================
> Project: 1003.1(2024)/Issue8
> Issue ID: 1857
> Category: Base Definitions and Headers
> Type: Error
> Severity: Objection
> Priority: normal
> Status: New
> Name: DannyNiu/NJF
> Organization: Individual
> User Reference:
> Section: 9.1 Regular Expression Definitions # and others.
> Page Number: 179-180 and others
> Line Number: 6366-6368 and others.
> Interp Status: ---
> Final Accepted Text:
> ======================================================================
> Date Submitted: 2024-09-14 12:54 UTC
> Last Modified: 2024-09-23 08:56 UTC
> ======================================================================
> Summary: Several problems with the new "lazy" regex
> quantifier.
> ======================================================================
>
> ----------------------------------------------------------------------
> (0006880) geoffclare (manager) - 2024-09-23 08:56
> https://austingroupbugs.net/view.php?id=1857#c6880
> ----------------------------------------------------------------------
>
> For quantifiers without the `?` lazy quantifier, the most number of
> possible repetition is the fittest in terms of length; likewise, for
> quantifiers with the `?` lazy quantifier, the least number of possible
> repetition is the fittest in terms of length.
>
> This would change the established-for-decades "longest" requirement to
> "most repetitions", which is not the same thing. And it turns out that on
> macOS the '?' modifier does not change to matching the least repetitions,
> it is shortest match; the re_format(7) man page is wrong.
> Tested using the program at the end of
> https://posix.rhansen.org/p/2020-11-09 with REG_MINIMAL removed:
>
>     $ ./a.out '([ab]{6}|a)*?b' aaaabbbb
>     regexec() returned 0
>     rm_so 0, rm_eo 5
>
> (Least repetitions would give rm_eo 7.)
>
> Same test with grep, using -o to see what matched:
>
>     $ echo aaaabbbb | grep -E -o '([ab]{6}|a)*?b'
>     aaaab
>     b
>     b
>     b
>
> This behaviour makes sense as the whole point of REG_MINIMAL and the '?'
> modifier is to change to the opposite greediness, and the opposite of
> longest is shortest. Having the default as longest and REG_MINIMAL/'?' as
> least repetitions would produce the same output in the above tests with and
> without the '?', making them pointless in such cases.

-- 
Geoff Clare <[email protected]>
The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
