Paul Eggert <[email protected]> writes:
> On 2025-12-05 22:54, Paul Eggert wrote:
>> you can't simplify a{3}{6,10} or a{3,4}{6} in similar ways
>
> Oops, I got that wrong, as the latter example a{3,4}{6} simplifies to
> a{18,24}. Only my former example refuses to simplify like that.
POSIX leaves the behavior undefined:
    The behavior of multiple adjacent duplication symbols ('+', '*',
    '?', and intervals, possibly suffixed by the repetition modifier
    '?') produces undefined results.
So, we can protect our egos and say our wrong answers are right. :)
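Still, reading the nested form as (a{m,n}){p,q}, the arithmetic is easy to check. Here is a quick Python sketch; the helper names `lengths` and `as_single_interval` are made up for illustration and have nothing to do with the regex code. It assumes the atom matches exactly one character, so only the set of possible match lengths matters.

```python
def lengths(inner, outer):
    """Set of match lengths for (a{inner}){outer}.

    inner and outer are (min, max) bounds, as in a{m,n}.
    Assumes the atom 'a' always matches exactly one character.
    """
    lo_i, hi_i = inner
    lo_o, hi_o = outer
    result = set()
    for k in range(lo_o, hi_o + 1):  # k = number of outer repetitions
        # Each of the k inner copies contributes lo_i..hi_i characters,
        # so the total can be any integer in [k*lo_i, k*hi_i].
        result.update(range(k * lo_i, k * hi_i + 1))
    return result


def as_single_interval(s):
    """Return (min, max) if s is a contiguous run of integers, else None."""
    lo, hi = min(s), max(s)
    return (lo, hi) if len(s) == hi - lo + 1 else None


print(as_single_interval(lengths((3, 4), (6, 6))))   # (18, 24)
print(as_single_interval(lengths((3, 3), (6, 10))))  # None
print(sorted(lengths((3, 3), (6, 10))))              # [18, 21, 24, 27, 30]
```

So a{3,4}{6} collapses to a{18,24}, while a{3}{6,10} only allows multiples of 3 between 18 and 30, which no single interval can express.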
> If you're interested in this stuff, here's a modern intro to the
> theory; look for its discussion of automata:
>
> Chistikov D. An introduction to the theory of linear integer
> arithmetic. FSTTCS 2024, LIPIcs 323, 1:1-1:36.
> <https://doi.org/10.4230/LIPIcs.FSTTCS.2024.1>
> <https://wrap.warwick.ac.uk/id/eprint/188924/1/LIPIcs.FSTTCS.2024.1.pdf>
>
> I haven't read that intro or pursued the basic (decades-old) idea,
> though, as I doubt whether implementing regex optimizations along
> these lines would interest anybody other than automata theorists.
Thanks, I'll take a look. Automata are interesting, but my memory of the
math notation is probably too rusty for me to follow it.
>> It would also be nice to support the '?' repetition modifier that
>> POSIX.1-2024 introduced, which gives the leftmost shortest match [2].
>> But my impression is that few people understand the regex code well
>> enough to add it.
>
> Yup. Some years ago I wrote to that code's author but got no answer.
> It might be wise, at some point, to ditch the code entirely and use
> something more understandable and maintainable.
Interesting idea. I wonder if there are any maintainable regex
implementations, though. At least, I don't remember the Perl and Python
code being much easier to read.
Collin