On 8/25/20 2:01 PM, John Hardin wrote:
> On Tue, 25 Aug 2020, RW wrote:
>
>> On Tue, 25 Aug 2020 10:11:13 -0700 (PDT)
>> John Hardin wrote:
>>
>>> Does anybody know of a command-line (NOT interactive!) tool that will
>>> generate a minimal "or" RE from a list of terms?
>>>
>>> For example, given input like:
>>>
>>> 17118720
>>> 17159892
>>> 17179275
>>> 17180740
>>> 17182828
>>>
>>> ...it would generate:
>>>
>>> (?:171(?:18720|59892|79275|8(?:0740|2828)))
>>
>> I wonder whether it does anything useful at runtime. I would have
>> thought the compiler would do that itself from simple alternation.
>
> That would be nice. I'm looking at Regexp::Trie as Giovanni suggested.
> It's possible that was created before the Perl RE compiler got that
> smart...

You can also look at the TLD regex for detecting URLs that don't match
/^https?/. It uses the same logic, and somewhere in that code's history
is how to generate it on the command line.
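
For what it's worth, the prefix-merging that the example in the original
question shows can be sketched in a few lines of Python. This is only an
illustration of the trie idea, not the code behind Regexp::Trie or the TLD
regex; the function names (build_trie, trie_to_regex, terms_to_regex) are
made up for this sketch, and it assumes plain literal terms, one per line.

```python
import re


def build_trie(terms):
    """Insert each term into a nested-dict trie; the '' key marks end of a term."""
    trie = {}
    for term in terms:
        node = trie
        for ch in term:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Recursively turn a trie node into a regex fragment."""
    # A node holding only the end marker contributes nothing further.
    if list(node) == [""]:
        return ""
    # If a term ends here but longer terms continue, the rest is optional.
    optional = "" in node
    alts = [re.escape(ch) + trie_to_regex(node[ch])
            for ch in sorted(k for k in node if k)]
    # A single mandatory branch needs no grouping.
    if len(alts) == 1 and not optional:
        return alts[0]
    body = "(?:" + "|".join(alts) + ")"
    return body + "?" if optional else body


def terms_to_regex(terms):
    """Build one non-capturing alternation from a list of literal terms."""
    terms = [t.strip() for t in terms if t.strip()]
    if not terms:
        raise ValueError("no terms given")
    return "(?:" + trie_to_regex(build_trie(terms)) + ")"


# The list from the original question:
print(terms_to_regex([
    "17118720", "17159892", "17179275", "17180740", "17182828",
]))
# prints (?:171(?:18720|59892|79275|8(?:0740|2828)))
```

Passing sys.stdin to terms_to_regex instead of a literal list would turn
this into a non-interactive filter. Whether the result runs any faster than
a plain alternation is a separate question, as discussed above.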