On 8/25/20 2:01 PM, John Hardin wrote:
On Tue, 25 Aug 2020, RW wrote:

On Tue, 25 Aug 2020 10:11:13 -0700 (PDT)
John Hardin wrote:

Does anybody know of a command-line (NOT interactive!) tool that will
generate a minimal "or" RE from a list of terms?

For example, given input like:

     17118720
     17159892
     17179275
     17180740
     17182828

...it would generate:

     (?:171(?:18720|59892|79275|8(?:0740|2828)))
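
For illustration, here is a minimal Python sketch of the prefix-factoring idea behind that output: build a character trie from the terms, then serialize it back into nested non-capturing groups. It is not Regexp::Trie or any existing tool. The names `build_trie`, `trie_to_regex`, and `terms_to_regex` are hypothetical, and it only merges common prefixes (not suffixes), so the result is not guaranteed to be the shortest possible RE.

```python
import re

END = ""  # sentinel key: a term ends at this trie node


def build_trie(terms):
    """Build a nested-dict character trie from an iterable of strings."""
    root = {}
    for term in terms:
        node = root
        for ch in term:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def trie_to_regex(node):
    """Serialize a trie node into a regex fragment, sharing common prefixes."""
    # A node holding only the end marker contributes nothing further.
    if END in node and len(node) == 1:
        return ""
    alternatives = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != END)
    ]
    optional = END in node  # some term stops here while others continue
    if len(alternatives) == 1 and not optional:
        return alternatives[0]  # single branch: no group needed
    group = "(?:" + "|".join(alternatives) + ")"
    return group + "?" if optional else group


def terms_to_regex(terms):
    """Wrap the factored alternation in one outer non-capturing group."""
    return "(?:" + trie_to_regex(build_trie(terms)) + ")"


terms = ["17118720", "17159892", "17179275", "17180740", "17182828"]
print(terms_to_regex(terms))
# prints (?:171(?:18720|59892|79275|8(?:0740|2828)))
```

Calling `terms_to_regex` on the stripped lines of `sys.stdin` from a small wrapper script would make this a non-interactive filter.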


I wonder whether it does anything useful at runtime. I would have
thought the compiler would do that itself from simple alternation.

That would be nice. I'm looking at Regexp::Trie as Giovanni suggested. It's possible that was created before the Perl RE compiler got that smart...


You can also look at the TLD regex for detecting URLs that don't match /^https?/. It uses the same logic, and somewhere in that code's history is how to generate it on the command line.
