On Tue, 25 Aug 2020, Joe Quinn wrote:

> On 8/25/20 2:01 PM, John Hardin wrote:
>> On Tue, 25 Aug 2020, RW wrote:
>>
>>> On Tue, 25 Aug 2020 10:11:13 -0700 (PDT)
>>> John Hardin wrote:
>>>
>>>> Does anybody know of a command-line (NOT interactive!) tool that will
>>>> generate a minimal "or" RE from a list of terms?
>>>>
>>>> For example, given input like:
>>>>
>>>>      17118720
>>>>      17159892
>>>>      17179275
>>>>      17180740
>>>>      17182828
>>>>
>>>> ...it would generate:
>>>>
>>>>      (?:171(?:18720|59892|79275|8(?:0740|2828)))
>>>
>>> I wonder whether it does anything useful at runtime. I would have
>>> thought the compiler would do that itself from simple alternation.
>>
>> That would be nice. I'm looking at Regexp::Trie as Giovanni suggested. It's possible that was created before the Perl RE compiler got that smart...
>
> You can also look at the TLD regex for detecting urls that don't match /^https?/. It uses the same logic, and somewhere in that code's history is how to generate it on the command line.

The one-liner in the Bayes stop words wiki post that Giovanni suggested worked wonderfully as a starting point; the only customization needed for my purpose was adding some rule syntax around the RE it generated.
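
For anyone without Perl handy, the same trie idea can be sketched in Python. This is only an illustration of the approach, not the wiki one-liner and not Regexp::Trie itself; the function names (build_trie, trie_to_regex, minimal_or_regex) are made up for the example, and it assumes one literal term per input line.

```python
import re


def build_trie(terms):
    """Build a nested-dict trie; the empty-string key marks the end of a term."""
    trie = {}
    for term in terms:
        node = trie
        for ch in term:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Turn a trie node into an RE fragment that matches every suffix below it."""
    if "" in node and len(node) == 1:
        return ""  # nothing follows: a term simply ends here
    optional = "" in node  # some term ends here while longer ones continue
    alternatives = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(key for key in node if key)
    ]
    if len(alternatives) == 1 and not optional:
        return alternatives[0]  # a single branch needs no grouping
    body = "(?:" + "|".join(alternatives) + ")"
    return body + "?" if optional else body


def minimal_or_regex(lines):
    """Return one non-capturing group matching exactly the given terms."""
    terms = [line.strip() for line in lines if line.strip()]
    if not terms:
        raise ValueError("no terms given")
    return "(?:" + trie_to_regex(build_trie(terms)) + ")"


# The five terms from the original question:
print(minimal_or_regex(["17118720", "17159892", "17179275", "17180740", "17182828"]))
# prints (?:171(?:18720|59892|79275|8(?:0740|2828)))
```

To use it non-interactively, pass sys.stdin to minimal_or_regex() instead of the hard-coded list and pipe the term list in. As with the one-liner, any rule syntax around the generated RE still has to be added separately.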


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]                         pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Before Adolph Hitler came to power, there was a black market in
  firearms, but the German people had been so conditioned to be law
  abiding, that they would never consider buying an unregistered
  gun. The German people really believed that only hoodlums own such
  guns. What fools we were.         -- Theodore Haas, Dachau survivor
-----------------------------------------------------------------------
 Today: the 1941st anniversary of the destruction of Pompeii
