On Tue, 25 Aug 2020, Joe Quinn wrote:
On 8/25/20 2:01 PM, John Hardin wrote:
On Tue, 25 Aug 2020, RW wrote:
On Tue, 25 Aug 2020 10:11:13 -0700 (PDT)
John Hardin wrote:
Does anybody know of a command-line (NOT interactive!) tool that will
generate a minimal "or" RE from a list of terms?
For example, given input like:
17118720
17159892
17179275
17180740
17182828
...it would generate:
(?:171(?:18720|59892|79275|8(?:0740|2828)))
I wonder whether it does anything useful at runtime. I would have
thought the compiler would do that itself from simple alternation.
That would be nice. I'm looking at Regexp::Trie as Giovanni suggested. It's
possible that was created before the Perl RE compiler got that smart...
You can also look at the TLD regex for detecting URLs that don't match
/^https?/. It uses the same logic, and somewhere in that code's history is
how to generate it on the command line.
The one-liner in the Bayes stop words wiki post that Giovanni suggested
worked wonderfully as a starting point; the only customization needed for
my purpose was adding some rule syntax around the RE it generated.
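
For anyone who wants to see the idea without Perl, here is a rough Python sketch of the trie approach that Regexp::Trie is named for. It is not the wiki one-liner and not the module's actual code; the function names (build_trie, trie_to_regex, minimal_or_regex) are made up for illustration, and it assumes the terms are plain literal strings.

```python
import re


def build_trie(terms):
    """Build a nested-dict trie; the "" key marks the end of a term."""
    trie = {}
    for term in terms:
        node = trie
        for ch in term:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Turn a trie node into a regex fragment for everything below it."""
    # A node holding only the end marker contributes nothing further.
    if "" in node and len(node) == 1:
        return ""
    # A term may end here while longer terms continue (e.g. "ab" and "abc").
    optional = "" in node
    alts = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k)
    ]
    # A single mandatory branch needs no grouping: this is the shared prefix.
    if len(alts) == 1 and not optional:
        return alts[0]
    body = "(?:" + "|".join(alts) + ")"
    return body + "?" if optional else body


def minimal_or_regex(terms):
    """Wrap the whole pattern in a non-capturing group, as in the example."""
    return "(?:" + trie_to_regex(build_trie(terms)) + ")"


if __name__ == "__main__":
    terms = ["17118720", "17159892", "17179275", "17180740", "17182828"]
    print(minimal_or_regex(terms))
```

Run on the five terms from the original question, this prints (?:171(?:18720|59892|79275|8(?:0740|2828))). Adding the rule syntax around the result is left out here, since that part depends on the rule being written.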
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/