I am unabashedly posting a quiz question I have about regular expressions:

Looking for suggestions.

I am thinking:

1) Make the set of regular expressions into one big expression?
2) Search the search strings, somehow, for common substrings. "acme.org" would
be an example. Each hit on acme.org would indicate a match on one of the
original search strings?

Comments invited.
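
To make idea 1 concrete, here is a minimal Python sketch, not a definitive answer. It assumes the glob-like syntax from the quiz (a single * matches any run of characters, everything else is literal) and that a pattern must cover the whole URL. The names combine_globs and matches_any are made up for illustration. One caveat: Python's re module is a backtracking engine, so a single large alternation is not guaranteed to beat the linear scan; the idea pays off mainly with an automaton-based matcher.

```python
import re

def combine_globs(globs):
    """Join many glob-like patterns into one compiled alternation."""
    alternatives = []
    for glob in globs:
        # Escape the literal pieces; each * becomes ".*".
        body = ".*".join(re.escape(piece) for piece in glob.split("*"))
        alternatives.append("(?:" + body + ")")
    return re.compile("|".join(alternatives), re.DOTALL)

# Sample patterns taken from the quiz text.
PATTERNS = [
    "*amazon.com*",
    "*craigslist.*",
    "acme.org*sub=7*",
    "a1.vcl.com*",
    "*.ebay.com/sports*",
]

COMBINED = combine_globs(PATTERNS)

def matches_any(url):
    """True if at least one pattern matches the whole URL."""
    return COMBINED.fullmatch(url) is not None
```

The message is then tested with one call, e.g. matches_any("books.amazon.com?y=123"), instead of a loop over every pattern.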


You have 100,000 strings which are regex patterns
intended to match URLs (including hostname and possibly a URI). When you
receive a message, you need to see if it matches at least one of these
patterns.

The naive approach would be to go through your list of patterns
linearly and attempt a regex match on each one. Suggest an alternative
that would be more efficient.
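
One possible alternative, sketched in Python under stated assumptions: index each pattern by a short substring that any matching URL must contain, then run the full regex only on the patterns whose key actually occurs in the incoming URL. The class GlobIndex, the key length K = 4, and the choice of "first K characters of the longest literal piece" are all illustrative choices, not part of the quiz. Patterns whose literal pieces are all shorter than K go to a fallback list that is still scanned linearly.

```python
import re
from collections import defaultdict

K = 4  # length of the index key; an arbitrary choice for this sketch

def compile_glob(glob):
    """Translate a glob-like pattern (* = any characters) to a regex."""
    body = ".*".join(re.escape(piece) for piece in glob.split("*"))
    return re.compile(body, re.DOTALL)

class GlobIndex:
    def __init__(self, globs):
        self.by_gram = defaultdict(list)  # K-character key -> regexes
        self.fallback = []                # patterns with no usable key
        for glob in globs:
            regex = compile_glob(glob)
            # Any URL matching the glob must contain every literal piece,
            # so the start of the longest piece is a safe required key.
            longest = max(glob.split("*"), key=len)
            if len(longest) >= K:
                self.by_gram[longest[:K]].append(regex)
            else:
                self.fallback.append(regex)

    def matches_any(self, url):
        # Every K-character window of the URL is a possible key.
        grams = {url[i:i + K] for i in range(len(url) - K + 1)}
        for gram in grams:
            for regex in self.by_gram.get(gram, ()):
                if regex.fullmatch(url):
                    return True
        return any(regex.fullmatch(url) for regex in self.fallback)

# Sample patterns taken from the quiz text.
index = GlobIndex([
    "*amazon.com*",
    "*craigslist.*",
    "acme.org*sub=7*",
    "a1.vcl.com*",
    "*.ebay.com/sports*",
])
```

With this layout the work per message depends on the URL length and on how many patterns share a key with it, rather than on the total number of patterns. A multi-pattern string search such as Aho-Corasick over the full literal pieces would be a stricter filter than fixed-length keys.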

 

Sample URLs

        ⁃www.amazon.com?x=123

        ⁃books.amazon.com?y=123

        ⁃acme.org

        ⁃acme.org/stuff?sub=7

        ⁃acme.org?w=1&sub=7

        ⁃vcl.com/suba/subb?qs=abc

        ⁃a1.vcl.com?w=3

        ⁃a2.vcl.com/xyz?w=1

 

Sample Regex Patterns (assumes protocol prefix of URL has
already been stripped off)

        ⁃ *amazon.com*

        ⁃ *craigslist.*

        ⁃ acme.org*sub=7*

        ⁃ a1.vcl.com*

        ⁃ *.ebay.com/sports*

 

Note: The particular regex syntax is not really
important, but for simplicity assume a glob-like syntax (i.e. a single *
matches any number of characters).
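
For reference, the glob-like syntax can be pinned down with a small Python sketch. It assumes that * is the only special character, that every other character (including . and ?) is literal, and that a pattern has to match the whole URL. The helper names glob_to_regex and glob_matches are hypothetical.

```python
import re

def glob_to_regex(glob):
    """Escape the literal pieces and let each * stand for '.*'."""
    body = ".*".join(re.escape(piece) for piece in glob.split("*"))
    return re.compile(body, re.DOTALL)

def glob_matches(glob, url):
    """True if the glob-like pattern matches the entire URL."""
    return glob_to_regex(glob).fullmatch(url) is not None
```

Under these assumptions, acme.org*sub=7* matches acme.org/stuff?sub=7 and acme.org?w=1&sub=7 from the sample URLs, but not plain acme.org.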

 
_____________

Matthew Sacks

14153855724

matthewsacks1...@yahoo.com

"Yankees Suck"


      
