Andrzej Bialecki wrote:
100k regexps is still a lot, so I'm not totally sure it would be much
faster, but perhaps worth checking.
I have worked with this type of technology before (minimized,
determinized FSAs, constructed from large sets of strings & expressions)
and it should be very fast to perform lookups, even in large, complex
FSAs. Construction of the FSA can be time-consuming and should probably
be done offline, not at fetcher startup time, so that it is only
performed once for a number of fetcher runs.
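
To illustrate the build-offline, look-up-fast split, here is a minimal sketch in Java. It is not the urlfilter-db plugin and not any Nutch API: the class UrlAutomaton and all of its methods are hypothetical names, it handles plain URL prefixes rather than regexps, and it builds a simple deterministic trie with no minimization step. It only shows that a lookup costs one transition per input character, regardless of how many entries were added, and that the built structure can be written to disk once and reloaded by later fetcher runs.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a deterministic automaton over URL prefixes.
// It is a plain trie (determinized, but NOT minimized) and does not
// support regular expressions.
class UrlAutomaton implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class State implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<Character, State> next = new HashMap<>();
        boolean accepting;
    }

    private final State start = new State();

    // Offline step: add one prefix, creating states as needed.
    void add(String prefix) {
        State s = start;
        for (int i = 0; i < prefix.length(); i++) {
            s = s.next.computeIfAbsent(prefix.charAt(i), c -> new State());
        }
        s.accepting = true;
    }

    // Lookup: true if any stored prefix is a prefix of the URL.
    // Follows at most one transition per character of the URL.
    boolean matchesPrefixOf(String url) {
        State s = start;
        for (int i = 0; i < url.length(); i++) {
            if (s.accepting) {
                return true;
            }
            s = s.next.get(url.charAt(i));
            if (s == null) {
                return false;
            }
        }
        return s.accepting;
    }

    // Write the built automaton to disk so it is constructed only once.
    void save(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Startup step: reload a previously built automaton.
    static UrlAutomaton load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (UrlAutomaton) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline: build and save (the prefixes here are made-up examples).
        UrlAutomaton built = new UrlAutomaton();
        built.add("http://spam.example/");
        built.add("http://ads.example/banner");
        Path file = Files.createTempFile("urlfilter", ".fsa");
        built.save(file);

        // Fetcher startup: load the saved automaton and query it.
        UrlAutomaton loaded = UrlAutomaton.load(file);
        System.out.println(loaded.matchesPrefixOf("http://spam.example/page.html"));
        System.out.println(loaded.matchesPrefixOf("http://ok.example/"));
        Files.delete(file);
    }
}
```

A real implementation for 100k regexps would presumably compile the expressions with an automaton library and minimize the result, and would likely use a more compact on-disk format than Java serialization; the sketch above only shows the overall shape.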
Doug
- Re: [jira] Updated: (NUTCH-100) New plugin urlfilter-db Doug Cutting