Julien Nioche created NUTCH-3025:
------------------------------------
Summary: urlfilter-fast to filter based on the length of the URL
Key: NUTCH-3025
URL: https://issues.apache.org/jira/browse/NUTCH-3025
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.19
Reporter: Julien Nioche
Fix For: 1.20
There is currently no URL filter implementation that removes URLs based on
their total length or on the length of their path or query.
Doing so with the regex filter would be inefficient; instead, we could
implement it in _urlfilter-fast_.
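
As a rough illustration of the idea, here is a minimal standalone sketch of a length check. It is not the urlfilter-fast code or its rule syntax: the class name UrlLengthFilter and the three limit parameters are hypothetical, and rejecting unparseable URLs is an assumption made for the example. It only follows the Nutch URL filter convention of returning the URL when accepted and null when rejected.

```java
import java.net.URI;

/** Hypothetical sketch of a length-based URL check; not the Nutch implementation. */
public class UrlLengthFilter {
    // A negative limit disables the corresponding check (assumed convention).
    private final int maxUrlLength;
    private final int maxPathLength;
    private final int maxQueryLength;

    public UrlLengthFilter(int maxUrlLength, int maxPathLength, int maxQueryLength) {
        this.maxUrlLength = maxUrlLength;
        this.maxPathLength = maxPathLength;
        this.maxQueryLength = maxQueryLength;
    }

    /** Returns the URL unchanged if it passes all checks, or null if it is rejected. */
    public String filter(String url) {
        // Cheapest check first: total length needs no parsing at all.
        if (maxUrlLength >= 0 && url.length() > maxUrlLength) {
            return null;
        }
        if (maxPathLength < 0 && maxQueryLength < 0) {
            return url; // nothing else to check
        }
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return null; // assumption: unparseable URLs are rejected
        }
        String path = uri.getRawPath();
        if (maxPathLength >= 0 && path != null && path.length() > maxPathLength) {
            return null;
        }
        String query = uri.getRawQuery();
        if (maxQueryLength >= 0 && query != null && query.length() > maxQueryLength) {
            return null;
        }
        return url;
    }
}
```

These are plain integer comparisons on string lengths, which is why they should be cheaper than running a regular expression over every URL.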