Julien Nioche created NUTCH-3025:
------------------------------------

             Summary: urlfilter-fast to filter based on the length of the URL
                 Key: NUTCH-3025
                 URL: https://issues.apache.org/jira/browse/NUTCH-3025
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.19
            Reporter: Julien Nioche
             Fix For: 1.20

There is currently no URL filter implementation that removes URLs based on their overall length or on the length of their path or query. Doing so with the regex filter would be inefficient; instead, we could implement it in _urlfilter-fast_.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)