Marcos Gomez created NUTCH-3091:
-----------------------------------
Summary: Allow URL filters to flag an existing URL to delete from
index
Key: NUTCH-3091
URL: https://issues.apache.org/jira/browse/NUTCH-3091
Project: Nutch
Issue Type: New Feature
Components: indexer, urlfilter
Affects Versions: 1.20
Reporter: Marcos Gomez
When the crawldb already contains URLs that become rejected after the
configuration of one of the URLFilter plugins is updated, those URLs are
rejected in the index phase, but they are not removed from the index, as is
done for ‘gone’ pages or ‘redirects’.
Currently there is a ‘-filter’ flag that prevents these URLs from being
processed, but they are not removed from the index. It should be possible to
request their removal through a new option or parameter.
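
To make the request above concrete, here is a minimal, self-contained sketch
of the decision it describes. It does not use Nutch classes and is not the
indexer's actual code: the names FilterDeleteSketch, Action, decide and
deleteFiltered are all hypothetical, and the Predicate only stands in for the
configured URL filter chain.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not Nutch code: models what could happen to a URL
// that is already in the crawldb when the indexing job runs.
class FilterDeleteSketch {

    // Possible outcomes for one URL during the index phase.
    enum Action { INDEX, SKIP, DELETE }

    // filter:         stands in for the existing '-filter' flag.
    // deleteFiltered: assumed new option (this name is invented here).
    // accepts:        stands in for the configured URL filters.
    static Action decide(String url, Predicate<String> accepts,
                         boolean filter, boolean deleteFiltered) {
        if (!filter || accepts.test(url)) {
            return Action.INDEX;   // filtering is off, or the URL still passes
        }
        // The URL is now rejected: today it is only skipped, so an old
        // document stays in the index. The request is to allow deleting it.
        return deleteFiltered ? Action.DELETE : Action.SKIP;
    }

    public static void main(String[] args) {
        // Example rule: the updated filter configuration rejects /private/.
        Predicate<String> accepts = u -> !u.contains("/private/");
        List<String> urls = List.of(
            "https://example.org/index.html",
            "https://example.org/private/report.html");
        for (String url : urls) {
            System.out.println(url + " -> " + decide(url, accepts, true, true));
        }
    }
}
```

With the assumed option disabled, a rejected URL maps to SKIP, which matches
the behaviour reported here; with it enabled, the same URL maps to DELETE, in
the same way that ‘gone’ pages and ‘redirects’ are removed.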
--
This message was sent by Atlassian Jira
(v8.20.10#820010)