Marcos Gomez created NUTCH-3091:
-----------------------------------

             Summary: Allow URL filters to flag an existing URL to delete from index
                 Key: NUTCH-3091
                 URL: https://issues.apache.org/jira/browse/NUTCH-3091
             Project: Nutch
          Issue Type: New Feature
          Components: indexer, urlfilter
    Affects Versions: 1.20
            Reporter: Marcos Gomez
When the crawldb already contains URLs that become rejected after the configuration of one of the URLFilter plugins is updated, those URLs are filtered out in the index phase, but they are not removed from the index the way ‘gone’ URLs or ‘redirects’ are.

Currently the ‘-filter’ flag prevents these URLs from being processed, but it does not remove them from the index. It should be possible to enable this removal through a new option or parameter, so that URLs rejected by the URL filters are deleted from the index.
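A minimal sketch of the requested behavior, assuming a new indexer option (called `deleteFiltered` here, a hypothetical name that does not exist in Nutch). The filter chain follows the URLFilter contract, where each filter returns the URL to accept it or null to reject it. The `Action` enum, the `decide` method and the example filter are illustrative only, not Nutch API.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: decide what the indexer does with a URL that is
 * already in the crawldb once the URL filter chain has been applied.
 * Each filter returns the (possibly rewritten) URL to accept it, or null to reject it.
 */
public class FilterDeleteSketch {

    /** Illustrative outcomes for a URL at index time. */
    enum Action { INDEX, SKIP, DELETE }

    /**
     * @param url            URL taken from the crawldb
     * @param filters        filter chain; evaluation stops at the first rejection
     * @param deleteFiltered hypothetical new option: send a delete instead of skipping
     */
    static Action decide(String url, List<UnaryOperator<String>> filters, boolean deleteFiltered) {
        String current = url;
        for (UnaryOperator<String> filter : filters) {
            current = filter.apply(current);
            if (current == null) {
                // Rejected: with only -filter the URL is skipped and its stale document
                // stays in the index; the proposal is to optionally delete it instead.
                return deleteFiltered ? Action.DELETE : Action.SKIP;
            }
        }
        return Action.INDEX;
    }

    public static void main(String[] args) {
        // Stand-in for an updated filter configuration: reject anything under /private/
        UnaryOperator<String> noPrivate = u -> u.contains("/private/") ? null : u;
        List<UnaryOperator<String>> chain = List.of(noPrivate);

        System.out.println(decide("https://example.org/private/a.html", chain, false)); // SKIP
        System.out.println(decide("https://example.org/private/a.html", chain, true));  // DELETE
        System.out.println(decide("https://example.org/public/b.html", chain, true));   // INDEX
    }
}
```

Keeping SKIP as the default when the option is off would preserve the current ‘-filter’ behavior, so existing crawls would only start deleting documents once the new option is explicitly enabled.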