Marcos Gomez created NUTCH-3091:
-----------------------------------

             Summary: Allow URL filters to flag an existing URL to delete from 
index
                 Key: NUTCH-3091
                 URL: https://issues.apache.org/jira/browse/NUTCH-3091
             Project: Nutch
          Issue Type: New Feature
          Components: indexer, urlfilter
    Affects Versions: 1.20
            Reporter: Marcos Gomez


When the configuration of one of the URLFilter plugins is updated, URLs that 
are already in the crawldb may now be rejected by the filter. In the index 
phase these URLs are skipped, but they are not removed from the index, as is 
done for ‘gone’ pages or ‘redirects’.
Currently there is a ‘-filter’ flag that prevents these URLs from being 
processed, but it does not delete them. It should be possible to request 
their deletion through a new option or parameter.
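
To make the request concrete, here is a minimal sketch of the per-URL decision being asked for. It is not Nutch code: the class FilterDeleteSketch, the Action enum, the decide method and the deleteFiltered switch are all hypothetical names chosen for illustration. The only convention borrowed from Nutch is that a URL filter returns null for a URL it rejects.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch, not part of Nutch: models the decision the indexer
// would take for a URL that is already present in the crawldb.
class FilterDeleteSketch {

    /** What the indexer does with one crawldb entry. */
    enum Action { INDEX, SKIP, DELETE }

    /**
     * @param url            URL taken from the crawldb
     * @param urlFilter      stand-in for the URL filter chain; returns null
     *                       when the URL is rejected
     * @param filter         models the existing '-filter' flag
     * @param deleteFiltered hypothetical new option requested in this issue
     */
    static Action decide(String url, UnaryOperator<String> urlFilter,
                         boolean filter, boolean deleteFiltered) {
        if (!filter) {
            // Filtering disabled: the URL is indexed as usual.
            return Action.INDEX;
        }
        if (urlFilter.apply(url) != null) {
            // The URL still passes the current filter configuration.
            return Action.INDEX;
        }
        // Rejected URL: today it is only skipped; with the hypothetical
        // option it would also be removed from the index.
        return deleteFiltered ? Action.DELETE : Action.SKIP;
    }

    public static void main(String[] args) {
        // Assumed example rule: reject everything under /private/.
        UnaryOperator<String> rejectPrivate =
            u -> u.contains("/private/") ? null : u;

        String[] urls = {
            "https://example.org/public/page.html",
            "https://example.org/private/page.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> current: "
                + decide(url, rejectPrivate, true, false)
                + ", proposed: "
                + decide(url, rejectPrivate, true, true));
        }
    }
}
```

In this sketch the current behaviour corresponds to deleteFiltered = false, where a rejected URL is skipped but stays in the index. The proposal corresponds to deleteFiltered = true, where the same URL is handled like a ‘gone’ page.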

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
