Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND
-------------------------------------------------------------------------
Key: NUTCH-979
URL: https://issues.apache.org/jira/browse/NUTCH-979
Project: Nutch
Issue Type: New Feature
Components: indexer
Affects Versions: 2.0
Reporter: Markus Jelsma
Assignee: Markus Jelsma
Priority: Minor
Fix For: 2.0
During a recrawl, some previously indexed URLs may turn out to have expired,
i.e. they no longer exist and now return 404.
This issue adds a new indexer command that scans for WebPages whose protocol
status is ProtocolStatusCodes.NOTFOUND and issues the corresponding delete
commands to Solr.
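
The scan-and-delete idea can be sketched roughly as follows. This is only an
illustration, not the actual patch: the class NotFoundDeleteSketch, its
methods, and the in-memory map standing in for the WebPage store are all
hypothetical, the numeric value used for NOTFOUND is an assumption, and it
assumes the Solr document id is the page URL. The only external fact relied
on is Solr's XML update syntax for deleting by id
(<delete><id>...</id></delete>).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NotFoundDeleteSketch {
    // Stand-in for ProtocolStatusCodes.NOTFOUND; the numeric value is an assumption.
    static final int NOTFOUND = 14;

    /** Returns the URLs whose stored protocol status equals NOTFOUND, in iteration order. */
    static List<String> collectNotFound(Map<String, Integer> statusByUrl) {
        List<String> gone = new ArrayList<>();
        for (Map.Entry<String, Integer> e : statusByUrl.entrySet()) {
            Integer status = e.getValue();
            if (status != null && status == NOTFOUND) {
                gone.add(e.getKey());
            }
        }
        return gone;
    }

    /** Escapes the characters that must not appear raw in XML element text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a Solr XML delete-by-id message for the given document ids. */
    static String buildDeleteXml(List<String> ids) {
        StringBuilder sb = new StringBuilder("<delete>");
        for (String id : ids) {
            sb.append("<id>").append(escapeXml(id)).append("</id>");
        }
        return sb.append("</delete>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical page store: URL -> protocol status code.
        Map<String, Integer> pages = new LinkedHashMap<>();
        pages.put("http://example.org/alive", 1);
        pages.put("http://example.org/gone?a=1&b=2", NOTFOUND);

        // Prints: <delete><id>http://example.org/gone?a=1&amp;b=2</id></delete>
        System.out.println(buildDeleteXml(collectNotFound(pages)));
    }
}
```

In a real command the resulting message would be sent to the Solr update
handler and followed by a commit; that part is omitted here because it
depends on the Solr client in use.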