[ https://issues.apache.org/jira/browse/NUTCH-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2214:
-----------------------------------
    Fix Version/s:     (was: 1.14)
                   1.15

> Index clean to be flexible on what it deletes
> ---------------------------------------------
>
>                 Key: NUTCH-2214
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2214
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.11
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.15
>
>
> Nutch clean removes all useless records, but if Nutch is configured correctly
> (-deleteGone etc), the index should only contain duplicates, if existing. On
> a large index, this could result in Nutch sending millions of getById's to
> Solr, for records that don't exist in the first place.
> This issue will make it configurable on what to delete, e.g. useless records
> (404, 30x) or duplicates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)