[ https://issues.apache.org/jira/browse/NUTCH-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549310#comment-16549310 ]
Hudson commented on NUTCH-2616:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3547 (See [https://builds.apache.org/job/Nutch-trunk/3547/])
NUTCH-2616 Review routing of deletions by Exchange component - send (snagel: [https://github.com/apache/nutch/commit/718f51bf185e07bff37bb716564c8eb56f637a2b])
* (edit) src/java/org/apache/nutch/indexer/IndexWriters.java
* (edit) src/java/org/apache/nutch/indexer/IndexerOutputFormat.java

> Review routing of deletions by Exchange component
> -------------------------------------------------
>
>                 Key: NUTCH-2616
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2616
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> If the exchange component (NUTCH-2412) is enabled, it must also route
> deletions (404, etc.) to the configured index writers. Deletions are
> performed using only the document ID (URL): there is no NutchDocument
> (it is null), and this case needs to be handled to avoid an NPE in the
> Exchanges class or the exchange plugins.
> NUTCH-2412 has added a new delete method to the IndexWriters class:
> - {{delete(String, NutchDocument)}} is now called from the indexing job
> ({{bin/nutch index ... -deleteGone}}). However, the NutchDocument is always
> null in the case of deletions, see IndexerMapReduce.DELETE_ACTION.
> - {{delete(String)}} is now a no-op but is still called from CleaningJob
> ({{bin/nutch clean ...}}).
> We could ([~roannel], are there better options?)
> - send deletions to all index writers. This causes a certain overhead (which
> could be critical if deletion lists are long).
> - pass a document containing only a single field (the document ID / URL) to
> the exchange component.
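
The second option in the issue description (passing a document that contains only the document ID to the exchange component) could look roughly like the sketch below. It is a self-contained illustration, not Nutch code and not the committed fix: `Doc`, `Route`, `idOnlyDoc`, `writersFor`, and `writersForDeletion` are hypothetical names, and `Doc` merely stands in for NutchDocument. The point is only that routing rules always receive a non-null document, so a deletion cannot trigger an NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class DeletionRoutingSketch {

  /** Hypothetical stand-in for a document; this is NOT Nutch's NutchDocument. */
  static final class Doc {
    final Map<String, String> fields = new LinkedHashMap<>();
  }

  /** Hypothetical routing rule: a writer receives the document if the predicate matches. */
  static final class Route {
    final String writerId;
    final Predicate<Doc> matches;

    Route(String writerId, Predicate<Doc> matches) {
      this.writerId = writerId;
      this.matches = matches;
    }
  }

  /** Wraps a bare document ID in a single-field document so rules never see null. */
  static Doc idOnlyDoc(String id) {
    Doc doc = new Doc();
    doc.fields.put("id", id);
    return doc;
  }

  /** Returns the IDs of all writers whose rule matches the given document. */
  static List<String> writersFor(Doc doc, List<Route> routes) {
    List<String> writerIds = new ArrayList<>();
    for (Route route : routes) {
      if (route.matches.test(doc)) {
        writerIds.add(route.writerId);
      }
    }
    return writerIds;
  }

  /** Routes a deletion through the same rules as an addition, using an ID-only document. */
  static List<String> writersForDeletion(String id, List<Route> routes) {
    return writersFor(idOnlyDoc(id), routes);
  }

  public static void main(String[] args) {
    List<Route> routes = Arrays.asList(
        new Route("writer_a",
            d -> d.fields.getOrDefault("id", "").startsWith("https://example.org/")),
        new Route("writer_b", d -> true));

    System.out.println(writersForDeletion("https://example.org/gone.html", routes));
    System.out.println(writersForDeletion("https://example.com/gone.html", routes));
  }
}
```

One limitation of this approach, under the assumptions above: a rule that depends on any field other than the ID cannot match an ID-only document, so such deletions would not reach the writer that originally received the page. Sending deletions to all index writers (the first option) avoids that at the cost of the overhead mentioned in the description.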