[ 
https://issues.apache.org/jira/browse/NUTCH-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546622#comment-16546622
 ] 

ASF GitHub Bot commented on NUTCH-2616:
---------------------------------------

r0ann3l commented on issue #363: NUTCH-2616 Review routing of deletions by 
Exchange component
URL: https://github.com/apache/nutch/pull/363#issuecomment-405587261
 
 
   +1 lgtm. Thanks @sebastian-nagel 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Review routing of deletions by Exchange component
> -------------------------------------------------
>
>                 Key: NUTCH-2616
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2616
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> If the exchange component (NUTCH-2412) is enabled, it must also route 
> deletions (404, etc.) to the configured index writers. Deletions are done 
> using the document ID (URL) alone: there is no NutchDocument (or it is null), 
> which needs to be handled to avoid an NPE in the Exchanges class or the 
> exchange plugins.
> NUTCH-2412 has added a new delete method in the IndexWriters class:
> - {{delete(String, NutchDocument)}} is now called from the indexing job 
> ({{bin/nutch index ... -deleteGone}}). However, the NutchDocument is always 
> null in case of deletions, see IndexerMapReduce.DELETE_ACTION.
> - {{delete(String)}} is now a no-op but is still called from CleaningJob 
> ({{bin/nutch clean ...}})
> We could ([~roannel], are there better options?):
> - send deletions to all index writers. This causes a certain overhead (could 
> be critical if deletion lists are long).
> - pass a document containing only a single field (the document ID / URL) to 
> the exchange component.
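
The two options listed above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the Nutch implementation: the
types Doc, Exchange and RecordingWriter are hypothetical stand-ins (not the
real NutchDocument, Exchanges or IndexWriter classes), and the sketch assumes
one exchange rule per writer ID, which is simpler than the real configuration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these types are the real Nutch classes. */
public class DeletionRoutingSketch {

    /** Stand-in for a document: field name -> value. */
    static final class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
    }

    /** Stand-in for an exchange rule: decides whether a document matches. */
    interface Exchange {
        boolean match(Doc doc);
    }

    /** Stand-in for an index writer that only records the deleted IDs. */
    static final class RecordingWriter {
        final List<String> deleted = new ArrayList<>();

        void delete(String id) {
            deleted.add(id);
        }
    }

    /** Option 1: bypass the exchanges and send the deletion to every writer. */
    static void deleteEverywhere(String id, Map<String, RecordingWriter> writers) {
        for (RecordingWriter writer : writers.values()) {
            writer.delete(id);
        }
    }

    /**
     * Option 2: wrap the ID in a single-field document so that the exchange
     * rules never receive null. A writer without a rule gets every deletion.
     */
    static void deleteRouted(String id, Map<String, RecordingWriter> writers,
                             Map<String, Exchange> exchanges) {
        Doc stub = new Doc();
        stub.fields.put("id", id); // the only field available for a deletion
        for (Map.Entry<String, RecordingWriter> entry : writers.entrySet()) {
            Exchange rule = exchanges.get(entry.getKey());
            if (rule == null || rule.match(stub)) {
                entry.getValue().delete(id);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, RecordingWriter> writers = new LinkedHashMap<>();
        writers.put("pdf_index", new RecordingWriter());
        writers.put("html_index", new RecordingWriter());

        // Rules that can be evaluated from the ID (URL) alone.
        Map<String, Exchange> exchanges = new LinkedHashMap<>();
        exchanges.put("pdf_index", doc -> doc.fields.get("id").endsWith(".pdf"));
        exchanges.put("html_index", doc -> !doc.fields.get("id").endsWith(".pdf"));

        deleteRouted("http://example.org/a.pdf", writers, exchanges);
        deleteEverywhere("http://example.org/gone.html", writers);

        System.out.println("pdf_index:  " + writers.get("pdf_index").deleted);
        System.out.println("html_index: " + writers.get("html_index").deleted);
    }
}
```

Note the limitation of the second option in this sketch: a rule that depends
on any field other than the ID cannot be evaluated for a deletion, so such
rules would still need a fallback (for example, the first option).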



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
