[ 
https://issues.apache.org/jira/browse/NUTCH-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623871#comment-14623871
 ] 

Ashish Nerkar commented on NUTCH-2060:
--------------------------------------

Hi, I am new to Nutch and want to start contributing to this project. I am 
interested in working on this issue. Could anyone please share any specific 
details about it (for example, how to reproduce it) that would help me get 
started?
Thanks!

> dedup is removing entries with status db_gone
> ---------------------------------------------
>
>                 Key: NUTCH-2060
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2060
>             Project: Nutch
>          Issue Type: Bug
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Steven Hayles
>            Priority: Minor
>
> Using the standard bin/crawl script, Solr is never informed when a previously 
> indexed document has been deleted.
> "bin/nutch update" sets db_gone status in the crawl db for requests returning 
> HTTP 404 status.
> "bin/nutch dedup" removes entries with status db_gone from the crawl db.
> As a result "bin/nutch clean" never sees the db_gone status, so does not 
> inform Solr.
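
The sequence described in the issue can be sketched with a toy model. This is 
not Nutch code: the crawl db is reduced to a dict of URL to status, and the 
function names (update, dedup_as_reported, clean) are hypothetical stand-ins 
for the three commands named above. It only illustrates why the clean step 
ends up with nothing to report once db_gone entries have been dropped.

```python
# Toy model only: a crawl db reduced to a dict of URL -> status.
# None of these functions are Nutch APIs; they are simplified stand-ins
# for the three steps named in the issue description.

DB_FETCHED = "db_fetched"
DB_GONE = "db_gone"


def update(crawldb, fetch_results):
    """Stand-in for the update step: HTTP 404 results become db_gone."""
    for url, http_status in fetch_results.items():
        crawldb[url] = DB_GONE if http_status == 404 else DB_FETCHED
    return crawldb


def dedup_as_reported(crawldb):
    """Stand-in for the reported dedup behaviour: db_gone entries are dropped."""
    return {url: status for url, status in crawldb.items() if status != DB_GONE}


def clean(crawldb):
    """Stand-in for the clean step: URLs that Solr would be asked to delete."""
    return sorted(url for url, status in crawldb.items() if status == DB_GONE)


# Hypothetical fetch results: one page still exists, one now returns 404.
crawldb = update({}, {"http://example.org/a": 200, "http://example.org/b": 404})

# Without the dedup step, clean sees the db_gone entry.
deletions_without_dedup = clean(crawldb)

# With the reported dedup behaviour in between, clean sees nothing to delete.
deletions_with_dedup = clean(dedup_as_reported(crawldb))

print(deletions_without_dedup)
print(deletions_with_dedup)
```

In this model the page that returned 404 is reported for deletion only when 
the dedup step is skipped, which matches the symptom in the description. 
Whether the real fix belongs in dedup or in the ordering of the crawl script 
is left open here.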



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
