[ 
https://issues.apache.org/jira/browse/NUTCH-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719462#comment-13719462
 ] 

Markus Jelsma commented on NUTCH-1617:
--------------------------------------

Ok, it is not all right. Before this fix the number of indexed documents
always fluctuated (see test report NUTCH-1616). Tests this morning showed that
the fix for NUTCH-1617 also has issues: it sometimes indexes a different number
of documents. The counts are off by a smaller margin and it happens less
regularly. I usually get 106,684, but sometimes 106,683 and sometimes 106,682.

So this fix fixes something, but only a little bit.


> IndexerMapReduce to consider latest fetchDatum
> ----------------------------------------------
>
>                 Key: NUTCH-1617
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1617
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.8
>
>
> IndexerMapReduce can skip not_modified records or delete redirects and gone
> records, but it only considers the first incoming fetchDatum. Instead, it
> should consider only the latest fetchDatum, based on CrawlDatum.fetchTime.
> This affects indexing of multiple segments only.
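A minimal Java sketch of the change the issue asks for. It is not Nutch's actual IndexerMapReduce code: `FetchDatum` is a hypothetical record standing in for CrawlDatum (only a status and `fetchTime`), and `FetchStatus` and `action` are hypothetical names mirroring the skip/delete rules in the description. Hadoop does not guarantee the order of values within a reduce call unless a secondary sort is configured, so "the first incoming fetchDatum" from several segments can differ between runs. Taking the maximum `fetchTime` does not depend on arrival order, except when two datums share the same `fetchTime`. Records and switch expressions require Java 16+.

```java
import java.util.List;

public class LatestFetchDatum {

  // Hypothetical stand-in for the fetch statuses the indexer may act on.
  enum FetchStatus { SUCCESS, NOT_MODIFIED, GONE, REDIRECT }

  // Hypothetical stand-in for a fetch CrawlDatum: only the fields this sketch needs.
  record FetchDatum(FetchStatus status, long fetchTime) {}

  // Returns the datum with the greatest fetchTime, or null if there is none.
  // On equal fetchTime the earlier-seen datum is kept, so ties still depend on order.
  static FetchDatum latest(Iterable<FetchDatum> fetchDatums) {
    FetchDatum latest = null;
    for (FetchDatum d : fetchDatums) {
      if (latest == null || d.fetchTime() > latest.fetchTime()) {
        latest = d;
      }
    }
    return latest;
  }

  // Hypothetical decision rule following the issue text:
  // skip not_modified, delete gone and redirect records, index the rest.
  static String action(FetchDatum latest) {
    return switch (latest.status()) {
      case NOT_MODIFIED -> "skip";
      case GONE, REDIRECT -> "delete";
      default -> "index";
    };
  }

  public static void main(String[] args) {
    // The same URL appears in two segments: gone first, fetched successfully later.
    List<FetchDatum> fromSegments = List.of(
        new FetchDatum(FetchStatus.GONE, 1_000L),
        new FetchDatum(FetchStatus.SUCCESS, 2_000L));

    FetchDatum first = fromSegments.get(0);
    FetchDatum last = latest(fromSegments);
    System.out.println("first-seen: " + action(first)); // prints "first-seen: delete"
    System.out.println("latest: " + action(last));      // prints "latest: index"
  }
}
```

If the segments arrived in the opposite order, a "first datum" rule would index the URL, while the latest-fetchTime rule gives the same answer either way.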

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
