[ 
https://issues.apache.org/jira/browse/NUTCH-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12517913
 ] 

Andrzej Bialecki  commented on NUTCH-532:
-----------------------------------------

The compatibility code in CrawlDatum is misplaced, I think - in version 5 we 
_have_ to read the float right after we read the retries. So, the section "if 
(version > 5)" should be put right after we read the retries.
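
A minimal sketch of the read order described above, not the actual CrawlDatum code. The class name, the field set, and the two encode helpers are hypothetical. The only points taken from this comment are that a version 5 record stores the interval as a float immediately after the retries, and that the version check therefore belongs at exactly that position. Converting the old value from days to seconds is an assumption based on the unit change discussed in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class VersionedRecordSketch {
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    byte version;
    byte retries;
    int fetchIntervalSeconds;
    float score;

    // Reads fields in the order they were written. The version branch sits
    // right after the retries because that is where the interval is stored.
    void readFields(DataInput in) throws IOException {
        version = in.readByte();
        retries = in.readByte();
        if (version > 5) {
            // Newer layout (assumed): interval is an int, in seconds.
            fetchIntervalSeconds = in.readInt();
        } else {
            // Version 5 layout: interval is a float; treated here as days.
            fetchIntervalSeconds = Math.round(in.readFloat() * SECONDS_PER_DAY);
        }
        score = in.readFloat();
    }

    static VersionedRecordSketch decode(byte[] bytes) {
        try {
            VersionedRecordSketch r = new VersionedRecordSketch();
            r.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical writer for the old layout: version, retries, float days, score.
    static byte[] encodeV5(int retries, float intervalDays, float score) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(5);
            out.writeByte(retries);
            out.writeFloat(intervalDays);
            out.writeFloat(score);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical writer for the new layout: version, retries, int seconds, score.
    static byte[] encodeV6(int retries, int intervalSeconds, float score) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(6);
            out.writeByte(retries);
            out.writeInt(intervalSeconds);
            out.writeFloat(score);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the branch were placed after the score instead, a version 5 record would have its float interval consumed as the score and every later field would be read from the wrong offset.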

The changes in CrawlDbReducer actually point to another bug: the old 
version worked only with the old definition of fetch interval, expressed in 
days. Your version is incorrect because, if the property is present in the 
config, it will use the number of days as the number of seconds (without 
multiplying it by SECONDS_PER_DAY). Instead, you should copy & paste the same 
logic that is present in AbstractFetchSchedule, which tries to discover the 
right value based on the old and new property values (with different units).
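
A rough sketch of that kind of unit-aware lookup, not a copy of AbstractFetchSchedule. The two property names, the use of java.util.Properties in place of the real configuration object, and the rule that the new property wins when both are set are all assumptions made for illustration.

```java
import java.util.Properties;

public class FetchIntervalSketch {
    static final int SECONDS_PER_DAY = 24 * 60 * 60;

    // Assumed property names: the old one holds days, the new one seconds.
    static final String OLD_KEY_DAYS = "db.default.fetch.interval";
    static final String NEW_KEY_SECONDS = "db.fetch.interval.default";

    // Returns the default fetch interval in seconds, whichever property is set.
    static int resolveIntervalSeconds(Properties conf, int fallbackSeconds) {
        int oldDays = getInt(conf, OLD_KEY_DAYS, 0);
        int newSeconds = getInt(conf, NEW_KEY_SECONDS, 0);
        if (newSeconds > 0) {
            return newSeconds;                 // already in seconds
        }
        if (oldDays > 0) {
            return oldDays * SECONDS_PER_DAY;  // convert days to seconds
        }
        return fallbackSeconds;
    }

    static int getInt(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }
}
```

The point is only that the old value must be multiplied by SECONDS_PER_DAY before it is used alongside values that are already in seconds.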

> CrawlDbMerger: wrong computation of last fetch time
> ---------------------------------------------------
>
>                 Key: NUTCH-532
>                 URL: https://issues.apache.org/jira/browse/NUTCH-532
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-532.patch, NUTCH-532_v2.patch, NUTCH-532_v3.patch
>
>
> CrawlDbMerger.reduce analyses the last fetch time of each record and keeps the 
> more recent record.
> This comparison is based on a FetchInterval in days: resTime = 
> res.getFetchTime() - Math.round(res.getFetchInterval() * 3600 * 24 * 1000);
> It was not really noticeable, as the Math.round method returns 
> Integer.MAX_VALUE, i.e. about 25 days in milliseconds.
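
The arithmetic in the description above can be checked with a small sketch. It assumes getFetchInterval() returned a float number of days, so the product is a float and Math.round(float) yields an int that saturates at Integer.MAX_VALUE. The class and method names are hypothetical, and the long-based variant is shown only for contrast; it is not the fix from the attached patches.

```java
public class RoundOverflowSketch {
    static final long MILLIS_PER_DAY = 24L * 3600L * 1000L;

    // Mirrors the expression in the issue: float arithmetic, int result.
    static int oldOffsetMillis(float intervalDays) {
        return Math.round(intervalDays * 3600 * 24 * 1000);
    }

    // Same quantity in double arithmetic; Math.round(double) returns a long.
    static long wideOffsetMillis(float intervalDays) {
        return Math.round((double) intervalDays * MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        System.out.println(oldOffsetMillis(30f));   // capped at Integer.MAX_VALUE
        System.out.println(wideOffsetMillis(30f));  // full value as a long
        // Whole days that fit into Integer.MAX_VALUE milliseconds.
        System.out.println(Integer.MAX_VALUE / MILLIS_PER_DAY);
    }
}
```

Any interval longer than roughly 24.86 days therefore collapses to the same offset, which is why the error went unnoticed.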

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

