[ https://issues.apache.org/jira/browse/NUTCH-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2297.
------------------------------------
Resolution: Fixed
See comments in NUTCH-2474.
> CrawlDbReader -stats wrong values for earliest fetch time and shortest interval
> -------------------------------------------------------------------------------
>
> Key: NUTCH-2297
> URL: https://issues.apache.org/jira/browse/NUTCH-2297
> Project: Nutch
> Issue Type: Bug
> Components: crawldb
> Affects Versions: 1.13
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.14
>
>
> NUTCH-2286 added min, max and average values for fetch interval and fetch time.
> When running in distributed mode (not reproducible in local mode), the minimum
> values (earliest fetch time and shortest fetch interval) may be wrong and
> implausible:
> {noformat}
> TOTAL urls: 7180518032
> shortest fetch interval: 175 days, 00:00:00 <<<<<< ????
> avg fetch interval: 10 days, 08:01:36
> longest fetch interval: 15 days, 18:00:00
> earliest fetch time: Thu Dec 20 05:30:00 UTC 3106 <<<<<< ????
> avg of fetch times: Fri Feb 19 00:07:00 UTC 2016
> latest fetch time: Mon Jul 18 05:22:00 UTC 2016
> retry 0: 6907984913
> retry 1: 148125397
> retry 2: 82761892
> retry 3: 41645830
> min score: 0.0
> avg score: 0.014360981
> max score: 9.25
> ...
> {noformat}
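In the output above, both minima exceed their maxima: the shortest interval is 175 days while the longest is 15 days, and the earliest fetch time falls in the year 3106 while the latest is in 2016. A correct aggregation can never produce that. This notification does not state the root cause (see NUTCH-2474). As a hedged illustration only, the Java sketch below uses a hypothetical MinMaxAvg class, not Nutch's CrawlDbReader code, to show how partial statistics from several tasks can be merged safely. Counts and sums are added, minima and maxima are compared, and an empty partial starts from the identity values Long.MAX_VALUE / Long.MIN_VALUE rather than 0. Adding per-task minima instead of comparing them is one pattern that would inflate values like those above. That is an assumption, not a confirmed diagnosis.

```java
import java.util.List;

/**
 * Hypothetical aggregate for one numeric CrawlDb field, e.g. fetch interval
 * in seconds or fetch time in milliseconds. A sketch, not Nutch's actual code.
 */
final class MinMaxAvg {
    private long count;
    // double rather than long: summing billions of millisecond timestamps
    // would overflow a long (max about 9.2e18).
    private double sum;
    // Identity elements: merging a partial result that saw no values
    // must not lower the max or raise the min.
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Records one value, as a map task would for each CrawlDb entry. */
    void add(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Merges a partial result from another task or combiner. */
    void merge(MinMaxAvg other) {
        count += other.count;           // counts are additive
        sum += other.sum;               // sums are additive
        min = Math.min(min, other.min); // minima are compared, never added
        max = Math.max(max, other.max); // maxima are compared, never added
    }

    long count() { return count; }
    long min() { return min; }
    long max() { return max; }
    double avg() { return count == 0 ? Double.NaN : sum / count; }

    /** A correct aggregate never reports a minimum above its maximum. */
    boolean isPlausible() { return count == 0 || min <= max; }

    public static void main(String[] args) {
        // Two partial results, e.g. from two map tasks.
        MinMaxAvg a = new MinMaxAvg();
        a.add(7);
        a.add(15);
        MinMaxAvg b = new MinMaxAvg();
        b.add(3);
        b.add(10);

        MinMaxAvg total = new MinMaxAvg();
        // The third partial is empty, like a task that processed no records.
        for (MinMaxAvg part : List.of(a, b, new MinMaxAvg())) {
            total.merge(part);
        }
        // prints: count=4 min=3 avg=8.75 max=15 plausible=true
        System.out.println("count=" + total.count() + " min=" + total.min()
                + " avg=" + total.avg() + " max=" + total.max()
                + " plausible=" + total.isPlausible());
    }
}
```

Because merge() is associative and commutative (apart from floating-point rounding in the average), the result does not depend on how records are split across tasks. A correct implementation should therefore report the same min and max in local and distributed mode.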
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)