Sebastian Nagel created NUTCH-2297:
--------------------------------------
Summary: CrawlDbReader -stats wrong values for earliest fetch time
and shortest interval
Key: NUTCH-2297
URL: https://issues.apache.org/jira/browse/NUTCH-2297
Project: Nutch
Issue Type: Bug
Components: crawldb
Affects Versions: 1.13
Reporter: Sebastian Nagel
Assignee: Sebastian Nagel
Priority: Minor
Fix For: 1.13
NUTCH-2286 added min, max and average for fetch interval and fetch time.
When running in distributed mode (not reproducible in local mode), the
minimum values (earliest fetch time and shortest fetch interval) may be wrong.
In the output below the shortest fetch interval exceeds the longest one, and
the earliest fetch time (year 3106) lies after the latest one:
{noformat}
TOTAL urls: 7180518032
shortest fetch interval: 175 days, 00:00:00 <<<<<< ????
avg fetch interval: 10 days, 08:01:36
longest fetch interval: 15 days, 18:00:00
earliest fetch time: Thu Dec 20 05:30:00 UTC 3106 <<<<<< ????
avg of fetch times: Fri Feb 19 00:07:00 UTC 2016
latest fetch time: Mon Jul 18 05:22:00 UTC 2016
retry 0: 6907984913
retry 1: 148125397
retry 2: 82761892
retry 3: 41645830
min score: 0.0
avg score: 0.014360981
max score: 9.25
...
{noformat}
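To make the inconsistency checkable, here is a minimal sketch of how partial min/max/sum statistics could be merged across partitions and then tested for plausibility (min <= avg <= max). This is not the CrawlDbReader code, and the root cause of the bug is not stated above. The class FetchIntervalStats and all of its methods are hypothetical names used for illustration only. The sketch assumes that each partition produces a partial aggregate and that partial aggregates are merged again, so the merge has to be associative and needs neutral start values (Long.MAX_VALUE for the minimum, Long.MIN_VALUE for the maximum).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Nutch: merges partial min/max/sum
// statistics (values in seconds) and checks that the result is plausible.
final class FetchIntervalStats {
    final long min;
    final long max;
    final long sum;
    final long count;

    FetchIntervalStats(long min, long max, long sum, long count) {
        this.min = min;
        this.max = max;
        this.sum = sum;
        this.count = count;
    }

    // Neutral element: merging it with any other value changes nothing.
    static FetchIntervalStats empty() {
        return new FetchIntervalStats(Long.MAX_VALUE, Long.MIN_VALUE, 0L, 0L);
    }

    // Statistics for a single observed value.
    static FetchIntervalStats of(long value) {
        return new FetchIntervalStats(value, value, value, 1L);
    }

    // Minimum and maximum are combined with min/max; only sum and count are added.
    FetchIntervalStats merge(FetchIntervalStats other) {
        return new FetchIntervalStats(
            Math.min(min, other.min),
            Math.max(max, other.max),
            sum + other.sum,
            count + other.count);
    }

    // Merge any number of partial aggregates, starting from the neutral element.
    static FetchIntervalStats mergeAll(List<FetchIntervalStats> parts) {
        FetchIntervalStats result = empty();
        for (FetchIntervalStats part : parts) {
            result = result.merge(part);
        }
        return result;
    }

    // A non-empty aggregate is plausible only if min <= avg <= max.
    boolean isPlausible() {
        if (count == 0) {
            return true;
        }
        double avg = (double) sum / count;
        return min <= avg && avg <= max;
    }

    public static void main(String[] args) {
        final long day = 86400L;
        // Two partial aggregates, as two partitions might produce them.
        FetchIntervalStats part1 = of(1 * day).merge(of(30 * day));
        FetchIntervalStats part2 = of(7 * day);
        FetchIntervalStats total = mergeAll(Arrays.asList(part1, part2));
        System.out.println("merged min (days): " + total.min / day);
        System.out.println("merged max (days): " + total.max / day);
        System.out.println("merged plausible: " + total.isPlausible());

        // Intervals from the output above, converted to seconds:
        // shortest 175 days, avg 10 days 08:01:36, longest 15 days 18:00:00.
        FetchIntervalStats reported =
            new FetchIntervalStats(175 * day, 15 * day + 18 * 3600L, 892896L, 1L);
        System.out.println("reported plausible: " + reported.isPlausible());
    }
}
```

With count set to 1 the sum equals the reported average, so the last check only compares the three reported interval values with each other. A check of this kind could serve as a regression test for the -stats output, independent of where the wrong minimum is introduced.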