[
https://issues.apache.org/jira/browse/NUTCH-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284278#comment-16284278
]
ASF GitHub Bot commented on NUTCH-2474:
---------------------------------------
sebastian-nagel opened a new pull request #255: NUTCH-2474 CrawlDbReader -stats fails with ClassCastException
URL: https://github.com/apache/nutch/pull/255
- replace CrawlDbStatCombiner by CrawlDbStatReducer and ensure
that data is processed correctly regardless of whether and
how often the combiner is called
- simplify calculation of minimum and maximum
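To illustrate the idea behind the first point, here is a minimal sketch in plain Java. It is not the code from the pull request and does not use the Hadoop API: boxed Float and Long values stand in for FloatWritable and LongWritable, and the class name, the method names and the key suffixes (".min", ".max") are hypothetical. The point is that the merge step dispatches on the value type instead of casting blindly, and that its output has the same type as its input, so it can run zero, one or many times as a combiner and once more as the reducer.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafeStats {

    // Hypothetical key convention: the suffix selects the operation.
    static float combineFloat(String key, float a, float b) {
        if (key.endsWith(".min")) return Math.min(a, b);
        if (key.endsWith(".max")) return Math.max(a, b);
        return a + b; // default: sum
    }

    static long combineLong(String key, long a, long b) {
        if (key.endsWith(".min")) return Math.min(a, b);
        if (key.endsWith(".max")) return Math.max(a, b);
        return a + b; // default: sum
    }

    /**
     * Merges all partial values of one key (values must be non-empty).
     * The result has the same type as the inputs (Float in, Float out;
     * Long in, Long out), so it can be fed back into merge() again.
     */
    static Number merge(String key, List<? extends Number> values) {
        if (values.get(0) instanceof Float) {
            float acc = values.get(0).floatValue();
            for (int i = 1; i < values.size(); i++) {
                acc = combineFloat(key, acc, values.get(i).floatValue());
            }
            return acc; // boxed to Float
        }
        long acc = values.get(0).longValue();
        for (int i = 1; i < values.size(); i++) {
            acc = combineLong(key, acc, values.get(i).longValue());
        }
        return acc; // boxed to Long
    }

    public static void main(String[] args) {
        // Reducer only: all values merged in one step.
        Number once = merge("score.sum", Arrays.asList(0.5f, 2.0f, 1.0f));
        // Combiner first, reducer afterwards: merged in two steps.
        Number partial = merge("score.sum", Arrays.asList(0.5f, 2.0f));
        Number staged = merge("score.sum", Arrays.<Number>asList(partial, 1.0f));
        System.out.println(once + " " + staged + " " + once.equals(staged));
    }
}
```

Because minimum, maximum and sum are associative, the staged result matches the single-step result in this example, which is the property a combiner has to preserve. With floating-point sums the two can differ by rounding in general; the values above were chosen to be exactly representable.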
Tested in local mode. A large-scale test on a multi-billion CrawlDb (distributed
mode) is scheduled. I'll report the results after the weekend.
> CrawlDbReader -stats fails with ClassCastException
> --------------------------------------------------
>
> Key: NUTCH-2474
> URL: https://issues.apache.org/jira/browse/NUTCH-2474
> Project: Nutch
> Issue Type: Bug
> Components: crawldb
> Affects Versions: 1.14
> Environment: Java 8, distributed mode: Hadoop CDH 5.13.0
> Reporter: Sebastian Nagel
> Priority: Critical
> Fix For: 1.14
>
>
> In distributed mode CrawlDbReader / readdb -stats fails with a
> ClassCastException in the combiner:
> {noformat}
> 17/12/08 04:57:13 INFO mapreduce.Job: Task Id : attempt_1512553291624_0022_m_000039_0, Status : FAILED
> Error: java.lang.ClassCastException: org.apache.hadoop.io.FloatWritable cannot be cast to org.apache.hadoop.io.LongWritable
> at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:296)
> at org.apache.nutch.crawl.CrawlDbReader$CrawlDbStatCombiner.reduce(CrawlDbReader.java:222)
> at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1639)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1946)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1514)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:466)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> {noformat}
> FloatWritables have been used since NUTCH-2470, so that is when this bug was
> introduced.