[ https://issues.apache.org/jira/browse/NUTCH-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448041#comment-16448041 ]
ASF GitHub Bot commented on NUTCH-2571:
---------------------------------------
sebastian-nagel opened a new pull request #325: NUTCH-2571 SegmentReader -list fails to read segment
URL: https://github.com/apache/nutch/pull/325
- fix type of value (CrawlDatum not Text)
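
For readers unfamiliar with this class of failure, here is a minimal,
self-contained sketch of the mismatch described above. It is not the
patch from the pull request and does not use Hadoop. Text, CrawlDatum
and ToyReader are hypothetical stand-ins, and the check in next() is
only assumed to resemble what SequenceFile.Reader.next() does, based on
the exception message in the report.

```java
import java.io.IOException;

final class ValueClassCheck {

    /** Hypothetical stand-in for org.apache.hadoop.io.Text: an empty instance prints as "". */
    static final class Text {
        @Override public String toString() { return ""; }
    }

    /** Hypothetical stand-in for org.apache.nutch.crawl.CrawlDatum. */
    static final class CrawlDatum {
        @Override public String toString() { return "CrawlDatum"; }
    }

    /** Toy reader: knows the value class stored in the "file" and how many records remain. */
    static final class ToyReader {
        private final Class<?> valueClass;
        private int remaining;

        ToyReader(Class<?> valueClass, int records) {
            this.valueClass = valueClass;
            this.remaining = records;
        }

        /** Rejects a value holder whose class differs from the stored value class. */
        boolean next(Object key, Object val) throws IOException {
            if (val.getClass() != valueClass) {
                // Assumed message shape: the value's toString(), then the expected class.
                throw new IOException("wrong value class: " + val + " is not " + valueClass);
            }
            return remaining-- > 0;
        }
    }

    /** Counts records by calling next() until it returns false. */
    static int countRecords(ToyReader reader, Object valueHolder) throws IOException {
        int count = 0;
        while (reader.next(new Text(), valueHolder)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Matching holder type: the loop runs to the end of the data.
        System.out.println(countRecords(new ToyReader(CrawlDatum.class, 3), new CrawlDatum()));
        try {
            // Mismatched holder type: the first next() call throws.
            countRecords(new ToyReader(CrawlDatum.class, 3), new Text());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the assumption about the message format holds, passing an empty Text
as the value holder would also explain why nothing is printed between
"wrong value class:" and "is not" in the reported exception.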
> SegmentReader -list fails to read segment
> -----------------------------------------
>
> Key: NUTCH-2571
> URL: https://issues.apache.org/jira/browse/NUTCH-2571
> Project: Nutch
> Issue Type: Bug
> Components: segment
> Affects Versions: 1.15
> Environment: local + pseudo-distributed mode
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.15
>
>
> The -list command of SegmentReader fails to read data from segments:
> {noformat}
> % bin/nutch readseg -list crawl/segments/20180409100315/
> Exception in thread "main" java.io.IOException: wrong value class: is not class org.apache.nutch.crawl.CrawlDatum
> at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2379)
> at org.apache.nutch.segment.SegmentReader.getStats(SegmentReader.java:524)
> at org.apache.nutch.segment.SegmentReader.list(SegmentReader.java:482)
> at org.apache.nutch.segment.SegmentReader.run(SegmentReader.java:670)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:736)
> {noformat}