[
https://issues.apache.org/jira/browse/NUTCH-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741933#comment-14741933
]
Nadeem Douba commented on NUTCH-1084:
-------------------------------------
I think I found the issue, and I don't think it's related to Nutch.
AbstractMapWritable uses the Class.forName method, which throws the
ClassNotFoundException (CNFE). This is because Class.forName uses the system
class loader, which, unlike the current thread's context class loader, does
not include the job jar on its class path. I recompiled hadoop-common with the
Class.forName call replaced by
Thread.currentThread().getContextClassLoader().loadClass(class), and this
seems to fix the issue.
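The fix described above can be sketched as follows. This is a minimal,
self-contained illustration of the two lookup paths, not the actual
hadoop-common patch; the class and method names here are hypothetical.

```java
// Sketch of the classloader change described above (hypothetical helper,
// not the actual AbstractMapWritable patch). Class.forName(name) resolves
// against the caller's defining class loader, which in a Hadoop task may
// not see the job jar; the thread context class loader usually does.
public class ClassLoaderSketch {

    // Before: can throw ClassNotFoundException when the class only
    // exists inside the job jar, which the caller's loader doesn't see.
    static Class<?> loadWithForName(String name) throws ClassNotFoundException {
        return Class.forName(name);
    }

    // After: resolve through the thread's context class loader, which
    // Hadoop sets up to include the job jar.
    static Class<?> loadWithContextLoader(String name) throws ClassNotFoundException {
        return Thread.currentThread().getContextClassLoader().loadClass(name);
    }

    public static void main(String[] args) throws Exception {
        // Both succeed here because java.lang.String is on the system
        // classpath; the difference only shows up when the wanted class
        // lives solely in the job jar.
        System.out.println(loadWithContextLoader("java.lang.String").getName());
    }
}
```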
> ReadDB url throws exception
> ---------------------------
>
> Key: NUTCH-1084
> URL: https://issues.apache.org/jira/browse/NUTCH-1084
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.3
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Attachments: NUTCH-1084.patch
>
>
> Readdb -url suffers from two problems:
> 1. it trips over the _SUCCESS file generated by newer Hadoop versions
> 2. it throws "can't find class: org.apache.nutch.protocol.ProtocolStatus" (???)
> The first problem can be remedied by not allowing the injector or updater to
> write the _SUCCESS file; until now that's been the solution implemented for
> similar issues. I have not been able to make the Hadoop readers simply skip
> the file.
> The second issue seems a bit strange and did not happen on a local checkout.
> I'm not yet sure whether this is a Hadoop issue or something being corrupt in
> the CrawlDB. Here's the stack trace:
> {code}
> Exception in thread "main" java.io.IOException: can't find class: org.apache.nutch.protocol.ProtocolStatus because org.apache.nutch.protocol.ProtocolStatus
>         at org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:204)
>         at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:146)
>         at org.apache.nutch.crawl.CrawlDatum.readFields(CrawlDatum.java:278)
>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>         at org.apache.hadoop.io.MapFile$Reader.get(MapFile.java:524)
>         at org.apache.hadoop.mapred.MapFileOutputFormat.getEntry(MapFileOutputFormat.java:105)
>         at org.apache.nutch.crawl.CrawlDbReader.get(CrawlDbReader.java:383)
>         at org.apache.nutch.crawl.CrawlDbReader.readUrl(CrawlDbReader.java:389)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:514)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {code}
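The first problem in the description (tripping over the _SUCCESS marker that
Hadoop's output committer writes into the output directory) is commonly
handled with a path filter that accepts only part files. The sketch below
shows that filtering idiom as a plain string predicate rather than Hadoop's
PathFilter interface, so it stands alone; it is an illustration, not the
NUTCH-1084 patch.

```java
// Minimal sketch of the usual Hadoop filtering idiom: skip _SUCCESS
// markers, _logs directories, and hidden files so readers only see
// part-* files. Hypothetical helper, not from the attached patch.
public class SuccessFileFilter {

    // Accept a file name only if it is not a marker or hidden file.
    static boolean accept(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    public static void main(String[] args) {
        System.out.println(accept("part-00000")); // regular part file: kept
        System.out.println(accept("_SUCCESS"));   // Hadoop marker: skipped
    }
}
```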
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)