[ https://issues.apache.org/jira/browse/HADOOP-12406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated HADOOP-12406:
---------------------------------------------
Assignee: Nadeem Douba
Status: Open (was: Patch Available)
Hi [~ndouba],
I'm about to do a 2.7.3 Apache Hadoop release and finally got around to this
again.
h4. Analysis
To make progress, I had to read up a bit on Nutch and on how to run it, so
that I could reproduce the bug and validate your patch. I finally
succeeded in doing so! I tested this with the Hadoop 2.7.2 release and Nutch
1.11, using the URL feed [given at
NUTCH-1084|https://issues.apache.org/jira/browse/NUTCH-1084?focusedCommentId=13882771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882771]:
{code}
~/tmp/common/hadoop-common-2.7.2/bin/hadoop jar apache-nutch-1.11.job \
  org.apache.nutch.crawl.CrawlDbReader \
  file:///tmp/nutch/apache-nutch-1.11/runtime/local/crawl/crawldb/ \
  -url http://bappenas.go.id/
{code}
I can reproduce all the problems listed at NUTCH-1084: with readdb, with an MR
local-job-runner based crawl job, etc.
The real issue is that Nutch's readdb is client-only and does *not* run a
MapReduce job, which is what I had asked about before. For regular MR jobs, the
job-jar *is* on the system class-loader's classpath. For client-only
invocations using "hadoop jar" and for the local-job-runner, the job-jar is
*not* on the system classpath, which is why you are running into the issue.
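As a rough illustration of why the choice of loader matters (a standalone sketch, not Hadoop or Nutch code; the class name {{LoaderVisibilityDemo}} and the helper {{canSee}} are made up for this example), the same class name can be resolvable through one loader and invisible through another. Here a loader with no parent stands in for a loader whose classpath lacks the job-jar:

```java
public class LoaderVisibilityDemo {

    /** Returns true if the given loader can resolve the class name. */
    public static boolean canSee(ClassLoader loader, String className) {
        try {
            // initialize=false: we only care about visibility, not static init.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = LoaderVisibilityDemo.class.getName();

        // A loader with a null parent delegates only to the bootstrap loader,
        // so it cannot see application classes. It plays the role of a loader
        // that does not have the job-jar on its classpath.
        ClassLoader narrow = new ClassLoader(null) { };

        // The loader that actually loaded this class plays the role of the
        // loader that does include the job-jar.
        ClassLoader wide = LoaderVisibilityDemo.class.getClassLoader();

        System.out.println("narrow loader sees " + name + ": " + canSee(narrow, name));
        System.out.println("wide loader sees " + name + ": " + canSee(wide, name));
    }
}
```

The single-argument {{Class.forName(String)}} uses the defining loader of the calling class, so code that lives in a library jar resolves names against that library's loader rather than against whichever loader holds the job's classes.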
h4. Summary
Your patch looks good to me. Clearly, the thread context class-loader falls
back to the system class-loader wherever it is not overridden, so we are fine
for all the ways of loading the classes in readFields.
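The idea can be sketched roughly as follows. This is a minimal standalone example, not the attached patch and not the actual {{AbstractMapWritable}} code; the class {{ContextLoaderResolve}} and its {{resolve}} method are hypothetical names, and the explicit null check is an assumption added here for safety:

```java
public class ContextLoaderResolve {

    /**
     * Resolves a class by name, preferring the current thread's context
     * class loader over the loader that defined this class.
     */
    public static Class<?> resolve(String className) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // A thread may have had its context loader cleared explicitly;
            // fall back to the loader that loaded this class.
            loader = ContextLoaderResolve.class.getClassLoader();
        }
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader system = ClassLoader.getSystemClassLoader();

        // When nothing overrides it, a thread inherits its context loader
        // from the thread that created it; print whether that is the system
        // loader in this particular launch.
        System.out.println("context loader is system loader: " + (context == system));
        System.out.println("resolved: " + resolve("java.util.ArrayList").getName());
    }
}
```

A caller that has installed a loader containing the job-jar via {{Thread.setContextClassLoader}} gets its custom classes resolved, while callers that never touch the context loader keep the behaviour they had before.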
I'll resubmit your patch to Jenkins with minor comment-related changes and
commit it if Mr. Jenkins is also fine with it.
> AbstractMapWritable.readFields throws ClassNotFoundException with custom
> writables
> ----------------------------------------------------------------------------------
>
> Key: HADOOP-12406
> URL: https://issues.apache.org/jira/browse/HADOOP-12406
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 2.7.1
> Environment: Ubuntu Linux 14.04 LTS amd64
> Reporter: Nadeem Douba
> Assignee: Nadeem Douba
> Priority: Blocker
> Labels: bug, hadoop, io, newbie, patch-available
> Attachments: HADOOP-12406.patch
>
>
> Note: I am not an expert at Java, class loaders, or Hadoop. I am just a
> hacker. My solution might be entirely wrong.
> AbstractMapWritable.readFields throws a ClassNotFoundException when reading
> custom writables. Debugging the job using remote debugging in IntelliJ
> revealed that the class loader being used in Class.forName() is different
> than that used by the Thread's current context
> (Thread.currentThread().getContextClassLoader()). The class path for the
> system class loader does not include the libraries of the job jar. However,
> the class path for the context class loader does. The proposed patch changes
> the class loading mechanism in readFields to use the Thread's context class
> loader instead of the system's default class loader.