Hadoop cannot find custom Demux class
-------------------------------------
Key: CHUKWA-488
URL: https://issues.apache.org/jira/browse/CHUKWA-488
Project: Hadoop Chukwa
Issue Type: Bug
Components: MR Data Processors
Affects Versions: 0.4.0
Environment: Linux x86-64, Java 1.6.0_20
Reporter: Kirk True
I'm getting ClassNotFoundException errors during Hadoop's map phase: the job
cannot find my class
org.apache.hadoop.chukwa.extraction.demux.processor.mapper.XmlBasedDemux, which
I've packaged in a JAR named data-collection-demux-0.1.jar.
The problem seems to be in the values of these two properties in the Hadoop job
configuration:
{code}
<property>
<name>mapred.job.classpath.files</name>
<value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
<property>
<name>mapred.cache.files</name>
<value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
{code}
The cause seems to be that the call to DistributedCache.addFileToClassPath
passes in a Path that is in URI form, i.e.
hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar, whereas
the DistributedCache API expects a filesystem-based path (i.e.
/chukwa/demux/data-collection-demux-0.1.jar). I'm not sure why, but the
FileStatus objects returned by FileSystem.listStatus carry a URI-based
path instead of a filesystem-based path.
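To make the difference between the two forms concrete, here is a minimal sketch
that uses only java.net.URI from the JDK, not the Hadoop API. The class and
method names (UriVersusPath, pathOnly) are hypothetical and exist only for
illustration.

```java
import java.net.URI;

// Hypothetical helper for illustration; uses only the JDK, not Hadoop's
// Path, FileStatus, or DistributedCache classes.
final class UriVersusPath {

    // Returns only the path component of a location, dropping the scheme
    // and authority (e.g. "hdfs://localhost:9000") when they are present.
    static String pathOnly(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String qualified =
            "hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar";
        URI uri = URI.create(qualified);

        // The URI form carries three parts; only the last one is the
        // filesystem-based path described above.
        System.out.println("scheme    = " + uri.getScheme());
        System.out.println("authority = " + uri.getAuthority());
        System.out.println("path      = " + pathOnly(qualified));
    }
}
```

Inside Hadoop code, calling toUri().getPath() on a Path should give the same
path component, though I have not verified that this alone fixes the classpath
lookup.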
I kludged the Demux class's addParsers method to strip the
"hdfs://localhost:9000" portion of the string, and now my class is found. I
will attempt to provide a patch today that determines the value of Hadoop's
fs.default.name and strips that prefix from the path used in Demux.java.
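A rough sketch of the stripping step, assuming the fs.default.name value has
already been read from the job configuration and is passed in as a plain
string. This is not the actual patch: DefaultFsPrefix and stripDefaultFs are
hypothetical names, and the code deliberately avoids the Hadoop API so it can
be run on its own.

```java
// Hypothetical helper for illustration; in Demux.java the defaultFs value
// would come from the job configuration (fs.default.name), not a literal.
final class DefaultFsPrefix {

    // Strips defaultFs from the front of location when location starts with
    // it; any other location is returned unchanged.
    static String stripDefaultFs(String location, String defaultFs) {
        // Tolerate a trailing slash on the configured value,
        // e.g. "hdfs://localhost:9000/".
        String prefix = defaultFs.endsWith("/")
            ? defaultFs.substring(0, defaultFs.length() - 1)
            : defaultFs;
        // Require a "/" right after the prefix so that a different port such
        // as "hdfs://localhost:90000" is not treated as a match.
        if (!prefix.isEmpty() && location.startsWith(prefix + "/")) {
            return location.substring(prefix.length());
        }
        return location;
    }

    public static void main(String[] args) {
        String jar =
            "hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar";
        System.out.println(stripDefaultFs(jar, "hdfs://localhost:9000"));
    }
}
```

Matching against the configured prefix, rather than hard-coding
"hdfs://localhost:9000", leaves paths on any other filesystem untouched.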