Hi Kirk,
Check chukwa-common.xml and make sure that chukwa.data.dir does not
have hdfs://localhost:9000 prepended to it. It's best to leave the
namenode address out of this path for portability.
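For example, an entry like this (an illustrative value; adjust the
path to your layout):
<property>
<name>chukwa.data.dir</name>
<value>/chukwa</value>
</property>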
Regards,
Eric
On 4/28/10 6:19 PM, "Kirk True" <k...@mustardgrain.com> wrote:
Hi all,
The problem seems to stem from the fact that the call to
DistributedCache.addFileToClassPath is passing in a Path in fully
qualified URI form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar,
whereas the DistributedCache API expects a plain filesystem path (i.e.
/chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus objects
returned by FileSystem.listStatus carry the fully qualified form instead
of the plain path.
I kludged the Demux class's addParsers to strip the
"hdfs://localhost:9000" portion of the string, and now my class is
found.
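The kludge, roughly (a sketch of the idea rather than the exact change;
fileStatus and conf stand in for variables addParsers already has in
scope):

  // Strip the scheme and authority (e.g. "hdfs://localhost:9000")
  // before registering the jar, leaving "/chukwa/demux/mydemux.jar".
  Path plainPath = new Path(fileStatus.getPath().toUri().getPath());
  DistributedCache.addFileToClassPath(plainPath, conf);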
It's frustrating when stuff silently fails :) I even turned up the
logging in Hadoop and Chukwa to TRACE and nothing was reported.
So, my question is: do I have something misconfigured that causes
FileSystem.listStatus to return fully qualified paths? Or does the code
need to be changed?
Thanks,
Kirk
On 4/28/10 5:41 PM, Kirk True wrote:
Hi all,
Just for grins I copied the Java source byte-for-byte to the
Chukwa source folder and then ran:
ant clean main && cp build/*.jar .
And it worked, as expected.
When one adds custom demux classes to a JAR and puts it at
hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow
magically merged with chukwa-core-0.4.0.jar to produce "job.jar", or do
they remain separate?
Thanks,
Kirk
On 4/28/10 5:09 PM, Kirk True wrote:
Hi Jerome,
Yes, they're all using $JAVA_HOME which is 1.6.0_18.
I did notice that the JAVA_PLATFORM environment variable in
chukwa-env.sh was set to 32-bit while Hadoop was defaulting to 64-bit
(this is a 64-bit machine), but setting it to Linux-amd64-64 didn't make
any difference.
Thanks,
Kirk
On 4/28/10 4:00 PM, Jerome Boulon wrote:
Are you using the same version of Java for your jar and Hadoop?
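One quick way to check is to compare the class file's major version
against the JVM (a hypothetical invocation; substitute your own jar and
class names):

javap -verbose -classpath mydemux.jar \
    org.apache.hadoop.chukwa.extraction.demux.processor.mapper.XmlBasedDemux \
    | grep major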
/Jerome.
On 4/28/10 3:33 PM, "Kirk True"
<k...@mustardgrain.com> wrote:
Hi Eric,
I added these to Hadoop's mapred-site.xml:
<property>
<name>keep.failed.task.files</name>
<value>true</value>
</property>
<property>
<name>mapred.job.tracker.persist.jobstatus.active</name>
<value>true</value>
</property>
This seems to have caused the task tracker
directory to stick around after the job is
complete. So, for example, I have this directory:
/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
Under this directory I have the following files:
jars/
    job.jar
    org/...
job.xml
My Demux class (XmlBasedDemux) doesn't appear in job.jar or in the
jars/org/... directory (apparently the exploded job.jar). However, my
demux JAR appears in three places in job.xml:
<property>
<name>mapred.job.classpath.files</name>
<value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
<property>
<name>mapred.jar</name>
<value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
</property>
<property>
<name>mapred.cache.files</name>
<value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
</property>
So it looks like Demux.addParsers' call to
DistributedCache.addFileToClassPath is working, since the above job
conf properties include my JAR.
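For context, here's roughly what I understand addParsers to be doing (a
sketch with approximate names, not the actual 0.4.0 source):

  FileSystem fs = FileSystem.get(conf);
  FileStatus[] jars = fs.listStatus(new Path("/chukwa/demux"));
  for (FileStatus jar : jars) {
      // jar.getPath() comes back fully qualified
      // (hdfs://localhost:9000/...), matching what shows up in
      // mapred.job.classpath.files and mapred.cache.files above.
      DistributedCache.addFileToClassPath(jar.getPath(), conf);
  }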
Here are the contents of my JAR:
[k...@skinner data-collection]$ unzip -l data-collection-demux/target/data-collection-demux-0.1.jar
Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
 Length     Date    Time    Name
--------    ----    ----    ----
       0  04-28-10  15:19   META-INF/
     123  04-28-10  15:19   META-INF/MANIFEST.MF
       0  04-28-10  15:19   org/
       0  04-28-10  15:19   org/apache/
       0  04-28-10  15:19   org/apache/hadoop/
       0  04-28-10  15:19   org/apache/hadoop/chukwa/
       0  04-28-10  15:19   org/apache/hadoop/chukwa/extraction/
       0  04-28-10  15:19   org/apache/hadoop/chukwa/extraction/demux/
       0  04-28-10  15:19   org/apache/hadoop/chukwa/extraction/demux/processor/
       0  04-28-10  15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
    1697  04-28-10  15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
       0  04-28-10  15:19   META-INF/maven/
       0  04-28-10  15:19   META-INF/maven/com.cisco.flip.datacollection/
       0  04-28-10  15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
    1448  04-28-10  00:23   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
     133  04-28-10  15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
--------                    -------
    3401                    16 files
Here's how I'm copying the JAR into HDFS:
hadoop fs -mkdir /chukwa/demux
hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar /chukwa/demux
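To double-check that the copy landed where Chukwa expects it:

hadoop fs -ls /chukwa/demux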
Any ideas of more things to try?
Thanks,
Kirk
On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
<ey...@yahoo-inc.com> wrote:
> Kirk,
>
> The shell script and job-related information are stored temporarily in
> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/
> while the job is running.
>
> You should go into the jars directory and find out if the compressed
> jar contains your class file.
>
> Regards,
> Eric
>
> On 4/28/10 1:57 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>
> > Hi Eric,
> >
> > I updated MapProcessorFactory.getProcessor to dump the URLs from
> > MapProcessorFactory.class's URLClassLoader. This is what I see:
> >
> > file:/home/kirk/bin/hadoop-0.20.2/conf/
> > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
> > file:/home/kirk/bin/hadoop-0.20.2/
> > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
> > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
> >
> > Is that the expected classpath? I don't see any reference to my JAR or
> > the Chukwa JARs.
> >
> > Also, when I try to view the contents of my "job_<timestamp>_0001"
> > directory, it's automatically removed, so I can't really do any
> > forensics after the fact. I know this is probably a Hadoop question,
> > but is it possible to prevent that auto-removal from occurring?
> >
> > Thanks,
> > Kirk
> >
> > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <k...@mustardgrain.com> wrote:
> >> Hi Eric,
> >>
> >> On 4/28/10 10:23 AM, Eric Yang wrote:
> >>> Hi Kirk,
> >>>
> >>> Is the ownership of the jar file set up correctly as the user that
> >>> runs demux?
> >>
> >> When browsing via the NameNode web UI, it lists permissions of
> >> "rw-r--r--" and "kirk" as the owner (which is also the user ID running
> >> the Hadoop and Chukwa processes).
> >>
> >>> You may find more information by looking at the running mapper or
> >>> reducer task, and try to find the task attempt shell script.
> >>
> >> Where is the task attempt shell script located?
> >>
> >>> Make sure the files are downloaded correctly from the distributed
> >>> cache, and referenced in the locally generated jar file. Hope this
> >>> helps.
> >>>
> >>
> >> Sorry for asking such basic questions, but where is the locally
> >> generated JAR file found? I'm assuming under /tmp/hadoop-<user> by
> >> default? I saw one file named job_<timestamp>.jar, but it appeared to
> >> be a byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my
> >> "XmlBasedDemux" class was nowhere to be found.
> >>
> >> Thanks,
> >> Kirk
> >>
> >>> Regards,
> >>> Eric
> >>>
> >>> On 4/28/10 9:37 AM, "Kirk True" <k...@mustardgrain.com> wrote:
> >>>
> >>>
> >>>> Hi guys,
> >>>>
> >>>> I have a custom Demux that I need to run to process my input, but
> >>>> I'm getting a ClassNotFoundException when running in Hadoop. This is
> >>>> with the released 0.4.0 build.
> >>>>
> >>>> I've done the following:
> >>>>
> >>>> 1. I put my Demux class in the correct package
> >>>>    (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
> >>>> 2. I've added the JAR containing the Demux implementation to HDFS at
> >>>>    /chukwa/demux
> >>>> 3. I've added an alias for it in chukwa-demux-conf.xml
> >>>>
> >>>> The map/reduce job is picking up on the fact that I have a custom
> >>>> Demux and is trying to load it, but I get a ClassNotFoundException.
> >>>> The HDFS-based URL to the JAR is showing up in the job configuration
> >>>> in Hadoop, which is further evidence that Chukwa and Hadoop know where
> >>>> the JAR lives and that it's part of the Chukwa-initiated job.
> >>>>
> >>>> My Demux is very simple. I've stripped it down to a
> >>>> System.out.println with no dependencies on classes/JARs other than
> >>>> Chukwa, Hadoop, and the core JDK. I've double-checked that my JAR is
> >>>> being built correctly. I'm completely flummoxed as to what I'm doing
> >>>> wrong.
> >>>>
> >>>> Any ideas what I'm missing? What other information can I provide?
> >>>>
> >>>> Thanks!
> >>>> Kirk
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
>