Are you using the same version of Java for your jar and Hadoop?

/Jerome
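A version mismatch usually surfaces as UnsupportedClassVersionError rather than ClassNotFoundException, but it is quick to rule out by comparing the class file's major version against the JVM running Hadoop. The paths below are illustrative:

    # Major version of the compiled class (49 = Java 5, 50 = Java 6)
    javap -verbose -classpath data-collection-demux-0.1.jar \
        org.apache.hadoop.chukwa.extraction.demux.processor.mapper.XmlBasedDemux | grep major

    # JVM version used by Hadoop (see JAVA_HOME in conf/hadoop-env.sh)
    java -version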
On 4/28/10 3:33 PM, "Kirk True" <k...@mustardgrain.com> wrote:

> Hi Eric,
>
> I added these to Hadoop's mapred-site.xml:
>
> <property>
>   <name>keep.failed.task.files</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapred.job.tracker.persist.jobstatus.active</name>
>   <value>true</value>
> </property>
>
> This seems to have caused the task tracker directory to stick around
> after the job is complete. So, for example, I have this directory:
>
>   /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>
> Under this directory I have the following files:
>
>   jars/
>     job.jar
>     org/ . . .
>   job.xml
>
> My Demux (XmlBasedDemux) doesn't appear in the job.jar or the (apparently
> exploded job.jar) jars/org/... directory. However, my demux JAR appears in
> three places in the job.xml:
>
> <property>
>   <name>mapred.job.classpath.files</name>
>   <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
> </property>
> <property>
>   <name>mapred.jar</name>
>   <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
> </property>
> <property>
>   <name>mapred.cache.files</name>
>   <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
> </property>
>
> So it looks like Demux.addParsers' call to DistributedCache.addFileToClassPath
> is working, since the job conf properties above include my JAR.
>
> Here's my JAR contents:
>
> [k...@skinner data-collection]$ unzip -l data-collection-demux/target/data-collection-demux-0.1.jar
> Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>   Length     Date   Time    Name
>  --------    ----   ----    ----
>         0  04-28-10 15:19   META-INF/
>       123  04-28-10 15:19   META-INF/MANIFEST.MF
>         0  04-28-10 15:19   org/
>         0  04-28-10 15:19   org/apache/
>         0  04-28-10 15:19   org/apache/hadoop/
>         0  04-28-10 15:19   org/apache/hadoop/chukwa/
>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/
>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>      1697  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>         0  04-28-10 15:19   META-INF/maven/
>         0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/
>         0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>      1448  04-28-10 00:23   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>       133  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>  --------                   -------
>      3401                   16 files
>
> Here's how I'm copying the JAR into HDFS:
>
>   hadoop fs -mkdir /chukwa/demux
>   hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar /chukwa/demux
>
> Any ideas of more things to try?
>
> Thanks,
> Kirk
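The two job.xml entries above (mapred.job.classpath.files and mapred.cache.files) are exactly what DistributedCache.addFileToClassPath produces. For reference, a minimal sketch of that call, assuming Hadoop 0.20's org.apache.hadoop.filecache.DistributedCache API; the driver class name here is hypothetical:

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class AddJarExample {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(AddJarExample.class);
        // Adds the HDFS jar to mapred.job.classpath.files and
        // mapred.cache.files, matching the job.xml entries above.
        DistributedCache.addFileToClassPath(
            new Path("/chukwa/demux/data-collection-demux-0.1.jar"), conf);
      }
    }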
> On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang" <ey...@yahoo-inc.com> wrote:
>
>> Kirk,
>>
>> The shell script and job-related information are stored temporarily in
>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/
>> while the job is running.
>>
>> You should go into the jars directory and find out if the compressed jar
>> contains your class file.
>>
>> Regards,
>> Eric
>>
>> On 4/28/10 1:57 PM, "Kirk True" <k...@mustardgrain.com> wrote:
>>
>>> Hi Eric,
>>>
>>> I updated MapProcessorFactory.getProcessor to dump the URLs from the
>>> URLClassLoader obtained from MapProcessorFactory.class. This is what I see:
>>>
>>>   file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>   file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/
>>>   file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>   file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>   file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>   file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>>>   file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>>>   file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>
>>> Is that the expected classpath? I don't see any reference to my JAR or the
>>> Chukwa JARs.
>>>
>>> Also, when I try to view the contents of my "job_<timestamp>_0001" directory,
>>> it's automatically removed, so I can't really do any forensics after the fact.
>>> I know this is probably a Hadoop question, but is it possible to prevent that
>>> auto-removal from occurring?
>>>
>>> Thanks,
>>> Kirk
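The classloader dump Kirk describes can be reproduced with plain JDK calls; a minimal sketch (the helper class name is made up, and in Chukwa the class passed in would be MapProcessorFactory.class):

    import java.net.URL;
    import java.net.URLClassLoader;

    public final class ClassLoaderDump {
      // Print every URL visible to the classloader that loaded the given
      // class. Only URLClassLoaders expose their search path this way.
      public static void dump(Class<?> clazz) {
        ClassLoader cl = clazz.getClassLoader();
        if (cl instanceof URLClassLoader) {
          for (URL url : ((URLClassLoader) cl).getURLs()) {
            System.out.println(url);
          }
        } else {
          System.out.println("Not a URLClassLoader: " + cl);
        }
      }
    }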
>>> On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <k...@mustardgrain.com> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>>
>>>>> Hi Kirk,
>>>>>
>>>>> Is the ownership of the jar file set up correctly as the user that runs
>>>>> demux?
>>>>
>>>> When browsing via the NameNode web UI, it lists permissions of
>>>> "rw-r--r--" and "kirk" as the owner (which is also the user ID running
>>>> the Hadoop and Chukwa processes).
>>>>
>>>>> You may find more information by looking at a running mapper task or
>>>>> reducer task, and try to find the task attempt shell script.
>>>>
>>>> Where is the task attempt shell script located?
>>>>
>>>>> Make sure the files are downloaded correctly from the distributed cache,
>>>>> and referenced in the locally generated jar file. Hope this helps.
>>>>
>>>> Sorry for asking such basic questions, but where is the locally
>>>> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>>>> default? I saw one file named job_<timestamp>.jar, but it appeared to be
>>>> a byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>>>> class was nowhere to be found.
>>>>
>>>> Thanks,
>>>> Kirk
>>>>
>>>>> Regards,
>>>>> Eric
>>>>>
>>>>> On 4/28/10 9:37 AM, "Kirk True" <k...@mustardgrain.com> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I have a custom Demux that I need to run to process my input, but I'm
>>>>>> getting a ClassNotFoundException when running in Hadoop. This is with
>>>>>> the released 0.4.0 build.
>>>>>>
>>>>>> I've done the following:
>>>>>>
>>>>>> 1. I put my Demux class in the correct package
>>>>>>    (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>>>> 2. I've added the JAR containing the Demux implementation to HDFS at
>>>>>>    /chukwa/demux
>>>>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>>>>
>>>>>> The map/reduce job is picking up on the fact that I have a custom Demux
>>>>>> and is trying to load it, but I get a ClassNotFoundException. The
>>>>>> HDFS-based URL to the JAR is showing up in the job configuration in
>>>>>> Hadoop, which is further evidence that Chukwa and Hadoop know where the
>>>>>> JAR lives and that it's part of the Chukwa-initiated job.
>>>>>>
>>>>>> My Demux is very simple. I've stripped it down to a System.out.println
>>>>>> with no dependencies on classes/JARs other than Chukwa, Hadoop, and the
>>>>>> core JDK. I've double-checked that my JAR is being built correctly. I'm
>>>>>> completely flummoxed as to what I'm doing wrong.
>>>>>>
>>>>>> Any ideas what I'm missing? What other information can I provide?
>>>>>>
>>>>>> Thanks!
>>>>>> Kirk
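For anyone reproducing this setup, the three steps in Kirk's original message amount to roughly the following sketches. The parse signature assumes Chukwa 0.4's AbstractProcessor base class, and the "XmlData" data type name is made up for illustration:

    package org.apache.hadoop.chukwa.extraction.demux.processor.mapper;

    import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
    import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Minimal demux processor sketch. Like the stripped-down class described
    // above, it only logs the record; a real processor would emit
    // ChukwaRecords through the output collector.
    public class XmlBasedDemux extends AbstractProcessor {
      @Override
      protected void parse(String recordEntry,
                           OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
                           Reporter reporter) throws Throwable {
        System.out.println("XmlBasedDemux saw: " + recordEntry);
      }
    }

The alias in chukwa-demux-conf.xml is, to the best of my understanding of the 0.4 codebase, a property mapping a data type name to the processor class, e.g.:

    <property>
      <name>XmlData</name>
      <value>org.apache.hadoop.chukwa.extraction.demux.processor.mapper.XmlBasedDemux</value>
      <description>Parser class for XmlData chunks</description>
    </property>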