Hi Eric,

If I grep for "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two places: the README and the writer.hdfs.filesystem property in chukwa-collector-conf.xml. I didn't change that file, so it should be the default. In chukwa-common.xml, chukwa.data.dir is still just "/chukwa".

Thanks,
Kirk

On 4/28/10 6:34 PM, Eric Yang wrote:
Hi Kirk,

Check chukwa-common.xml and make sure that chukwa.data.dir does not have hdfs://localhost:9000 prepended to it. It's best to leave the namenode address out of this path for portability.
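For what it's worth, the entry would then look something like this (a sketch based on the values mentioned in this thread, not a verified default):

```xml
<property>
  <name>chukwa.data.dir</name>
  <value>/chukwa</value>
</property>
```

With no scheme or authority in the value, the path resolves against whatever default filesystem (fs.default.name) the Hadoop configuration provides, so it works unchanged across clusters.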

Regards,
Eric


On 4/28/10 6:19 PM, "Kirk True" <k...@mustardgrain.com> wrote:

    Hi all,

    The problem seems to stem from the fact that the call to
    DistributedCache.addFileToClassPath is passing in a Path in
    URI form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar,
    whereas the DistributedCache API expects a filesystem-based
    path (i.e. /chukwa/demux/mydemux.jar). I'm not sure why, but
    the FileStatus object returned by FileSystem.listStatus
    contains a URI-based path instead of a filesystem-based path.

    I kludged the Demux class's addParsers to strip the
    "hdfs://localhost:9000" portion of the string, and now my
    class is found.
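    The stripping boils down to taking the path component of the URI. Here's a
    minimal sketch using only the JDK (the class and method names are mine for
    illustration, not Chukwa's; in Hadoop terms this is what
    Path.toUri().getPath() returns for a qualified Path):

```java
import java.net.URI;

// Hypothetical helper (names are illustrative, not from Chukwa) showing the
// conversion: strip the scheme and authority (e.g. "hdfs://localhost:9000")
// from a URI-qualified path, leaving only the filesystem path component.
public class PathFix {
    static String stripSchemeAndAuthority(String uriString) {
        return URI.create(uriString).getPath();
    }

    public static void main(String[] args) {
        // Prints "/chukwa/demux/mydemux.jar"
        System.out.println(stripSchemeAndAuthority("hdfs://localhost:9000/chukwa/demux/mydemux.jar"));
    }
}
```

    A path that is already filesystem-based passes through unchanged, so the
    conversion is safe to apply unconditionally.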

    It's frustrating when stuff fails silently :) I even turned up the
    logging in Hadoop and Chukwa to TRACE and nothing was reported.

    So, my question is: do I have something misconfigured that causes
    FileSystem.listStatus to return a URI-based path, or does the code
    need to be changed?

    Thanks,
    Kirk

    On 4/28/10 5:41 PM, Kirk True wrote:

        Hi all,

        Just for grins I copied the Java source byte-for-byte to the
        Chukwa source folder and then ran:


            ant clean main && cp build/*.jar .


        And it worked, as expected.

        When one adds custom demux classes to a JAR and sticks it in
        hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR
        somehow magically merged with chukwa-core-0.4.0.jar to produce
        "job.jar", or do they remain separate?

        Thanks,
        Kirk

        On 4/28/10 5:09 PM, Kirk True wrote:

             Hi Jerome,

            Yes, they're all using $JAVA_HOME which is 1.6.0_18.

            I did notice that the JAVA_PLATFORM environment variable
            in chukwa-env.sh was set to 32-bit while Hadoop was
            defaulting to 64-bit (this is a 64-bit machine), but
            setting it to Linux-amd64-64 didn't make any difference.

            Thanks,
            Kirk

            On 4/28/10 4:00 PM, Jerome Boulon wrote:

                Re: Chukwa can't find Demux class

                Are you using the same version of Java for your jar and Hadoop?
                /Jerome.

                On 4/28/10 3:33 PM, "Kirk True"
                <k...@mustardgrain.com> wrote:


                    Hi Eric,

                    I added these to Hadoop's mapred-site.xml:


                    <property>
                    <name>keep.failed.task.files</name>
                    <value>true</value>
                    </property>
                    <property>
                    <name>mapred.job.tracker.persist.jobstatus.active</name>
                    <value>true</value>
                    </property>


                    This seems to have caused the task tracker directory
                    to stick around after the job is complete. So, for
                    example, I have this directory:

                    /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001


                    Under this directory I have the following files:


                    jars/
                    job.jar
                    org/ . . .
                    job.xml

                    My Demux (XmlBasedDemux) doesn't appear in job.jar
                    or in the jars/org/... directory (apparently the
                    exploded job.jar). However, my demux JAR appears in
                    three places in job.xml:


                    <property>
                    <name>mapred.job.classpath.files</name>
                    <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
                    </property>
                    <property>
                    <name>mapred.jar</name>
                    <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
                    </property>
                    <property>
                    <name>mapred.cache.files</name>
                    <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
                    </property>


                    So it looks like when Demux.addParsers calls
                    DistributedCache.addFileToClassPath, it's working:
                    the above job conf properties include my JAR.

                    Here's my JAR contents:


                    [k...@skinner data-collection]$ unzip -l data-collection-demux/target/data-collection-demux-0.1.jar
                    Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
                      Length     Date   Time    Name
                     --------    ----   ----    ----
                            0  04-28-10 15:19   META-INF/
                          123  04-28-10 15:19   META-INF/MANIFEST.MF
                            0  04-28-10 15:19   org/
                            0  04-28-10 15:19   org/apache/
                            0  04-28-10 15:19   org/apache/hadoop/
                            0  04-28-10 15:19   org/apache/hadoop/chukwa/
                            0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
                            0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
                            0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/
                            0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
                         1697  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
                            0  04-28-10 15:19   META-INF/maven/
                            0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/
                            0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
                         1448  04-28-10 00:23   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
                          133  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
                     --------                   -------
                         3401                   16 files


                    Here's how I'm copying the JAR into HDFS:


                    hadoop fs -mkdir /chukwa/demux
                    hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar /chukwa/demux

                    Any ideas of more things to try?

                    Thanks,
                    Kirk


                    On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
                    <ey...@yahoo-inc.com> wrote:
                    > Kirk,
                    >
                    > The shell script and job related information are stored temporarily in
                    > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/,
                    > while the job is running.
                    >
                    > You should go into the jars directory and find out if the compressed jar
                    > contains your class file.
                    >
                    > Regards,
                    > Eric
                    >
                    > On 4/28/10 1:57 PM, "Kirk True" <k...@mustardgrain.com> wrote:
                    >
                    > > Hi Eric,
                    > >
                    > > I updated MapProcessorFactory.getProcessor to dump the URLs from the
                    > > URLClassLoader from the MapProcessorFactory.class. This is what I see:
                    > >
                    > >
                    > > file:/home/kirk/bin/hadoop-0.20.2/conf/
                    > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/
                    > > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
                    > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
                    > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
                    > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
                    > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
                    > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
                    > >
                    > >
                    > > Is that the expected classpath? I don't see any reference to my JAR
                    > > or the Chukwa JARs.
                    > >
                    > > Also, when I try to view the contents of my "job_<timestamp>_0001"
                    > > directory, it's automatically removed, so I can't really do any
                    > > forensics after the fact. I know this is probably a Hadoop question,
                    > > but is it possible to prevent that auto-removal from occurring?
                    > >
                    > > Thanks,
                    > > Kirk
                    > >
                    > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True"
                    <k...@mustardgrain.com> wrote:
                    > >> Hi Eric,
                    > >>
                    > >> On 4/28/10 10:23 AM, Eric Yang wrote:
                    > >>> Hi Kirk,
                    > >>>
                    > >>> Is the ownership of the jar file set up correctly as the user that
                    > >>> runs demux?
                    > >>
                    > >> When browsing via the NameNode web UI, it lists permissions of
                    > >> "rw-r--r--" and "kirk" as the owner (which is also the user ID
                    > >> running the Hadoop and Chukwa processes).
                    > >>
                    > >>> You may find more information by looking at the running mapper task
                    > >>> or reducer task, and try to find out the task attempt shell script.
                    > >>
                    > >> Where is the task attempt shell script located?
                    > >>
                    > >>> Make sure the files are downloaded correctly from distributed cache,
                    > >>> and referenced in the locally generated jar file.  Hope this helps.
                    > >>>
                    > >>
                    > >> Sorry for asking such basic questions, but where is the locally
                    > >> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
                    > >> default? I saw one file named job_<timestamp>.jar, but it appeared
                    > >> to be a byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my
                    > >> "XmlBasedDemux" class was nowhere to be found.
                    > >>
                    > >> Thanks,
                    > >> Kirk
                    > >>
                    > >>> Regards,
                    > >>> Eric
                    > >>>
                    > >>> On 4/28/10 9:37 AM, "Kirk True" <k...@mustardgrain.com> wrote:
                    > >>>
                    > >>>
                    > >>>> Hi guys,
                    > >>>>
                    > >>>> I have a custom Demux that I need to run to process my input, but
                    > >>>> I'm getting ClassNotFoundException when running in Hadoop. This is
                    > >>>> with the released 0.4.0 build.
                    > >>>>
                    > >>>> I've done the following:
                    > >>>>
                    > >>>> 1. I put my Demux class in the correct package
                    > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper).
                    > >>>> 2. I've added the JAR containing the Demux implementation to HDFS
                    > >>>> at /chukwa/demux.
                    > >>>> 3. I've added an alias to it in chukwa-demux-conf.xml.
                    > >>>>
                    > >>>> The map/reduce job is picking up on the fact that I have a custom
                    > >>>> Demux and is trying to load it, but I get a ClassNotFoundException.
                    > >>>> The HDFS-based URL to the JAR is showing up in the job configuration
                    > >>>> in Hadoop, which is further evidence that Chukwa and Hadoop know
                    > >>>> where the JAR lives and that it's part of the Chukwa-initiated job.
                    > >>>>
                    > >>>> My Demux is very simple. I've stripped it down to a
                    > >>>> System.out.println with no dependencies on classes/JARs other than
                    > >>>> Chukwa, Hadoop, and the core JDK. I've double-checked that my JAR is
                    > >>>> being built correctly. I'm completely flummoxed as to what I'm doing
                    > >>>> wrong.
                    > >>>>
                    > >>>> Any ideas what I'm missing? What other information can I provide?
                    > >>>>
                    > >>>> Thanks!
                    > >>>> Kirk
                    > >>>>
                    > >>>>
                    > >>>
                    > >>
                    > >
                    > >
                    >
                    >





