Hi Philip,

Thanks for the pointer. I implemented DistributedCache.addFileToClassPath
like this for testing:

    // Register every jar under <chukwa.data.dir>/demux with the
    // distributed cache so it ends up on the task classpath.
    String parserPath = conf.get("chukwa.data.dir") + File.separator + "demux";
    try {
      // Note: this resolves against the default FileSystem of a fresh
      // Configuration, not against the job's conf.
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus[] fstatus = fs.listStatus(new Path(parserPath));
      if (fstatus != null) {
        for (FileStatus parser : fstatus) {
          DistributedCache.addFileToClassPath(parser.getPath(), conf);
        }
      }
    } catch (IOException e) {
      log.error(ExceptionUtil.getStackTrace(e));
    }
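
For reference, the same values can be read back from the job conf before
submitting (a quick sanity check; these are the two properties
addFileToClassPath populates, as they appear in job.xml):

    // Sanity check: dump the classpath-related properties that
    // DistributedCache.addFileToClassPath should have populated.
    log.info("mapred.job.classpath.files=" + conf.get("mapred.job.classpath.files"));
    log.info("mapred.cache.files=" + conf.get("mapred.cache.files"));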

In the job.xml file, it shows:

    mapred.job.classpath.files  hdfs://abc.example.com/chukwa/demux/parsers.jar
    mapred.cache.files          hdfs://abc.example.com/chukwa/demux/parsers.jar

But the task still failed with:

Error: java.lang.ClassNotFoundException:
org.apache.hadoop.chukwa.extraction.demux.processor.mapper.MapProcessorFactory

MapProcessorFactory is in parsers.jar, and the file permissions on
parsers.jar allow the running user to read it.

Any ideas?  What is the difference between addArchiveToClassPath and
addFileToClassPath?
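
In case it matters, this is how I read the two variants (an untested
sketch, reusing conf from above and the parsers.jar path from job.xml; my
understanding is that archives get unpacked on the task node, while plain
files are added to the classpath as-is):

    // Untested sketch: the archive variant takes the same
    // (Path, Configuration) arguments as addFileToClassPath; archives in
    // the distributed cache are unpacked on the tasktracker, while plain
    // files are localized as-is.
    DistributedCache.addArchiveToClassPath(
        new Path("/chukwa/demux/parsers.jar"), conf);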

Regards,
Eric


On 12/31/09 3:25 PM, "Philip Zeyliger" <phi...@cloudera.com> wrote:

> You should use the DistributedCache.
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#DistributedCache
> 
> 
> 
> On Thu, Dec 31, 2009 at 3:21 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>> Hi,
>> 
>> I have a mapreduce program embedded in a Java application, and I am trying to
>> load additional jar files as add-ons for the execution of my map reduce job.
>> My program works like this:
>> 
>> JobConf conf = new JobConf(new Configuration(), Demux.class);
>> conf.setBoolean("mapred.used.genericoptionsparser", true);
>> 
>> args[x] = "-libjars";
>> args[x+1] = "/path/to/addon.jar";
>> 
>> int res = ToolRunner.run(conf, new Demux(), args);
>> 
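>> For context, -libjars is consumed by GenericOptionsParser inside
>> ToolRunner.run, so the flag only takes effect when the args pass through
>> ToolRunner. A minimal sketch of the full invocation, reusing conf from
>> above (inputDir and outputDir are placeholder Demux arguments):
>> 
>>     // Sketch only: -libjars and the jar path must precede the job's own
>>     // arguments so GenericOptionsParser can strip them out.
>>     String[] args = new String[] {
>>         "-libjars", "/path/to/addon.jar",
>>         inputDir, outputDir
>>     };
>>     int res = ToolRunner.run(conf, new Demux(), args);
>> 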
>> When the mapreduce job runs, the task fails because it is unable to find
>> the additional classes inside addon.jar.  If I extract
>> /mapredsystem/hadoop/mapredsystem/job_200912171752_25474/job.jar, it looks
>> like only the main jar file is in the job.jar.  Does addon.jar need to be
>> preloaded onto all tasktracker nodes?  Could addon.jar be uploaded via the
>> client?  If yes, how do I do this properly?
>> 
>> I am using Hadoop 0.20.0 rc7 from Yahoo with the capacity scheduler.  Thanks
>> in advance.
>> 
>> Regards,
>> Eric
>> 
> 
> 
