Hi Philip,

Thanks for the pointer. I used DistributedCache.addFileToClassPath like this for testing:
String parserPath = conf.get("chukwa.data.dir") + File.separator + "demux";
try {
  FileSystem fs = FileSystem.get(new Configuration());
  FileStatus[] fstatus = fs.listStatus(new Path(parserPath));
  if (fstatus != null) {
    for (FileStatus parser : fstatus) {
      DistributedCache.addFileToClassPath(parser.getPath(), conf);
    }
  }
} catch (IOException e) {
  log.error(ExceptionUtil.getStackTrace(e));
}

The job.xml file shows:

mapred.job.classpath.files  hdfs://abc.example.com/chukwa/demux/parsers.jar
mapred.cache.files          hdfs://abc.example.com/chukwa/demux/parsers.jar

But the task still failed with:

Error: java.lang.ClassNotFoundException:
org.apache.hadoop.chukwa.extraction.demux.processor.mapper.MapProcessorFactory

MapProcessorFactory is in the parsers.jar file, and the file permissions are
the same for the running user and the parsers.jar file. Any idea? What is the
difference between addArchiveToClassPath and addFileToClassPath?

Regards,
Eric


On 12/31/09 3:25 PM, "Philip Zeyliger" <phi...@cloudera.com> wrote:

> You should use the DistributedCache.
> http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#DistributedCache
>
> On Thu, Dec 31, 2009 at 3:21 PM, Eric Yang <ey...@yahoo-inc.com> wrote:
>> Hi,
>>
>> I have a mapreduce program embedded in a java application, and I am trying
>> to load additional jar files as add-ons for the execution of my map reduce
>> job. My program works like this:
>>
>> JobConf conf = new JobConf(new Configuration(), Demux.class);
>> conf.setBoolean("mapred.used.genericoptionsparser", true);
>>
>> args[x] = "-libjars";
>> args[x+1] = "/path/to/addon.jar";
>>
>> int res = ToolRunner.run(conf, new Demux(), args);
>>
>> When the mapreduce job is running, the task failed because it is unable to
>> find the additional classes inside addon.jar. If I extract
>> /mapredsystem/hadoop/mapredsystem/job_200912171752_25474/job.jar, it looks
>> like only the main jar file was in the job.jar. Does the addon.jar need to
>> be preloaded into all task tracker nodes?
>> Could the addon.jar get uploaded via the client? If yes, how do I do this
>> properly?
>>
>> I am using Hadoop 0.20.0 rc7 from Yahoo with the capacity scheduler. Thanks
>> in advance.
>>
>> Regards,
>> Eric
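
[Editor's note on the question above: per the Hadoop DistributedCache javadoc,
addFileToClassPath adds the file itself (e.g. a jar) to the task classpath and
to the cache, while addArchiveToClassPath adds an archive that is unpacked on
each task node before its unpacked location is added to the classpath. A
minimal driver-side sketch, assuming the 0.20-era mapred API; the paths below
are hypothetical and not from the thread:]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheClasspathSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Ships parsers.jar to each task node and puts the jar itself
    // on the task classpath. (Hypothetical path.)
    DistributedCache.addFileToClassPath(
        new Path("/chukwa/demux/parsers.jar"), conf);

    // Ships the archive, unpacks it on each task node, and adds the
    // unpacked directory to the task classpath. (Hypothetical path.)
    DistributedCache.addArchiveToClassPath(
        new Path("/chukwa/demux/parsers.zip"), conf);
  }
}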