Does it work if you use addArchiveToClassPath()? Also, it may be more convenient to use GenericOptionsParser's -libjars option.
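
As a rough sketch of the -libjars route: this assumes TestDriver is launched through ToolRunner (or otherwise passes its arguments through GenericOptionsParser) and that your Hadoop release supports the option, which I have not checked for 0.18.3. The invocation would look something like:

```shell
# Assumption: TestDriver parses generic options (e.g. via ToolRunner) and
# aaronTest2.jar sits in the current local directory.
# Generic options go after the main class and before any job-specific arguments.
bin/hadoop jar aaronTest1.jar TestDriver -libjars aaronTest2.jar
```

If that works, the explicit copyFromLocalFile() and addFileToClassPath() calls in runJob() should no longer be needed, since the jar is shipped with the job for you.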

Tom

On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <[email protected]> wrote:
> Hi all,
>
> I'm stumped as to how to use the distributed cache's classpath feature. I
> have a library of Java classes I'd like to distribute to jobs and use in my
> mapper; I figured the DCache's addFileToClassPath() method was the correct
> means, given the example at
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html.
>
>
> I've boiled it down to the following non-working example:
>
> in TestDriver.java:
>
>
>   private void runJob() throws IOException {
>     JobConf conf = new JobConf(getConf(), TestDriver.class);
>
>     // do standard job configuration.
>     FileInputFormat.addInputPath(conf, new Path("input"));
>     FileOutputFormat.setOutputPath(conf, new Path("output"));
>
>     conf.setMapperClass(TestMapper.class);
>     conf.setNumReduceTasks(0);
>
>     // load aaronTest2.jar into the dcache; this contains the class ValueProvider
>     FileSystem fs = FileSystem.get(conf);
>     fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
>     DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);
>
>     // run the job.
>     JobClient.runJob(conf);
>   }
>
>
> .... and then in TestMapper:
>
>   public void map(LongWritable key, Text value,
>       OutputCollector<LongWritable, Text> output,
>       Reporter reporter) throws IOException {
>
>     try {
>       ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
>       Text val = vp.getValue();
>       output.collect(new LongWritable(1), val);
>     } catch (ClassNotFoundException e) {
>       throw new IOException("not found: " + e.toString()); // newInstance() throws to here.
>     } catch (Exception e) {
>       throw new IOException("Exception:" + e.toString());
>     }
>   }
>
>
> The class "ValueProvider" is to be loaded from aaronTest2.jar. I can verify
> that this code works if I put ValueProvider into the main jar I deploy. I
> can verify that aaronTest2.jar makes it into the
> ${mapred.local.dir}/taskTracker/archive/
>
> But when run with ValueProvider in aaronTest2.jar, the job fails with:
>
> $ bin/hadoop jar aaronTest1.jar TestDriver
> 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> 09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
> 09/03/01 22:36:05 INFO mapred.JobClient: map 0% reduce 0%
> 09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
> java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
>     at TestMapper.map(Unknown Source)
>     at TestMapper.map(Unknown Source)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>
>
> Do I need to do something else (maybe in Mapper.configure()?) to actually
> classload the jar? The documentation makes me believe it should already be
> in the classpath by doing only what I've done above. I'm on Hadoop 0.18.3.
>
> Thanks,
> - Aaron
>
