Does it work if you use addArchiveToClassPath()?

Also, it may be more convenient to use GenericOptionsParser's -libjars option.
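
If it still fails, it can help to look at what the task JVM actually sees. Below is a minimal sketch of a probe for that, using only the JDK. ClasspathProbe and its methods are hypothetical names, not part of Hadoop. It assumes the task's classes are visible through the thread context class loader, and that the child JVM reports its classpath in the java.class.path system property.

```java
import java.io.File;

/** Hypothetical diagnostic helper (not a Hadoop class); JDK-only. */
final class ClasspathProbe {

    /** Returns true if the named class is visible to the context class loader. */
    public static boolean canLoad(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this helper.
            loader = ClasspathProbe.class.getClassLoader();
        }
        try {
            // initialize=false: only test visibility, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // For example NoClassDefFoundError when a dependency is missing.
            return false;
        }
    }

    /** Builds a report: whether the class loads, then each java.class.path entry. */
    public static String describe(String className) {
        StringBuilder sb = new StringBuilder();
        sb.append(className)
          .append(canLoad(className) ? " is loadable" : " is NOT loadable")
          .append('\n');
        String classPath = System.getProperty("java.class.path", "");
        for (String entry : classPath.split(File.pathSeparator)) {
            sb.append("  classpath entry: ").append(entry).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Defaults to the class name from the example in this thread.
        String name = args.length > 0 ? args[0] : "ValueProvider";
        System.out.print(describe(name));
    }
}
```

For instance, the mapper's ClassNotFoundException handler could append ClasspathProbe.describe("ValueProvider") to the IOException message, so the classpath listing shows up next to the failed task's error.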

Tom

On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <[email protected]> wrote:
> Hi all,
>
> I'm stumped as to how to use the distributed cache's classpath feature. I
> have a library of Java classes I'd like to distribute to jobs and use in my
> mapper; I figured the DCache's addFileToClassPath() method was the correct
> means, given the example at
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html.
>
>
> I've boiled it down to the following non-working example:
>
> in TestDriver.java:
>
>
>  private void runJob() throws IOException {
>    JobConf conf = new JobConf(getConf(), TestDriver.class);
>
>    // do standard job configuration.
>    FileInputFormat.addInputPath(conf, new Path("input"));
>    FileOutputFormat.setOutputPath(conf, new Path("output"));
>
>    conf.setMapperClass(TestMapper.class);
>    conf.setNumReduceTasks(0);
>
>    // load aaronTest2.jar into the dcache; this contains the class ValueProvider
>    FileSystem fs = FileSystem.get(conf);
>    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
>    DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);
>
>    // run the job.
>    JobClient.runJob(conf);
>  }
>
>
> .... and then in TestMapper:
>
>  public void map(LongWritable key, Text value,
>      OutputCollector<LongWritable, Text> output,
>      Reporter reporter) throws IOException {
>
>    try {
>      ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
>      Text val = vp.getValue();
>      output.collect(new LongWritable(1), val);
>    } catch (ClassNotFoundException e) {
>      // the Class.forName() call above throws to here.
>      throw new IOException("not found: " + e.toString());
>    } catch (Exception e) {
>      throw new IOException("Exception:" + e.toString());
>    }
>  }
>
>
> The class "ValueProvider" is to be loaded from aaronTest2.jar. I can verify
> that this code works if I put ValueProvider into the main jar I deploy. I
> can also verify that aaronTest2.jar makes it into the
> ${mapred.local.dir}/taskTracker/archive/ directory.
>
> But when run with ValueProvider in aaronTest2.jar, the job fails with:
>
> $ bin/hadoop jar aaronTest1.jar TestDriver
> 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> 09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
> 09/03/01 22:36:05 INFO mapred.JobClient:  map 0% reduce 0%
> 09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
> java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
>    at TestMapper.map(Unknown Source)
>    at TestMapper.map(Unknown Source)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
>    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>
>
> Do I need to do something else (maybe in Mapper.configure()?) to actually
> classload the jar? The documentation makes me believe it should already be
> in the classpath by doing only what I've done above. I'm on Hadoop 0.18.3.
>
> Thanks,
> - Aaron
>
