Ooh. The other DCache-based operations assume that you're dcaching files already resident in HDFS. I guess this assumes that the filenames are on the local filesystem.
- Aaron

On Wed, Apr 8, 2009 at 8:32 AM, Brian MacKay <[email protected]> wrote:

>
> I use addArchiveToClassPath, and it works for me.
>
> DistributedCache.addArchiveToClassPath(new Path(path), conf);
>
> I was curious about this block of code. Why are you copying to tmp?
>
> > FileSystem fs = FileSystem.get(conf);
> > fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
>
> -----Original Message-----
> From: Tom White [mailto:[email protected]]
> Sent: Wednesday, April 08, 2009 9:36 AM
> To: [email protected]
> Subject: Re: Example of deploying jars through DistributedCache?
>
> Does it work if you use addArchiveToClassPath()?
>
> Also, it may be more convenient to use GenericOptionsParser's -libjars
> option.
>
> Tom
>
> On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <[email protected]> wrote:
> > Hi all,
> >
> > I'm stumped as to how to use the distributed cache's classpath feature. I
> > have a library of Java classes I'd like to distribute to jobs and use in my
> > mapper; I figured the DCache's addFileToClassPath() method was the correct
> > means, given the example at
> > http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html
> >
> > I've boiled it down to the following non-working example:
> >
> > in TestDriver.java:
> >
> > private void runJob() throws IOException {
> >   JobConf conf = new JobConf(getConf(), TestDriver.class);
> >
> >   // do standard job configuration.
> >   FileInputFormat.addInputPath(conf, new Path("input"));
> >   FileOutputFormat.setOutputPath(conf, new Path("output"));
> >
> >   conf.setMapperClass(TestMapper.class);
> >   conf.setNumReduceTasks(0);
> >
> >   // load aaronTest2.jar into the dcache; this contains the class ValueProvider
> >   FileSystem fs = FileSystem.get(conf);
> >   fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
> >   DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);
> >
> >   // run the job.
> >   JobClient.runJob(conf);
> > }
> >
> > .... and then in TestMapper:
> >
> > public void map(LongWritable key, Text value,
> >     OutputCollector<LongWritable, Text> output,
> >     Reporter reporter) throws IOException {
> >
> >   try {
> >     ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
> >     Text val = vp.getValue();
> >     output.collect(new LongWritable(1), val);
> >   } catch (ClassNotFoundException e) {
> >     throw new IOException("not found: " + e.toString()); // newInstance() throws to here.
> >   } catch (Exception e) {
> >     throw new IOException("Exception:" + e.toString());
> >   }
> > }
> >
> > The class "ValueProvider" is to be loaded from aaronTest2.jar. I can verify
> > that this code works if I put ValueProvider into the main jar I deploy. I
> > can verify that aaronTest2.jar makes it into the
> > ${mapred.local.dir}/taskTracker/archive/
> >
> > But when run with ValueProvider in aaronTest2.jar, the job fails with:
> >
> > $ bin/hadoop jar aaronTest1.jar TestDriver
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
> > 09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
> > 09/03/01 22:36:05 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/03/01 22:36:14 INFO mapred.JobClient: Task Id :
> > attempt_200903012210_0005_m_000000_0, Status : FAILED
> > java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
> >     at TestMapper.map(Unknown Source)
> >     at TestMapper.map(Unknown Source)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> >     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
> >
> > Do I need to do something else (maybe in Mapper.configure()?) to actually
> > classload the jar? The documentation makes me believe it should already be
> > in the classpath by doing only what I've done above. I'm on Hadoop 0.18.3.
> >
> > Thanks,
> > - Aaron
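
On the question at the end of the original post (whether the jar could be classloaded by hand, e.g. from Mapper.configure()): below is a minimal sketch of what that might look like using only the JDK's URLClassLoader. It assumes the jar is already present on the task's local disk. The names JarClassLoading and loadFromJar are hypothetical, not part of Hadoop, and this is a possible workaround rather than an explanation of why addFileToClassPath() failed.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Optional;

/** Hypothetical helper (not Hadoop API): load a class by name from a local jar. */
final class JarClassLoading {

    private JarClassLoading() {}

    /**
     * Tries to load className, consulting the parent class loader first
     * (standard delegation) and then the given jar.
     * Returns Optional.empty() if the class is found in neither place.
     */
    static Optional<Class<?>> loadFromJar(File jar, String className) {
        URL jarUrl;
        try {
            jarUrl = jar.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad jar path: " + jar, e);
        }
        // The loader is deliberately left open: closing it would prevent the
        // loaded class from resolving other classes in the same jar later.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jarUrl }, JarClassLoading.class.getClassLoader());
        try {
            Class<?> loaded = Class.forName(className, true, loader);
            return Optional.<Class<?>>of(loaded);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }
}
```

In a mapper, this might be called as loadFromJar(new File(localJarPath), "ValueProvider"), where localJarPath is an assumed variable holding wherever the cache localized the jar (DistributedCache.getLocalCacheFiles(conf) is one place to look). One caveat: if ValueProvider exists only in the second jar, the mapper cannot name that type in a cast, so it would have to call the instance through reflection or through an interface that lives in the main job jar.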
