Thanks David, 

I've been trying to use DistributedCache, since it has now been suggested to 
me twice, but I'm afraid I'm just not getting it. 

It appears I need to associate my use of DistributedCache.addFileToClassPath() 
with a specific JobConf instance. If this is the case, what does 
addFileToClassPath() give me that I don't already get with setJar()? The 
performance hit from using setJar() for every Job is huge, so I assume having 
to use addFileToClassPath() for every Job will also be huge.

I'm looking to add a jar to my Hadoop classpath just once and then use it for 
many different map/reduce jobs. Effectively, I am trying to achieve dynamically 
the same effect as hardcoding my jar file into HADOOP_CLASSPATH in 
hadoop-env.sh on every node in my system. I still can't see how to do this :(
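
For reference, the static setup I'm trying to reproduce would look roughly like 
the fragment below in hadoop-env.sh on every node. This is only a sketch: the 
jar path is a made-up placeholder, and I'm assuming the daemons have to be 
restarted before they pick up the change.

```shell
# hadoop-env.sh on each node.
# The jar path below is a hypothetical placeholder, not my real layout.
# Prepend the jar so it is on the classpath for every job run on this node.
export HADOOP_CLASSPATH="/opt/myapp/lib/my-mapreduce.jar:${HADOOP_CLASSPATH}"
```

What I'm after is the same end result without editing this file by hand on 
each node.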

On Sep 15, 2010, at 11:46 AM, David Rosenstrauch <[email protected]> wrote:

> On 09/14/2010 10:10 PM, Pete Tyler wrote:
>> I'm trying to figure out how to achieve the following from a Java client,
>> 1. My app (which is a web server) starts up
>> 2. As part of startup my jar file, which includes my map reduce classes, is 
>> distributed to hadoop nodes
>> 3. My web app uses map reduce to extract data without the performance 
>> overhead of each job deploying a jar file, via setJar(), setJarByClass()
>> 
>> It looks like DistributedCache() has potential but the need for commands 
>> like 'hadoop fs -copyFromLocal ...' and the API methods like 
>> '.getLocalCacheArchives()' look to be at odds with my scenario. Any thoughts?
>> 
>> -Peter
> 
> For step 2, you have 2 options on how to implement:
> a) call DistributedCache.addFileToClassPath(jarFileURI, conf);
> b) have your app implement Tool, use ToolRunner to launch it, and specify a 
> -libjars command line parm which will achieve the same effect as in (a).  See 
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/Tool.html
>  and 
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions
>  for details.
> 
> HTH,
> 
> DR
