On 09/14/2010 10:10 PM, Pete Tyler wrote:
I'm trying to figure out how to achieve the following from a Java client:
1. My app (which is a web server) starts up
2. As part of startup, my jar file, which contains my map reduce classes, is
distributed to the Hadoop nodes
3. My web app uses map reduce to extract data without the per-job performance
overhead of deploying a jar file via setJar() or setJarByClass()
It looks like DistributedCache has potential, but the need for commands like
'hadoop fs -copyFromLocal ...' and API methods like
'.getLocalCacheArchives()' seems at odds with my scenario. Any thoughts?
-Peter
For step 2, you have two options:
a) call DistributedCache.addFileToClassPath(jarFileURI, conf);
b) have your app implement Tool, use ToolRunner to launch it, and
specify a -libjars command-line parameter, which achieves the same effect
as (a). See
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/Tool.html
and
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions
for details.
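A minimal sketch of option (a), assuming the jar has already been uploaded to HDFS once (e.g. at deploy time); the path /apps/lib/myjob.jar and the class name are hypothetical, and note that the r0.20 addFileToClassPath signature takes a Path rather than a URI:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class JobConfSetup {
    // Call once at web-app startup and reuse the Configuration for
    // every job, so no per-job jar upload is needed.
    public static Configuration withJobJar(Configuration conf) throws IOException {
        // Hypothetical HDFS location; the jar must already exist there.
        DistributedCache.addFileToClassPath(new Path("/apps/lib/myjob.jar"), conf);
        return conf;
    }
}
```

The jar still has to reach HDFS once, but that is a one-time step at deploy time rather than a cost paid on every job submission.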
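Option (b) might look like the following sketch (ExtractJob is a hypothetical class name; the job setup inside run() is elided):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ExtractJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // By this point ToolRunner has already folded -libjars (and any
        // other generic options) into the Configuration.
        Configuration conf = getConf();
        // ... configure and submit the map/reduce job using conf ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner uses GenericOptionsParser, so a launch such as
        //   hadoop ExtractJob -libjars myjob.jar <other args>
        // ships the jar to the cluster -- the same effect as option (a).
        System.exit(ToolRunner.run(new Configuration(), new ExtractJob(), args));
    }
}
```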
HTH,
DR