On Fri, Apr 3, 2009 at 11:39 PM, Foss User <[email protected]> wrote:
> If I have written a WordCount.java job in this manner:
>
> conf.setMapperClass(Map.class);
> conf.setCombinerClass(Combine.class);
> conf.setReducerClass(Reduce.class);
>
> So, you can see that three classes are being used here. I have
> packaged these classes into a jar file called wc.jar and I run it like
> this:
>
> $ bin/hadoop jar wc.jar WordCountJob
>
> 1) I want to know, when the job runs on a 5-machine cluster, is the
> whole JAR file distributed across the 5 machines, or are the
> individual class files distributed individually?

The whole jar.

> 2) Also, let us say the number of reducers is 2 while the number of
> mappers is 5. What happens in this case? How are the class files or
> jar files distributed?

It's uploaded into HDFS; specifically, into a subdirectory of wherever
you configured mapred.system.dir.

> 3) Are they distributed via RPC or HTTP?

The client uses the HDFS protocol to inject its jar file into HDFS. Then
all the TaskTrackers retrieve it with the same protocol.
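For reference, here is a minimal, self-contained sketch of the kind of
driver you describe, using the old org.apache.hadoop.mapred API that your
setMapperClass/setCombinerClass/setReducerClass calls come from. The class
names Map, Combine and Reduce match your message; the job name, key/value
types and input/output paths are just assumptions for illustration:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCountJob {

        // Mapper: emits (word, 1) for each token in an input line.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output,
                            Reporter reporter) throws IOException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        // Combiner: word counting is associative, so the reduce logic
        // can simply be reused as the combiner.
        public static class Combine extends Reduce {
        }

        public static void main(String[] args) throws Exception {
            // Passing the driver class lets Hadoop locate the jar
            // (wc.jar) that contains all three classes; that single
            // jar is what gets shipped to the cluster.
            JobConf conf = new JobConf(WordCountJob.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Combine.class);
            conf.setReducerClass(Reduce.class);

            conf.setNumReduceTasks(2); // e.g. 2 reducers, as in question 2

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submitting the job is what uploads the jar into HDFS
            // (under mapred.system.dir) before TaskTrackers fetch it.
            JobClient.runJob(conf);
        }
    }

The point being: it is the single jar located via the JobConf constructor
that gets copied into HDFS when the job is submitted, and every TaskTracker
that runs one of your map or reduce tasks pulls that same jar back out over
the HDFS protocol.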
