On Fri, Apr 3, 2009 at 11:39 PM, Foss User <[email protected]> wrote:

> If I have written a WordCount.java job in this manner:
>
>        conf.setMapperClass(Map.class);
>        conf.setCombinerClass(Combine.class);
>        conf.setReducerClass(Reduce.class);
>
> So, you can see that three classes are being used here.  I have
> packaged these classes into a jar file called wc.jar and I run it like
> this:
>
> $ bin/hadoop jar wc.jar WordCountJob
>
> 1) I want to know, when the job runs in a 5 machine cluster, is the
> whole JAR file distributed across the 5 machines, or are the
> individual class files distributed individually?


The whole jar.
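
For reference, the whole thing ends up looking roughly like the stock word
count example below. This is only a sketch using the old-style
org.apache.hadoop.mapred API; your Map/Combine/Reduce bodies will differ, the
point is just that everything in this one file gets compiled and packaged into
wc.jar, and that single jar is what moves around the cluster:

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountJob {

  // Emits (word, 1) for every token in the input line.
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // Sums up the counts for one word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  // A combiner that just reuses the reduce logic.
  public static class Combine extends Reduce {
  }

  public static void main(String[] args) throws Exception {
    // JobConf(Class) records which jar the job classes live in; that whole
    // jar (wc.jar after packaging) is what gets shipped to the cluster.
    JobConf conf = new JobConf(WordCountJob.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Combine.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}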

>
>
> 2) Also, let us say the number of reducers are 2 while the number of
> mappers are 5. What happens in this case? How are the class files or
> jar files distributed?


The number of map and reduce tasks doesn't change the mechanism. The jar is
uploaded once into HDFS, specifically into a subdirectory of wherever you
configured mapred.system.dir, and every node that runs a task for the job
fetches it from there.
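
If you're curious what that resolves to on your cluster, one quick way is to
read the property back from a JobConf, something like the snippet below
(ShowSystemDir is just a throwaway name for illustration; the stock default is
something like ${hadoop.tmp.dir}/mapred/system, but your site config may
override it):

import org.apache.hadoop.mapred.JobConf;

public class ShowSystemDir {
  public static void main(String[] args) {
    // Loads the Hadoop config files found on the classpath.
    JobConf conf = new JobConf();
    // Prints where job files (job.jar, job.xml, splits) get staged in HDFS.
    System.out.println(conf.get("mapred.system.dir"));
  }
}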

>
>
> 3) Are they distributed via RPC or HTTP?


The client uses the HDFS protocol to upload its jar file into HDFS. Then all
the TaskTrackers retrieve it with the same protocol.
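
To give a concrete feel for the client half, the ordinary FileSystem API does
the same kind of transfer; the sketch below is only an illustration with
made-up paths (the real destination directory and job id are chosen by the
JobClient, not by you):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadJarSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same DFS client protocol the JobClient uses to stage job.jar.
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("wc.jar"),
        new Path("/tmp/hadoop/mapred/system/job_200904030001_0001/job.jar"));
  }
}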
