Hi Aishwarya,

The temporary output of the mappers is consumed by the reducers, and the
number of reduce tasks that receive data depends on the output keys of the
mappers. It has nothing to do with the replication factor. Output is being
written on three nodes because the mappers generated at least three distinct
keys and the reducers for those keys were assigned to three different nodes.
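
To illustrate how keys end up on different reducers, here is a minimal
sketch of hash partitioning, the scheme Hadoop's default HashPartitioner
uses. The class name PartitionSketch, the method partitionFor, and the use
of plain String keys are my own assumptions for illustration (a real job
would typically use Text keys, whose hash codes differ), so treat it as a
sketch rather than what your job does exactly.

```java
// Hypothetical illustration, not Hadoop source code.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit
    // of the key's hash code, then take it modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed value; in a real job this comes from the job configuration
        for (String key : new String[] {"a", "b", "c"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
        // prints:
        // a -> reducer 1
        // b -> reducer 2
        // c -> reducer 0
    }
}
```

With three distinct keys and three reduce tasks, each key here lands on a
different reducer, and each reducer may be scheduled on a different node.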

Regards,
Abhishek

On Thu, May 17, 2012 at 2:06 PM, Aishwarya Venkataraman <
avenk...@cs.ucsd.edu> wrote:

> Hello,
>
> I have a 4-node cluster. One namenode and 3 other datanodes. I want to
> explicitly set the dfs.replication factor to 1 in order to run some
> experiments. I tried setting this via the hdfs-site.xml file and via
> the command line as well (hadoop dfs -setrep -R -w 1 /). But I have a
> feeling that the replication factor that hdfs is seeing is 3. It seems
> to be writing the temporary mapper outputs to all 3 datanodes. Is
> this the default configuration for MR jobs? If not, how can I set this
> to 1?
>
> Thanks,
> Aishwarya
>
