Hi, I was running the Sort example in Hadoop 0.20.2 (hadoop-0.20.2-examples.jar) over 100GB of input data (generated using randomwriter), with 800 mappers (using a 128MB HDFS block size) and 4 reducers, on a 3-machine cluster with 2 slave nodes.

While the input and output were both 100GB, I found that the intermediate data sent to each reducer was around 78GB, making the total intermediate data around 310GB. I don't really understand why there is an increase in data size, given that the Sort example just uses the identity mapper and identity reducer. Could someone please help me out with this?
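For reference, here is a minimal sketch of the kind of job configuration I understand the Sort example to use (old mapred API, identity mapper/reducer, SequenceFile input from randomwriter); the actual Sort.java has more options, and the class name and paths below are just placeholders. This is why I expected the data volume to stay the same end to end:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class IdentitySortSketch {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentitySortSketch.class);
        conf.setJobName("identity-sort-sketch");

        // Identity mapper/reducer: records pass through unchanged;
        // the framework's shuffle/sort does the actual sorting by key.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // 4 reducers, as in the run described above.
        conf.setNumReduceTasks(4);

        // randomwriter writes SequenceFiles of BytesWritable key/value pairs.
        conf.setInputFormat(SequenceFileInputFormat.class);
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        conf.setOutputKeyClass(BytesWritable.class);
        conf.setOutputValueClass(BytesWritable.class);

        // args[0]/args[1]: placeholder input and output paths.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
      }
    }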
Thanks!!