Virajith,

FILE_BYTES_READ also counts every read of spilled records made while the framework sort-merges the various map outputs between the map and reduce phases. Since the same spilled data can be re-read across multiple merge passes, this counter ends up much larger than the amount of data actually shuffled to the reducer.
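If it helps to compare these numbers yourself, here is a rough sketch (against the 0.20 API) of pulling the counters for a finished job. The job ID is passed in as an argument, and the group/counter name strings are the ones the 0.20 JobTracker UI displays, so please verify them against your cluster before relying on them.

    import org.apache.hadoop.mapred.Counters;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class PrintSortCounters {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // args[0] is the job ID, e.g. the one the JobTracker printed for your sort run.
        RunningJob job = client.getJob(JobID.forName(args[0]));
        Counters counters = job.getCounters();

        // Local-disk bytes read: includes re-reads of spilled map output
        // during the merge passes, so it can exceed the shuffled data size.
        long fileBytesRead =
            counters.findCounter("FileSystemCounters", "FILE_BYTES_READ").getCounter();
        // Bytes actually fetched by reducers during the shuffle.
        long shuffleBytes =
            counters.findCounter("org.apache.hadoop.mapred.Task$Counter",
                                 "REDUCE_SHUFFLE_BYTES").getCounter();
        // Final output written to HDFS.
        long hdfsBytesWritten =
            counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter();

        System.out.println("FILE_BYTES_READ      = " + fileBytesRead);
        System.out.println("REDUCE_SHUFFLE_BYTES = " + shuffleBytes);
        System.out.println("HDFS_BYTES_WRITTEN   = " + hdfsBytesWritten);
      }
    }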
On Wed, Jun 29, 2011 at 6:30 PM, Virajith Jalaparti <virajit...@gmail.com> wrote:
> I would like to clarify my earlier question: I found that each reducer
> reports FILE_BYTES_READ as around 78GB, HDFS_BYTES_WRITTEN as 25GB, and
> REDUCE_SHUFFLE_BYTES as 25GB. So why is FILE_BYTES_READ 78GB and not just
> 25GB?
>
> Thanks,
> Virajith
>
> On Wed, Jun 29, 2011 at 10:29 AM, Virajith Jalaparti <virajit...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I was running the Sort example in Hadoop 0.20.2
>> (hadoop-0.20.2-examples.jar) over an input data size of 100GB (generated
>> using randomwriter) with 800 mappers (using a 128MB HDFS block size) and
>> 4 reducers on a 3-machine cluster with 2 slave nodes. While the input and
>> output were 100GB, I found that the intermediate data sent to each reducer
>> was around 78GB, making the total intermediate data around 310GB. I don't
>> really understand why there is an increase in data size, given that the
>> sort example just uses the identity mapper and identity reducer.
>> Could someone please help me out with this?
>>
>> Thanks!!

--
Harsh J