Logically it 'should' increase the time, since it is an extra step beyond
the Mapper/Reducer. But while your processing time would increase slightly
(very, very slightly), your I/O and network transfer time would decrease
by a large margin, so your total job time should come out clearly lower
overall. The difference is between writing out, say, 10 GB before and
writing out 5-7 GB this time (a crude example).
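
To put rough numbers on that crude example, here is a back-of-the-envelope
sketch. The throughput figures (50 MB/s for I/O, 200 MB/s for the codec),
the 0.6 compression ratio and the function name are all made up for
illustration, not measured on any cluster. It also pessimistically assumes
compression and I/O do not overlap.

```python
def estimated_write_seconds(data_mb, io_mb_per_s, ratio=1.0, codec_mb_per_s=None):
    """Rough time to write data_mb, optionally compressing it first.

    All parameters are hypothetical inputs to a toy model:
    ratio is compressed size / original size, codec_mb_per_s is
    how fast the CPU can compress (None means no compression).
    """
    # CPU cost of compressing; zero when no codec is used.
    cpu_seconds = data_mb / codec_mb_per_s if codec_mb_per_s else 0.0
    # Only the (possibly compressed) bytes reach the disk or network.
    io_seconds = (data_mb * ratio) / io_mb_per_s
    return cpu_seconds + io_seconds


# 10 GB of output, assumed 50 MB/s effective I/O throughput.
plain = estimated_write_seconds(10_000, io_mb_per_s=50)
# Same output compressed to ~6 GB with an assumed 200 MB/s codec.
packed = estimated_write_seconds(10_000, io_mb_per_s=50, ratio=0.6,
                                 codec_mb_per_s=200)

print(f"uncompressed: {plain:.0f} s, compressed: {packed:.0f} s")
```

In this toy model compression only pays off when the codec is faster than
io_mb_per_s / (1 - ratio), which is 125 MB/s for the numbers above. Real
jobs overlap CPU and I/O, so treat it as intuition only.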

With the fast CPUs available these days, compressing and decompressing
should hardly take a noticeable amount of extra time. It's almost
negligible when using gzip, LZO or plain deflate.
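
If you want to check the CPU cost on your own machine, a quick sketch like
the one below times plain deflate using Python's standard zlib module. The
sample data is a made-up, highly repetitive log line, so the ratio it
reports will be far better than what real data gives; the function name
and the sample are assumptions for illustration, and this is not how
Hadoop itself invokes its codecs.

```python
import time
import zlib


def measure_deflate(data, level=6):
    """Time one deflate compress/decompress round trip over `data`."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_s = time.perf_counter() - start

    return {
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "ratio": len(packed) / len(data),
        "compress_seconds": compress_s,
        "decompress_seconds": decompress_s,
        "round_trip_ok": restored == data,
    }


# Hypothetical sample: one log-like line repeated many times (a few MB).
sample = b"2010-08-26 09:13:00 INFO JobClient: map 100% reduce 42%\n" * 50_000
stats = measure_deflate(sample)

for key, value in stats.items():
    print(key, value)
```

Comparing compress_seconds with the time it takes to write
original_bytes minus compressed_bytes to your disks gives a feel for
whether the trade is worth it on your hardware.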

On Thu, Aug 26, 2010 at 9:13 AM, Ted Yu <[email protected]> wrote:
> Compressed data would increase processing time in mapper/reducer but
> decrease the amount of data transferred between tasktracker nodes.
> Normally you should consider applying some form of compression.
>
> On Wed, Aug 25, 2010 at 7:32 PM, shangan <[email protected]> wrote:
>
>> will data stored in  compression format affect mapreduce job speed?
>> increase or decrease? or more complex relationship between these two ?  can
>> anybody give some explanation in detail?
>>
>> 2010-08-26
>>
>>
>>
>> shangan
>>
>



-- 
Harsh J
www.harshj.com