Logically, compression 'should' increase job time, since it's an extra step beyond the Mapper/Reducer. But while your processing time would increase very slightly, your I/O and network transfer time would decrease by a large margin, so your total job time would usually decrease overall. The difference is between writing out, say, 10 GB before and writing out 5-7 GB this time (a crude example).
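For a rough feel of this trade-off on a single machine, here is a minimal sketch in Python (not Hadoop code) that compresses some sample records with zlib, i.e. plain deflate, and reports the size reduction next to the CPU time spent. The function names and the sample record layout are made up for illustration, and the actual ratio and timings depend entirely on your data and hardware.

```python
import time
import zlib


def measure_compression(data: bytes, level: int = 6) -> dict:
    """Compress and decompress `data` with DEFLATE, timing both steps."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    assert restored == data  # the round trip must be lossless
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data),
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


def make_sample_records(count: int) -> bytes:
    """Build hypothetical tab-separated log records (illustrative data only)."""
    lines = [
        f"2010-08-26\tuser{i % 500}\t/page/{i % 40}\t200\n" for i in range(count)
    ]
    return "".join(lines).encode("utf-8")


if __name__ == "__main__":
    stats = measure_compression(make_sample_records(200_000))
    print(
        f"{stats['original_bytes']} -> {stats['compressed_bytes']} bytes "
        f"(ratio {stats['ratio']:.2f})"
    )
    print(
        f"compress: {stats['compress_seconds']:.3f}s, "
        f"decompress: {stats['decompress_seconds']:.3f}s"
    )
```

Whether the saved bytes outweigh the extra CPU time in a real job depends on how fast your disks and network are relative to your CPUs, so treat the numbers this prints only as a rough indication.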
With the fast CPUs available these days, compressing and decompressing should hardly take a noticeable amount of extra time. It's almost negligible when using gzip, lzo or plain deflate.

On Thu, Aug 26, 2010 at 9:13 AM, Ted Yu <[email protected]> wrote:
> Compressed data would increase processing time in mapper/reducer but
> decrease the amount of data transferred between tasktracker nodes.
> Normally you should consider applying some form of compression.
>
> On Wed, Aug 25, 2010 at 7:32 PM, shangan <[email protected]> wrote:
>
>> will data stored in compression format affect mapreduce job speed?
>> increase or decrease? or more complex relationship between these two ? can
>> anybody give some explanation in detail?
>>
>> 2010-08-26
>>
>> shangan

--
Harsh J
www.harshj.com
