I have some purely subjective experience. I invite anyone with empirical evidence to pipe up if possible.
It can be used, but there are a couple of important caveats:

1) If your maps produce a tremendous amount of output, the TaskTrackers will start throwing OutOfMemory exceptions (and, depending on which version you're using, will subsequently hang).

2) In our experience, you MUST compile the native compression libraries and include them in your distribution. If you use Java's built-in compression, you will get wildly unpredictable performance, ranging from slow to "why do we even bother with computers!?"

-- Marco

On 8/2/07 08:53, "Emmanuel" <[EMAIL PROTECTED]> wrote:
> I notice that the reduce copy phase is very slow.
>
> I would like to configure Hadoop to compress the map output:
>
> <property>
>   <name>mapred.compress.map.output</name>
>   <value>true</value>
>   <description></description>
> </property>
>
> <property>
>   <name>map.output.compression.type</name>
>   <value>RECORD</value>
>   <description></description>
> </property>
>
> I'm wondering if someone has already used this, or if you have any statistics
> about the improvement.
>
> Any advice or feedback is welcome.
>
> Thanks

--
Marco Nicosia - Kryptonite
Grid Systems, Tools, and Services Group
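For what it's worth, here is a minimal sketch of what the combined settings might look like in hadoop-site.xml. The first two properties are taken from Emmanuel's mail; the codec property name and the DefaultCodec value are assumptions based on Hadoop configs of that era, so check them against your version's defaults before relying on them. DefaultCodec (zlib) will pick up the native library when it is present on the library path, which is what avoids the slow Java fallback described above.

```xml
<!-- hadoop-site.xml (sketch; verify property names against your Hadoop version) -->

<!-- Compress intermediate map output before the shuffle/copy phase. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- Compress per-record; BLOCK generally compresses better for small records. -->
<property>
  <name>map.output.compression.type</name>
  <value>RECORD</value>
</property>

<!-- Assumed property name: selects the codec for map output. DefaultCodec
     uses native zlib when the native libraries are built and deployed. -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```

If the native libraries are loading correctly, the TaskTracker logs should mention loading the native-hadoop library at startup; if instead you see a warning about falling back to the built-in Java classes, that is the slow path Marco warns about.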
