RE: Multi-level aggregation with combining the result of maps per node/rack

Bikas Saha Tue, 31 Jul 2012 10:32:28 -0700

Can you please share a brief note on the design. Just a few sentences on
the main changes.




What you are saying sounds similar to multi-level aggregation done in the
Dryad <http://www.cs.cmu.edu/~./15712/papers/isard07.pdf> runtime. That is
useful to reduce the input arity (as you suggest) and also helps with
reducing the chance of seeing failures.



Bikas



-----Original Message-----
From: Tsuyoshi OZAWA [mailto:[email protected]]
Sent: Monday, July 30, 2012 6:11 PM
To: [email protected]
Subject: Multi-level aggregation with combining the result of maps per
node/rack



Hi,



We consider the shuffle cost is a main concern in MapReduce, in particular,
aggregation processing.

The shuffle costs is also expensive in Hadoop in spite of the existence of
combiner, because the scope of combining is limited within only one MapTask.



To solve this problem, I've implemented the prototype that combines the
result of multiple maps per node[1].

This is the first step to make hadoop faster with multi-level aggregation
technique like Google Dremel[2].



I took a benchmark with the prototype.

We used WordCount program with in-mapper combining optimization as the
benchmark. The benchmark is taken under 40 nodes [3].

The input data set is 300GB, 500GB, 1TB, and 2TB texts which is generated
by default RandomTextWriter. Reducer is configured as 1 on the assumption
that some workload forces 1 reducer like Google Dremel. The result is as
follows:



                         | 300GB | 500GB |   1TB |   2TB |

            Normal (sec) |  4004 |  5551 | 12177 | 27608 | Combining per
node (sec) |  3678 |  3844 |  7440 | 15591 |



Note that a MapTask runs combiner per node every 3 minutes in the current
prototype, so the aggregation rate is very limited.



"Normal" is the result of current hadoop, and "Combining per node"

is the result with my optimization.  Regardless of the 3-minutes
restriction, the prototype is 1.7 times faster than normal hadoop in 2TB
case.  Another benchmark also shows that the shuffle costs is cut down by
50%.



I want to know from you guys, do you think is it a useful feature?

If yes, I will work for contributing it.

It is also welcome to tell me the benchmark that you want me to do with my
prototype.



Regards,

Tsuyoshi





[1] The idea is also described in Hadoop wiki:

    http://wiki.apache.org/hadoop/HadoopResearchProjects

[2] Dremel paper is available at:

    http://research.google.com/pubs/pub36632.html

[3] The specification of each nodes is as follows:

    CPU Core(TM)2 Duo CPU E7400 2.80GHz x 2

    Memory 8 GB

    Network 1 GbE

RE: Multi-level aggregation with combining the result of maps per node/rack

Reply via email to