Bikas,

Yes, this feature is similar to the Dryad approach. The design of the current prototype is a bit hacky, to minimize implementation cost; the only modified file is MapTask.java. The processing flow is as follows:
1. After the mapper function finishes, move its output files to a temporary directory.
2. A leader among the containers runs the combiner every 3 minutes if there are files under the temporary directory. The leader is elected by a file lock.

I know that this design breaks the fault tolerance of MapReduce and isn't acceptable for Hadoop. I'll revise the design and document it.

> That is useful to reduce the input arity (as you suggest) and also helps
> with reducing the chance of seeing failures.

What does the latter mean? Could you explain with an example?

Thank you for your comment,
Tsuyoshi OZAWA

On Wed, Aug 1, 2012 at 2:32 AM, Bikas Saha <[email protected]> wrote:
> Can you please share a brief note on the design. Just a few sentences on
> the main changes.
>
> What you are saying sounds similar to multi-level aggregation done in the
> Dryad <http://www.cs.cmu.edu/~./15712/papers/isard07.pdf> runtime. That is
> useful to reduce the input arity (as you suggest) and also helps with
> reducing the chance of seeing failures.
>
> Bikas
>
> -----Original Message-----
> From: Tsuyoshi OZAWA [mailto:[email protected]]
> Sent: Monday, July 30, 2012 6:11 PM
> To: [email protected]
> Subject: Multi-level aggregation with combining the result of maps per
> node/rack
>
> Hi,
>
> We consider the shuffle cost a main concern in MapReduce, in particular
> for aggregation processing.
> The shuffle cost is expensive in Hadoop despite the existence of the
> combiner, because the scope of combining is limited to a single MapTask.
>
> To solve this problem, I've implemented a prototype that combines the
> results of multiple maps per node [1].
> This is the first step toward making Hadoop faster with a multi-level
> aggregation technique like Google Dremel [2].
>
> I took a benchmark with the prototype.
> We used a WordCount program with in-mapper combining optimization as the
> benchmark. The benchmark was run on 40 nodes [3].
> The input data sets are 300GB, 500GB, 1TB, and 2TB of text generated by
> the default RandomTextWriter. The number of reducers is configured as 1,
> on the assumption that some workloads force 1 reducer, as in Google
> Dremel. The results are as follows:
>
>                          | 300GB | 500GB | 1TB   | 2TB   |
> Normal (sec)             | 4004  | 5551  | 12177 | 27608 |
> Combining per node (sec) | 3678  | 3844  | 7440  | 15591 |
>
> Note that a MapTask runs the combiner per node every 3 minutes in the
> current prototype, so the aggregation rate is very limited.
>
> "Normal" is the result of current Hadoop, and "Combining per node" is the
> result with my optimization. Despite the 3-minute restriction, the
> prototype is 1.7 times faster than normal Hadoop in the 2TB case. Another
> benchmark also shows that the shuffle cost is cut by 50%.
>
> I want to know from you guys: do you think this is a useful feature?
> If yes, I will work on contributing it.
> You are also welcome to tell me which benchmarks you want me to run with
> my prototype.
>
> Regards,
> Tsuyoshi
>
> [1] The idea is also described in the Hadoop wiki:
> http://wiki.apache.org/hadoop/HadoopResearchProjects
> [2] The Dremel paper is available at:
> http://research.google.com/pubs/pub36632.html
> [3] The specification of each node is as follows:
> CPU: Core(TM)2 Duo E7400 2.80GHz x 2
> Memory: 8 GB
> Network: 1 GbE

--
OZAWA Tsuyoshi
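(For readers following along: the leader-election-by-file-lock step described at the top of the thread can be sketched roughly as below. This is a minimal standalone illustration, not code from the actual MapTask.java patch; the class name, lock-file path, and method names are all hypothetical.)

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch: per-node leader election by file lock. Each map task tries to
// grab a non-blocking lock on a well-known file; the winner becomes the
// node's leader and would periodically (e.g. every 3 minutes) run the
// combiner over the map outputs parked in the temporary directory.
public class NodeCombinerLeaderSketch {
    // Hypothetical per-node lock file location.
    static final String LOCK_FILE =
            System.getProperty("java.io.tmpdir") + File.separator + "combiner.lock";

    // Returns a held FileLock if this process became the leader, or null
    // if another task on the node already holds the lock.
    static FileLock tryBecomeLeader() throws Exception {
        RandomAccessFile raf = new RandomAccessFile(new File(LOCK_FILE), "rw");
        return raf.getChannel().tryLock(); // non-blocking; null means we lost
    }

    public static void main(String[] args) throws Exception {
        FileLock lock = tryBecomeLeader();
        if (lock != null) {
            System.out.println("leader");
            // A real leader would now loop: scan the temporary directory,
            // run the combiner over any waiting map outputs, sleep 3 minutes.
            lock.release();
        } else {
            System.out.println("follower");
        }
    }
}
```

With a single process on the node, the lock is always won, so this prints "leader"; a second concurrent task on the same node would see null from tryLock() and print "follower".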
