TianyiMa created MAPREDUCE-7470:
-----------------------------------

             Summary: hadoop MR multi-thread committer
                 Key: MAPREDUCE-7470
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
            Reporter: TianyiMa


In cloud environments such as AWS and Aliyun, network latency becomes
non-trivial when a job commits thousands of files.

In our setup, the ping latency is about 0.03 ms inside our IDC (data center),
but about 3 ms after moving to the cloud, roughly 100x higher. We found that
committing tens of thousands of files takes a few tens of minutes, and the
more files there are, the longer the commit takes.
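
A back-of-envelope model shows why commit time scales this way. It assumes
the commit is dominated by sequential metadata round trips. Only the two RTT
values come from this report. roundTripsPerFile, the file count and the
thread count are hypothetical illustration values, not measurements.

```java
// Back-of-envelope model of a latency-bound job commit.
// Assumption: each committed file costs a fixed number of sequential
// metadata round trips. roundTripsPerFile is hypothetical, not measured.
public final class CommitLatencyModel {

    /** Estimated commit time (ms) when files are committed one by one. */
    static double serialCommitMillis(long files, int roundTripsPerFile, double rttMillis) {
        return files * roundTripsPerFile * rttMillis;
    }

    /** Same estimate when `threads` workers overlap their round trips perfectly. */
    static double parallelCommitMillis(long files, int roundTripsPerFile,
                                       double rttMillis, int threads) {
        return serialCommitMillis(files, roundTripsPerFile, rttMillis) / threads;
    }

    public static void main(String[] args) {
        double idcRtt = 0.03;   // ms, ping latency in the IDC (from the report)
        double cloudRtt = 3.0;  // ms, ping latency in the cloud (from the report)
        System.out.printf("slowdown per round trip: %.0fx%n", cloudRtt / idcRtt);

        long files = 30_000;    // hypothetical "tens of thousands" of files
        int roundTrips = 10;    // hypothetical RPCs per committed file
        int threads = 32;       // hypothetical thread-pool size
        double serialMin = serialCommitMillis(files, roundTrips, cloudRtt) / 60_000;
        double parallelMin =
                parallelCommitMillis(files, roundTrips, cloudRtt, threads) / 60_000;
        System.out.printf("serial: %.1f min, %d threads: %.2f min%n",
                serialMin, threads, parallelMin);
    }
}
```

The parallel estimate is optimistic. It assumes the round trips for different
files overlap perfectly, but in practice the request rate of the storage
service bounds the achievable speedup.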

We therefore propose a new commit algorithm, version 3, which is a variant of
committer algorithm version 1. To reduce commit time, algorithm 3 uses a
thread pool to commit the job's final output in parallel.
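
The following is a minimal sketch of the thread-pool idea, not the patch's
implementation. It uses java.nio.file on a local disk as a stand-in for
Hadoop's FileSystem. The class and method names (ParallelCommitSketch,
commitJobOutput) are hypothetical. Each file directly under a temporary
job-attempt directory is moved into the final output directory, and the
independent moves run on a fixed-size pool so their round trips overlap.
Unlike a real committer, the sketch does not merge nested directories.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: parallel job commit over java.nio.file (not Hadoop's API).
public final class ParallelCommitSketch {

    /** Moves every regular file in attemptDir into outputDir using `threads` workers. */
    static int commitJobOutput(Path attemptDir, Path outputDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> s = Files.list(attemptDir)) {
            files = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Files.createDirectories(outputDir);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : files) {
                Path dst = outputDir.resolve(src.getFileName());
                // Each move is independent, so their latencies overlap across threads.
                pending.add(pool.submit(() ->
                        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING)));
            }
            for (Future<Path> f : pending) {
                f.get(); // rethrows the first failed move, failing the whole commit
            }
        } finally {
            pool.shutdownNow(); // interrupts any moves still running after a failure
        }
        return files.size();
    }

    public static void main(String[] args) throws Exception {
        Path attempt = Files.createTempDirectory("attempt");
        Path output = Files.createTempDirectory("output").resolve("final");
        for (int i = 0; i < 100; i++) {
            Files.writeString(attempt.resolve("part-" + i), "data " + i);
        }
        int moved = commitJobOutput(attempt, output, 8);
        System.out.println("committed " + moved + " files");
    }
}
```

The pool size trades commit speed against load on the storage service. The
sketch fails fast by surfacing the first error through Future.get().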

In our tests in cloud production, algorithm 3 reduced commit time by a
factor of several tens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
