[ https://issues.apache.org/jira/browse/MAPREDUCE-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17808974#comment-17808974 ]
ASF GitHub Bot commented on MAPREDUCE-7470:
-------------------------------------------

slfan1989 commented on PR #6469:
URL: https://github.com/apache/hadoop/pull/6469#issuecomment-1902458149

   @lastbus Thanks for the contribution! We need to fix the checkstyle issue.

> multi-thread mapreduce committer
> --------------------------------
>
>                 Key: MAPREDUCE-7470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: TianyiMa
>            Priority: Major
>              Labels: mapreduce, pull-request-available
>         Attachments: MAPREDUCE-7470.0.patch
>
>
> In cloud environments such as AWS and Aliyun, network latency is
> non-trivial when we commit thousands of files.
> In our setup, the ping latency is about 0.03 ms in our IDC, but after
> moving to the cloud it is about 3 ms, roughly 100x higher. We found that
> committing tens of thousands of files takes several tens of minutes, and
> the more files there are, the longer it takes.
> We therefore propose a new committer algorithm, version 3, which is a
> variant of committer algorithm version 1. To reduce commit time,
> algorithm 3 uses a thread pool to commit the job's final output.
> Our tests in a cloud production environment show that algorithm 3
> reduces commit time by a factor of several tens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
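The issue text says only that algorithm 3 commits the job's final output through a thread pool. As a rough illustration of why that helps when every file operation pays a network round trip, here is a minimal Java sketch, assuming each output file can be moved independently of the others. It is not the code from PR #6469: the class `ParallelCommitSketch` and its method `commitAll` are hypothetical, and local `java.nio.file.Files.move` calls stand in for renames on cloud storage (for example via Hadoop's `FileSystem.rename`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of a multi-threaded job commit: each task output file is
 * moved into the final output directory by a worker in a fixed-size pool,
 * instead of one file at a time. Local java.nio moves stand in for remote
 * renames against cloud storage; this is not the MAPREDUCE-7470 patch.
 */
final class ParallelCommitSketch {

    /** Moves every file in {@code sources} into {@code destDir} using {@code threads} workers. */
    static int commitAll(List<Path> sources, Path destDir, int threads)
            throws IOException, InterruptedException {
        Files.createDirectories(destDir);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : sources) {
                // Each move is independent, so per-file latency overlaps across workers.
                pending.add(pool.submit(() ->
                        Files.move(src, destDir.resolve(src.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING)));
            }
            int committed = 0;
            for (Future<Path> f : pending) {
                try {
                    f.get(); // wait for this move; rethrows any failure it hit
                    committed++;
                } catch (ExecutionException e) {
                    throw new IOException("commit failed", e.getCause());
                }
            }
            return committed;
        } finally {
            // Interrupts any still-running moves if we bailed out early.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: fake 100 task output files, then commit them with 8 workers.
        Path work = Files.createTempDirectory("task-output");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            files.add(Files.writeString(work.resolve("part-" + i), "data " + i));
        }
        Path out = Files.createTempDirectory("job-output");
        int n = commitAll(files, out, 8);
        System.out.println("committed " + n + " files");
    }
}
```

When per-file latency dominates, a pool of `threads` workers lets that many round trips proceed at once, so wall-clock commit time can drop by up to roughly the pool size. The trade-off is that a failure partway through may leave some files moved and others not, so a real committer would need to handle partial commits.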