[jira] [Commented] (MAPREDUCE-7470) multi-thread mapreduce committer

ASF GitHub Bot (Jira) Thu, 01 Feb 2024 02:34:04 -0800


    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813160#comment-17813160
 ]


ASF GitHub Bot commented on MAPREDUCE-7470:
-------------------------------------------

steveloughran commented on PR #6469:
URL: https://github.com/apache/hadoop/pull/6469#issuecomment-1921008573

   Like I said on the jira, I don't want this. It has the same scale issues 
encountered on abfs as #[6399](https://github.com/apache/hadoop/pull/6399) and 
#[6378](https://github.com/apache/hadoop/pull/6378), and the same correctness 
problems on GCS as v2, as in "incorrect task commit semantics", unless v1 commit 
can be made to not rely on atomic directory rename, but instead on "atomic file 
rename", which does work there.
   
   * which cloud store have you tested this against? Does it actually have the 
semantics of rename for v1 task commit?
   * what was the depth/width of the directory structure?
   * did you try a terasort?
   * did you try multiple jobs through spark at the same time? Memory is a 
problem there: #5728 
   
   Even if the store meets the v1 correctness pre-requisites I would like to 
see a comparison of the same job you have tested through the manifest 
committer. Ideally with any profiling to highlight where it could be improved.
   
   
   




> multi-thread mapreduce committer
> --------------------------------
>
>                 Key: MAPREDUCE-7470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>            Reporter: TianyiMa
>            Priority: Major
>              Labels: mapreduce, pull-request-available
>         Attachments: MAPREDUCE-7470.0.patch
>
>
> In cloud environments such as AWS and Aliyun, network latency is 
> non-trivial when we commit thousands of files.
> In our situation, the ping delay is about 0.03ms in the IDC, but after moving 
> to the cloud the ping delay is about 3ms, which is roughly 100x slower. We 
> found that committing tens of thousands of files costs a few tens of minutes. 
> The more files there are, the longer it takes.
> So we propose a new committer algorithm, a variant of committer 
> algorithm version 1, called algorithm 3. In this new algorithm 3, to decrease 
> the commit time, we use a thread pool to commit the job's final output.
> Our test results in cloud production show that the new algorithm 3 
> decreases the commit time by several tens of times.
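
The thread-pool idea in the description above can be illustrated with a minimal sketch. This is not the patch's code and does not use the Hadoop `FileSystem` API: it is a Python stand-in on a local filesystem, and the helper name `commit_files_parallel` and its parameters are hypothetical. It renames each pending file individually from a worker pool, so each rename is atomic only per file and the job commit as a whole is not.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def commit_files_parallel(src_dir, dest_dir, max_workers=8):
    """Move every file under src_dir to the same relative path under dest_dir.

    Hypothetical helper for illustration only. Each file is renamed on its
    own by a pool worker, so the commit is atomic per file, not per job.
    """
    moves = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dest_dir, rel))
        # Create destination directories up front, before any worker runs.
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            moves.append((os.path.join(root, name),
                          os.path.join(target_root, name)))

    def move(pair):
        src, dst = pair
        os.replace(src, dst)  # single-file rename on the local filesystem
        return dst

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the map iterator waits for every rename and re-raises
        # the first worker exception, if any.
        return sorted(pool.map(move, moves))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        pending = os.path.join(base, "pending")
        final = os.path.join(base, "final")
        os.makedirs(os.path.join(pending, "part=0"))
        for i in range(5):
            with open(os.path.join(pending, "part=0", f"file-{i}"), "w") as fh:
                fh.write(str(i))
        print(len(commit_files_parallel(pending, final)), "files committed")
```

On a real object store each rename is a remote call, so the pool overlaps round-trip latency rather than CPU work. The sketch says nothing about how the store behaves under that load, or about what happens if the process fails part-way through the renames.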



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
