Re: [PR] MAPREDUCE-7470: multi-thread mapreduce committer [hadoop]

via GitHub Thu, 01 Feb 2024 02:33:30 -0800


steveloughran commented on PR #6469:
URL: https://github.com/apache/hadoop/pull/6469#issuecomment-1921008573


   As I said on the JIRA, I don't want this. It has the same scale issues 
encountered on abfs as #[6399](https://github.com/apache/hadoop/pull/6399) and 
#[6378](https://github.com/apache/hadoop/pull/6378), and the same correctness 
problems on GCS as v2, namely "incorrect task commit semantics", unless v1 
commit can be made to rely on "atomic file rename", which does work there, 
rather than on atomic directory rename.
   
   * Which cloud store have you tested this against? Does it actually have the 
rename semantics that v1 task commit requires?
   * What was the depth/width of the directory structure?
   * Did you try a terasort?
   * Did you try multiple jobs through Spark at the same time? Memory is a 
problem there: #5728 
   
   Even if the store meets the v1 correctness prerequisites, I would like to 
see a comparison with the same job run through the manifest committer, 
ideally with profiling to highlight where it could be improved.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

