steveloughran commented on PR #6469: URL: https://github.com/apache/hadoop/pull/6469#issuecomment-1921008573
Like I said on the JIRA, I don't want this. It has the same scale issues encountered on abfs as #[6399](https://github.com/apache/hadoop/pull/6399) and #[6378](https://github.com/apache/hadoop/pull/6378), and the same correctness problems on GCS as v2, as in "incorrect task commit semantics", unless v1 commit can be made to rely not on atomic directory rename but on "atomic file rename", which does work there.

* Which cloud store have you tested this against? Does it actually have the rename semantics needed for v1 task commit?
* What was the depth/width of the directory structure?
* Did you try a terasort?
* Did you try multiple jobs through Spark at the same time? As there, memory is a problem: #5728

Even if the store meets the v1 correctness prerequisites, I would like to see a comparison with the same job you have tested run through the manifest committer, ideally with any profiling to highlight where it could be improved.
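To illustrate the rename concern raised above, here is a minimal sketch of v1-style task commit as a single directory rename. It is an assumption-laden stand-in, not Hadoop code: it uses a local POSIX filesystem in place of a cloud store, and the function names and directory layout (`commit_task_v1`, `_temporary/attempt_0001`, `task_0001`) are hypothetical.

```python
import os
import tempfile


def commit_task_v1(attempt_dir: str, committed_dir: str) -> None:
    """Promote a task attempt's output with one directory rename.

    Assumption: on a local POSIX filesystem, renaming a directory is
    atomic, so a reader sees either none of the task's files or all of
    them. A store that emulates directory rename file by file cannot
    give that guarantee, which is the correctness concern.
    """
    os.rename(attempt_dir, committed_dir)


def demo() -> list:
    # Hypothetical layout: one job root, one task attempt, two output files.
    job_root = tempfile.mkdtemp()
    attempt = os.path.join(job_root, "_temporary", "attempt_0001")
    committed = os.path.join(job_root, "task_0001")
    os.makedirs(attempt)
    for name in ("part-0000", "part-0001"):
        with open(os.path.join(attempt, name), "w") as f:
            f.write("data\n")

    commit_task_v1(attempt, committed)

    # After commit, the attempt directory is gone and all files moved together.
    assert not os.path.exists(attempt)
    return sorted(os.listdir(committed))


if __name__ == "__main__":
    print(demo())
```

If the rename in `commit_task_v1` were replaced by a per-file copy loop, a failure partway through would leave a partially committed task directory, which is the case the questions above are probing for.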
