[jira] [Commented] (MAPREDUCE-7267) During commitJob, enable merge paths with multi threads

Steve Loughran (Jira) Thu, 26 Mar 2020 13:44:22 -0700


    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068028#comment-17068028
 ]


Steve Loughran commented on MAPREDUCE-7267:
-------------------------------------------


I am scared of anything which goes near the existing code. It's complicated: 
two different co-recursive algorithms with little in the way of documentation. 

The only change I've considered would be a move to FileContext for rename/3, 
or finishing off the "make rename/3 public" work so that FileSystem gets a 
version of rename which throws exceptions rather than returning "false" for 
little things like the source not existing.
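
To make the difference between the two rename styles concrete, here is a 
minimal sketch using only the JDK as an analogy. It does not call the Hadoop 
APIs: java.io.File.renameTo stands in for a rename that returns "false", and 
java.nio.file.Files.move stands in for a rename that throws. The class and 
method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

class RenameSemantics {

    /** Boolean-style rename: a missing source is reported only as "false". */
    public static boolean renameReturningBoolean(Path src, Path dst) {
        return src.toFile().renameTo(dst.toFile());
    }

    /** Exception-style rename: a missing source raises NoSuchFileException. */
    public static void renameOrThrow(Path src, Path dst) throws IOException {
        Files.move(src, dst);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path missing = dir.resolve("no-such-file");
        Path dest = dir.resolve("dest");

        // The caller learns that the rename failed, but not why.
        System.out.println("boolean rename: " + renameReturningBoolean(missing, dest));

        try {
            renameOrThrow(missing, dest);
        } catch (NoSuchFileException e) {
            // The cause is explicit and cannot be silently ignored.
            System.out.println("exception rename: missing source " + e.getFile());
        }
    }
}
```

With the boolean style, every caller has to remember to check the result and 
then guess at the cause. With the exception style, a missing source stops the 
commit unless the caller handles it deliberately.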

So: we have a plugin point for committers, ones Spark & Parquet can pick up. 
Could you add a high-performance rename commit algorithm that way? 
HADOOP-11452 covers the rename work.

BTW, Igor's work had GCS in mind, where rename is O(1) per file but O(files) 
per directory, and directory rename is not atomic. So something done there 
would benefit everyone without needing some brand-new algorithm. It just needs 
to be done safely. 
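
As a rough illustration of per-file renames spread across threads, the sketch 
below moves a list of files into a destination directory on a fixed-size 
thread pool and surfaces the first failure as an exception. It uses only the 
JDK, not the Hadoop FileSystem or committer APIs, and the class and method 
names are hypothetical. It is not the algorithm from the patch or from any 
existing committer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelFileRename {

    /**
     * Moves each file into destDir, one task per file.
     * Hypothetical helper for illustration only.
     */
    public static void renameAll(List<Path> files, Path destDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path src : files) {
                // Files.move throws on failure instead of returning false.
                Callable<Path> task =
                        () -> Files.move(src, destDir.resolve(src.getFileName()));
                pending.add(pool.submit(task));
            }
            for (Future<Path> result : pending) {
                try {
                    result.get();
                } catch (ExecutionException e) {
                    // Rethrow the first failure seen, in submission order.
                    Throwable cause = e.getCause();
                    throw cause instanceof IOException
                            ? (IOException) cause
                            : new IOException(cause);
                }
            }
        } finally {
            // Stops the pool; on failure this also cancels queued moves.
            pool.shutdownNow();
        }
    }
}
```

Note what the sketch leaves out: there is no rollback, so a failure part-way 
through leaves some files moved and others not. That partial state is the kind 
of thing which has to be handled for this to be safe.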

> During commitJob, enable merge paths with multi threads
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-7267
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7267
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: feiwang
>            Priority: Major
>         Attachments: MAPREDUCE-7267.000.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
