[ https://issues.apache.org/jira/browse/MAPREDUCE-7267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068028#comment-17068028 ]
Steve Loughran commented on MAPREDUCE-7267:
-------------------------------------------

I am scared of anything which goes near the existing code. It's complicated: two different co-recursive algorithms with little in the way of documentation.

The only change I've considered would be a move to FileContext for rename/3, or finishing off the "make rename/3 public" work so that FileSystem gets a version of rename which throws exceptions rather than returning "false" for little things like the source not existing.

So: we have a plugin point for committers, ones Spark & Parquet can pick up. Could you add a high-performance rename commit algorithm that way?

And HADOOP-11452 is for the rename work. BTW, Igor's work had GCS in mind, where rename is O(1) per file but O(files) per directory, and directory rename is not atomic. So something done there would benefit everyone without needing some brand-new algorithm. It just needs to be done safely.

> During commitJob, enable merge paths with multi threads
> -------------------------------------------------------
>
> Key: MAPREDUCE-7267
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7267
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: feiwang
> Priority: Major
> Attachments: MAPREDUCE-7267.000.patch
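
The comment's contrast between a rename that returns "false" and one that throws can be sketched with plain JDK calls. This is only an analogy, not Hadoop code: `java.io.File.renameTo` stands in for the boolean-returning style and `java.nio.file.Files.move` for the exception-throwing style. The class name `RenameSemantics` and its methods are hypothetical, chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class RenameSemantics {

    /** Boolean-style rename: a missing source looks the same as any other failure. */
    public static boolean renameReturningBoolean(Path src, Path dst) {
        return src.toFile().renameTo(dst.toFile());
    }

    /** Exception-style rename: the failure carries its cause. */
    public static void renameOrThrow(Path src, Path dst) throws IOException {
        Files.move(src, dst);
    }

    /** Maps the outcome of the exception-style rename to a short description. */
    public static String describe(Path src, Path dst) {
        try {
            renameOrThrow(src, dst);
            return "renamed";
        } catch (NoSuchFileException e) {
            // The caller can tell "source does not exist" apart from other errors.
            return "source missing";
        } catch (IOException e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        Path missing = dir.resolve("no-such-file");
        Path dst = dir.resolve("dest");

        // Boolean style: only "false", with no indication of why.
        System.out.println(renameReturningBoolean(missing, dst));

        // Exception style: the missing source is reported explicitly.
        System.out.println(describe(missing, dst));
    }
}
```

With the boolean style, a commit algorithm has to probe the filesystem again to learn why a rename failed. With the exception style, the cause arrives with the failure, which is the behaviour the comment asks for from a public rename/3.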