[ https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971913#action_12971913 ]
Ning Zhang commented on HIVE-1852:
----------------------------------

The patch should not change the directory structure. I tested a simple insert overwrite and dynamic partition inserts on production and verified the results. The full unit tests are still running.

The rationale is that the old code goes over each file (item_i) in src and renames it to tmpPath/item_i. The new code simply renames src to tmpPath.

About the name collision: tmpPath needs to be created first because its parent may not exist (in the case of multiple levels of dynamic partitions), in which case the rename will fail. If the directory tmpPath is created first and src is then renamed to tmpPath, the rename does not fail, and in practice it moves all subdirectories and files in src to tmpPath. There is no documentation about the collision semantics, but if this behavior is not guaranteed I can change the code.

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
>                 Key: HIVE-1852
>                 URL: https://issues.apache.org/jira/browse/HIVE-1852
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.patch
>
>
> In Hive client side (MoveTask etc), DFSClient.rename() is called for every
> file inside a directory. It is very expensive for a large directory in a busy
> DFS namenode. We should replace it with a single rename() call on the whole
> directory.
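
For illustration only, the difference between the two approaches discussed in the comment can be sketched on a local filesystem with java.nio.file. This is not the code from HIVE-1852.patch and it does not use the Hadoop FileSystem API. The class name RenameSketch and its methods are hypothetical. The sketch also assumes local-filesystem behavior, where Files.move() fails if the target already exists, so it creates only the parent of the target rather than the target itself. HDFS collision semantics may differ, which is the open question raised in the comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogy; not the Hive MoveTask code.
public class RenameSketch {

    // Old approach: one rename per entry in src, so N entries cost N calls.
    public static int moveEachEntry(Path src, Path dst) throws IOException {
        Files.createDirectories(dst); // creates every missing level of dst
        List<Path> entries;
        try (Stream<Path> listing = Files.list(src)) {
            // Snapshot the listing before moving anything out of src.
            entries = listing.collect(Collectors.toList());
        }
        int calls = 0;
        for (Path entry : entries) {
            Files.move(entry, dst.resolve(entry.getFileName()));
            calls++;
        }
        return calls;
    }

    // New approach: make sure the parent of dst exists, then rename src once.
    public static int moveWholeDirectory(Path src, Path dst) throws IOException {
        Path parent = dst.getParent();
        if (parent != null) {
            // Without this, the move fails when intermediate levels are missing.
            Files.createDirectories(parent);
        }
        Files.move(src, dst); // single call, regardless of how many entries src has
        return 1;
    }

    // Helper that creates a directory holding n empty files named item_0..item_{n-1}.
    public static Path makeSource(Path dir, int n) throws IOException {
        Files.createDirectories(dir);
        for (int i = 0; i < n; i++) {
            Files.createFile(dir.resolve("item_" + i));
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rename-sketch");
        Path srcA = makeSource(root.resolve("srcA"), 3);
        Path srcB = makeSource(root.resolve("srcB"), 3);
        int perEntry = moveEachEntry(srcA, root.resolve("level1/level2/tmpA"));
        int single = moveWholeDirectory(srcB, root.resolve("level1/level3/tmpB"));
        System.out.println("per-entry rename calls: " + perEntry);
        System.out.println("whole-directory rename calls: " + single);
    }
}
```

The number of rename calls in the first method grows with the number of entries in src, while the second always issues one. On a busy namenode each call is a separate metadata operation, which is the cost the issue description aims to reduce.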