[ https://issues.apache.org/jira/browse/HIVE-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ning Zhang updated HIVE-1852:
-----------------------------

    Attachment: HIVE-1852.2.patch

@Joydeep, replaceFiles() is called by loadTable with the "replace" flag turned on, which means it should overwrite the destination directory. Also, tmpPath is a temporary path that should not exist before this call. Later in the function, tmpPath is renamed again to the destination path (where the existing files in destf will be removed). The non-overwriting version is implemented in copyFiles(), so I think we don't need another function.

I also added a test case to the new patch to check that load data (with or without overwrite) works as expected. There are no code changes from the first patch.

> Reduce unnecessary DFSClient.rename() calls
> -------------------------------------------
>
>                 Key: HIVE-1852
>                 URL: https://issues.apache.org/jira/browse/HIVE-1852
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>         Attachments: HIVE-1852.2.patch, HIVE-1852.patch
>
>
> On the Hive client side (MoveTask etc.), DFSClient.rename() is called for every
> file inside a directory. This is very expensive for a large directory on a busy
> DFS namenode. We should replace it with a single rename() call on the whole
> directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
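
As a rough illustration of the idea in the issue description (one rename per file versus a single rename of the whole directory), here is a minimal sketch. It is an analogy on the local filesystem using java.nio.file, not the Hadoop FileSystem API and not the code in the attached patch. The class name RenameSketch and both method names are hypothetical. The sketch assumes the source and destination are on the same filesystem and, for the single-rename case, that the destination does not exist yet, which mirrors the comment above that tmpPath should not exist before the call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RenameSketch {

    // Per-file approach: one rename call for every entry in srcDir.
    // Returns the number of rename calls issued.
    static int moveFileByFile(Path srcDir, Path destDir) throws IOException {
        Files.createDirectories(destDir);
        List<Path> entries;
        try (Stream<Path> s = Files.list(srcDir)) {
            // Snapshot the listing first so we do not modify the directory while iterating it.
            entries = s.collect(Collectors.toList());
        }
        int renames = 0;
        for (Path entry : entries) {
            Files.move(entry, destDir.resolve(entry.getFileName()));
            renames++;
        }
        Files.delete(srcDir); // srcDir is empty at this point
        return renames;
    }

    // Single-rename approach: one call moves the whole directory.
    // Assumes destDir does not exist; Files.move throws FileAlreadyExistsException otherwise.
    static int moveWholeDirectory(Path srcDir, Path destDir) throws IOException {
        Files.move(srcDir, destDir);
        return 1;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("rename-sketch");

        Path srcA = Files.createDirectory(root.resolve("srcA"));
        Path srcB = Files.createDirectory(root.resolve("srcB"));
        for (int i = 0; i < 5; i++) {
            Files.createFile(srcA.resolve("part-" + i));
            Files.createFile(srcB.resolve("part-" + i));
        }

        int perFile = moveFileByFile(srcA, root.resolve("destA"));
        int single = moveWholeDirectory(srcB, root.resolve("destB"));
        System.out.println("per-file rename calls: " + perFile);
        System.out.println("whole-directory rename calls: " + single);
    }
}
```

In the per-file variant the number of rename calls grows with the number of files in the directory, while the whole-directory variant always issues one. On a DFS every rename is a request to the namenode, which is why the issue proposes the single call.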