[
https://issues.apache.org/jira/browse/HIVE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Dere updated HIVE-17963:
------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Status: Resolved (was: Patch Available)
Committed to master.
> Fix for HIVE-17113 can be improved for non-blobstore filesystems
> ----------------------------------------------------------------
>
> Key: HIVE-17963
> URL: https://issues.apache.org/jira/browse/HIVE-17963
> Project: Hive
> Issue Type: Bug
> Reporter: Jason Dere
> Assignee: Jason Dere
> Fix For: 3.0.0
>
> Attachments: HIVE-17963.1.patch, HIVE-17963.2.patch
>
>
> HIVE-17113/HIVE-17813 fix the duplicate file issue by performing file moves
> on a file-by-file basis. For non-blobstore filesystems this results in many
> more filesystem/namenode operations compared to the previous
> Utilities.mvFileToFinalPath() behavior (dedup files in src dir, rename src
> dir to final dir).
> For non-blobstore filesystems, a better solution would be the one described
> [here|https://issues.apache.org/jira/browse/HIVE-17113?focusedCommentId=16100564&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16100564]:
> 1) Move the temp directory to a new directory name, to prevent additional
> files from being added by any runaway processes.
> 2) Run removeTempOrDuplicateFiles() on this renamed temp directory.
> 3) Run renameOrMoveFiles() to move the renamed temp directory to the final
> location.
> This results in only one additional file operation in non-blobstore FSes
> compared to the original Utilities.mvFileToFinalPath() behavior.
> The proposal is to remove the config setting
> hive.exec.move.files.from.source.dir and always use behavior that avoids
> the duplicate file issue described in HIVE-17113. For non-blobstore
> filesystems we will do steps 1-3 described above. For blobstore filesystems
> we will keep the HIVE-17113/HIVE-17813 solution, which does a file-by-file
> copy. This should need the same number of file operations as renaming a
> directory on a blobstore, since that rename effectively moves files on a
> file-by-file basis anyway.
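
Steps 1-3 from the description can be sketched roughly as follows. This is not the committed patch and does not use the Hadoop FileSystem API. It is a minimal illustration on a local filesystem using java.nio.file. The class and method names, the ".moved" suffix, and the dedup rule (keep the largest file per task ID, where the task ID is the file name up to the last underscore) are all assumptions made for the example. It also assumes the final directory does not exist yet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MoveToFinalPathSketch {

    // Assumed naming rule: "000000_0" and "000000_1" are attempts of task "000000".
    static String taskId(Path file) {
        String name = file.getFileName().toString();
        int idx = name.lastIndexOf('_');
        return idx < 0 ? name : name.substring(0, idx);
    }

    // Stand-in for step 2: keep the largest file per task ID and delete the others.
    // (Simplified rule chosen for this sketch, not Hive's actual logic.)
    static void removeDuplicates(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.collect(Collectors.toList());
        }
        Map<String, Path> keep = new HashMap<>();
        for (Path f : files) {
            Path prev = keep.get(taskId(f));
            if (prev == null) {
                keep.put(taskId(f), f);
            } else if (Files.size(f) > Files.size(prev)) {
                Files.delete(prev);
                keep.put(taskId(f), f);
            } else {
                Files.delete(f);
            }
        }
    }

    static Path moveToFinalPath(Path tmpDir, Path finalDir) throws IOException {
        // Step 1: rename the temp directory so runaway writers can no longer add files to it.
        Path renamed = tmpDir.resolveSibling(tmpDir.getFileName() + ".moved");
        Files.move(tmpDir, renamed, StandardCopyOption.ATOMIC_MOVE);
        // Step 2: dedup inside the renamed directory.
        removeDuplicates(renamed);
        // Step 3: a single directory rename to the final location (finalDir must not exist).
        Files.move(renamed, finalDir);
        return finalDir;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("hive17963-demo");
        Path tmp = Files.createDirectory(root.resolve("_tmp.out"));
        Files.write(tmp.resolve("000000_0"), new byte[3]);
        Files.write(tmp.resolve("000000_1"), new byte[5]);
        Files.write(tmp.resolve("000001_0"), new byte[1]);
        Path out = moveToFinalPath(tmp, root.resolve("out"));
        try (Stream<Path> listing = Files.list(out)) {
            listing.sorted().forEach(p -> System.out.println(p.getFileName()));
        }
    }
}
```

In this sketch the directory is renamed twice (temp to renamed temp, renamed temp to final), which corresponds to the one additional file operation mentioned in the description compared to a single rename of the source directory.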