Harsh J created HIVE-13704:
------------------------------
Summary: Don't call DistCp.execute() instead of DistCp.run()
Key: HIVE-13704
URL: https://issues.apache.org/jira/browse/HIVE-13704
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 2.0.0, 1.3.0
Reporter: Harsh J
Priority: Critical
HIVE-11607 switched the DistCp invocation from {{run}} to {{execute}}. The {{run}}
method performs additional setup that drives the state of {{SimpleCopyListing}},
which runs in the driver, and of {{CopyCommitter}}, which runs in the job runtime.
When Hive ends up running DistCp for copy work (between non-matching filesystems
or between encrypted and non-encrypted zones, for sizes above a configured value),
this state is never set, which causes wrong paths to appear on the target
(subdirectories named after the file, instead of just the file).
Hive should call DistCp's Tool {{run}} method rather than calling {{execute}}
directly, so that it does not skip the target-exists flag that the
{{setTargetPathExists}} call would set:
https://github.com/apache/hadoop/blob/release-2.7.1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java#L108-L126
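
To illustrate the failure mode, here is a minimal, self-contained sketch. It is a toy model, not Hadoop or Hive code: the class {{DistCpEntryPointSketch}}, its fields, the paths, and the default value of the flag are all assumptions made for illustration. It only mimics the shape described above, where {{run}} sets the target-exists state before delegating to {{execute}}, and calling {{execute}} directly leaves that state at its default.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: none of these names are Hadoop or Hive classes.
public class DistCpEntryPointSketch {

    private final Set<String> existingPaths; // stands in for the target filesystem
    private final String source;
    private final String target;

    // Assumption for this sketch: the flag defaults to "target exists"
    // until setTargetPathExists() checks the filesystem.
    private boolean targetPathExists = true;

    public DistCpEntryPointSketch(Set<String> existingPaths, String source, String target) {
        this.existingPaths = existingPaths;
        this.source = source;
        this.target = target;
    }

    // Plays the role of the Tool entry point: set up state, then execute.
    public String run() {
        setTargetPathExists();
        return execute();
    }

    private void setTargetPathExists() {
        targetPathExists = existingPaths.contains(target);
    }

    // Returns the path the single source file would be written to.
    // If the target is believed to exist, the file is placed underneath it.
    public String execute() {
        String fileName = source.substring(source.lastIndexOf('/') + 1);
        return targetPathExists ? target + "/" + fileName : target;
    }

    public static void main(String[] args) {
        Set<String> fs = new HashSet<>();
        fs.add("/warehouse"); // the parent exists, the target file does not

        String viaRun = new DistCpEntryPointSketch(
                fs, "/staging/part-0", "/warehouse/part-0").run();
        String viaExecute = new DistCpEntryPointSketch(
                fs, "/staging/part-0", "/warehouse/part-0").execute();

        System.out.println("run():     " + viaRun);     // /warehouse/part-0
        System.out.println("execute(): " + viaExecute); // /warehouse/part-0/part-0
    }
}
```

In this model, going through {{run}} yields the file at the target path, while calling {{execute}} directly yields a subdirectory named after the file, which matches the symptom reported here.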