[ 
https://issues.apache.org/jira/browse/HIVE-16918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-16918:
------------------------------------
    Description: 
With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp.
This, however, is incorrect for copying the generated _metadata from a
temporary scratch directory to HDFS. We need to change that so that this copy
is routed to a regular CopyTask. The problem with using DistCp here is that
DistCp launches a separate job, which may be scheduled on another machine that
does not have access to this file:// URI. DistCp should only ever be used when
copying from non-local filesystems.

Also, as a follow-up to HIVE-16686, we missed adding "-pb" as a default for
invocations of DistCp from Hive; this change adds it. This would not be
necessary if HADOOP-8143 had made it in, but until it does, we need it.

  was:
With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp.
This, however, is incorrect for copying the generated _metadata from a
temporary scratch directory to HDFS. We need to change that so that this copy
is routed to a regular CopyTask.

Also, as a follow-up to HIVE-16686, we missed adding "-pb" as a default for
invocations of DistCp from Hive; this change adds it. This would not be
necessary if HADOOP-8143 had made it in, but until it does, we need it.


> Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp
> --------------------------------------------------------------------------
>
>                 Key: HIVE-16918
>                 URL: https://issues.apache.org/jira/browse/HIVE-16918
>             Project: Hive
>          Issue Type: Bug
>          Components: repl
>    Affects Versions: 3.0.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Sushanth Sowmyan
>         Attachments: HIVE-16918.patch
>
>
> With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp.
> This, however, is incorrect for copying the generated _metadata from a
> temporary scratch directory to HDFS. We need to change that so that this
> copy is routed to a regular CopyTask. The problem with using DistCp here is
> that DistCp launches a separate job, which may be scheduled on another
> machine that does not have access to this file:// URI. DistCp should only
> ever be used when copying from non-local filesystems.
> Also, as a follow-up to HIVE-16686, we missed adding "-pb" as a default for
> invocations of DistCp from Hive; this change adds it. This would not be
> necessary if HADOOP-8143 had made it in, but until it does, we need it.
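
The routing rule described above can be sketched roughly as follows. This is
an illustration only, not the code in HIVE-16918.patch: the class name
CopyRouteSketch and the methods shouldUseDistCp and distCpArgs are
hypothetical, and treating a URI with no scheme as local is an assumption of
the sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive: decides between a regular copy and DistCp.
final class CopyRouteSketch {

    // True when the source is on a non-local filesystem, so a DistCp job
    // running on another machine could still read it.
    public static boolean shouldUseDistCp(URI source) {
        String scheme = source.getScheme();
        // Assumption of this sketch: a missing scheme is treated as local.
        return scheme != null && !scheme.equalsIgnoreCase("file");
    }

    // Builds a DistCp argument list with "-pb" (preserve block size) added by default.
    public static List<String> distCpArgs(URI source, URI target) {
        List<String> args = new ArrayList<>();
        args.add("-pb");
        args.add(source.toString());
        args.add(target.toString());
        return args;
    }
}
```

With this rule, a file:// source such as a scratch-directory _metadata file
would go through the regular copy path, while an hdfs:// source would be
eligible for DistCp.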



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
