[ 
https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123051#comment-16123051
 ] 

Sankar Hariappan edited comment on HIVE-17289 at 8/11/17 1:59 PM:
------------------------------------------------------------------

Added 01.patch with the following changes.
- Used CopyUtils to copy files from ReplCopyTask (IMPORT/REPL LOAD).
- Set the distcp doAs input to null for the EXPORT and IMPORT flows. REPL LOAD
uses the user config hive.distcp.privileged.doAs.
- Assumed lazy copy is set only for REPL LOAD, and hence set the doAs user input
to hive.distcp.privileged.doAs if lazy copy is true and to null if it is false.
This avoids passing the argument through multiple flows; also, the incremental
REPL LOAD shares common code with IMPORT.
- Enabled distcp for copies within the same file system when there is a large
number of files or the files are large.
- Removed redundant code in ReplCopyTask/ReplCopyWork, as it now re-uses the
CopyUtils implementation, which does the same thing.
- Refactored ReplCopyTask.execute to properly distinguish the code path that
reads _files from the one that handles actual data files.
- Set the default value of hive.distcp.privileged.doAs to "hive".
- Moved CopyUtils from the parse.repl.dump.io package to parse.repl, as it is
common to dump and load.
- No tests added, as the existing tests already cover the changes, except for
the distcp flow (due to hive.in.test), which needs to be tested manually.
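
For readers following the list above, the doAs selection and the distcp decision
can be sketched roughly as below. This is an illustrative sketch only, not code
from 01.patch: the class name, method names, parameters, and threshold values
are all hypothetical, and the real logic lives in ReplCopyTask/CopyUtils.

```java
// Illustrative sketch only. CopyDecisionSketch and its methods are hypothetical
// names and do not appear in the patch.
final class CopyDecisionSketch {

    private CopyDecisionSketch() {
    }

    /**
     * Picks the doAs user for distcp. Assumption taken from the comment above:
     * lazy copy is set only for REPL LOAD, so it doubles as the replication flag.
     * EXPORT/IMPORT (lazyCopy == false) get null, meaning "no privileged doAs".
     */
    static String resolveDoAsUser(boolean lazyCopy, String privilegedDoAs) {
        return lazyCopy ? privilegedDoAs : null;
    }

    /**
     * Decides whether a copy should go through distcp: either there are many
     * files or the total size is large. Both limits are caller-supplied here;
     * how the patch obtains them is not shown in the comment.
     */
    static boolean shouldUseDistCp(long fileCount, long totalBytes,
                                   long maxFiles, long maxBytes) {
        return fileCount > maxFiles || totalBytes > maxBytes;
    }

    public static void main(String[] args) {
        // Hypothetical limits chosen only for the demonstration.
        long maxFiles = 1L;
        long maxBytes = 32L * 1024 * 1024;

        System.out.println("REPL LOAD doAs: " + resolveDoAsUser(true, "hive"));
        System.out.println("IMPORT doAs: " + resolveDoAsUser(false, "hive"));
        System.out.println("distcp for 10 small files: "
                + shouldUseDistCp(10, 1024, maxFiles, maxBytes));
    }
}
```

Keying the doAs choice off the lazy-copy flag keeps one shared code path for
IMPORT and incremental REPL LOAD, at the cost of relying on the assumption that
only REPL LOAD ever sets lazy copy.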

[~thejas]/[~daijy], could you please review it?


> EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
> ---------------------------------------------------------------------
>
>                 Key: HIVE-17289
>                 URL: https://issues.apache.org/jira/browse/HIVE-17289
>             Project: Hive
>          Issue Type: Sub-task
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, Export, Import, replication
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17289.01.patch
>
>
> Currently, EXPORT uses distcp to dump data files to the dump directory, and
> IMPORT uses distcp to copy larger files or a large number of files from the
> dump directory to the table staging directory. However, this copy fails
> because distcp is always run as the doAs user specified in
> hive.distcp.privileged.doAs, which is "hdfs" by default.
> The doAs user should not be used when running distcp from the EXPORT/IMPORT
> flow. Privileged-user distcp should be done only for REPL DUMP/LOAD commands.
> Also, the default value of hive.distcp.privileged.doAs needs to be set to
> "hive", as the "hdfs" super-user is never allowed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
