[ 
https://issues.apache.org/jira/browse/HIVE-18341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-18341:
---------------------------
    Attachment: HIVE-18341.1.patch

[~thejas] I have included the changes described on the distcp page for 
"/.reserved/raw"; however, the distcp copy fails with a "Check-sum mismatch" 
exception. This should not have happened, since the two different zones use 
the same keys. Output from the logs:
{code}
Check-sum mismatch between 
hdfs://localhost:53536/.reserved/raw/warehouse0/targetandsourcehavesameencryptionzonekeys_1514449998552.db/encrypted_table/000000_0_copy_1
 and 
hdfs://localhost:53536/.reserved/raw/warehouse1/replicated_targetandsourcehavesameencryptionzonekeys_1514449998552.db/encrypted_table/.hive-staging_hive_2017-12-28_00-33-30_893_6165151359381350374-1/-ext-10001/.distcp.tmp.attempt_local327098851_0003_m_000000_0
{code}
The test case is "targetAndSourceHaveSameEncryptionZoneKeys".
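
For reference, here is a minimal sketch of the path rewrite being discussed, i.e. putting "/.reserved/raw" in front of the path component of an HDFS URI as seen in the log above. The class and method names (RawPathSketch, toRaw) are hypothetical and only for illustration; this is not the code in the attached patch, and it uses plain java.net.URI rather than the Hadoop Path API.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not taken from the patch.
final class RawPathSketch {
    static final String RAW_PREFIX = "/.reserved/raw";

    // Returns the same URI with "/.reserved/raw" prepended to its path.
    // URIs that already point into the raw namespace are returned unchanged.
    static URI toRaw(URI src) {
        String path = src.getPath();
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + src);
        }
        if (path.equals(RAW_PREFIX) || path.startsWith(RAW_PREFIX + "/")) {
            return src;
        }
        try {
            // Scheme and authority (host:port) are kept; only the path changes.
            return new URI(src.getScheme(), src.getAuthority(), RAW_PREFIX + path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI src = URI.create("hdfs://localhost:53536/warehouse0/db/encrypted_table/000000_0");
        System.out.println(toRaw(src));
    }
}
```

Both the source and the target of the copy would need to be rewritten this way, which matches the two paths shown in the log output.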

Additionally, I have included changes so that regular file copies (used when 
there is just 1 file or the file size is small) are done under *doAs*, using 
the user configured for distcp ("hive.distcp.privileged.doAs").

One thing to note: since regular file copies use the fileSystem copy, even 
TDE deployments with the same keys won't be able to leverage the optimization 
that distcp does. This will be of particular interest for ACID table 
replication, where we will mostly transfer 1 delta file per table within a 
transaction.
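
To make the concern concrete, here is a rough sketch of the copy decision as described above: regular copy when there is at most one file or the total size is small, distcp otherwise. The names (CopyStrategySketch, choose) and the threshold parameters are assumptions for illustration only; the actual patch may decide differently.

```java
import java.util.List;

// Hypothetical illustration of the regular-copy vs distcp decision.
final class CopyStrategySketch {
    enum Strategy { REGULAR_COPY, DISTCP }

    // maxFiles and maxBytes are assumed thresholds, passed in by the caller.
    static Strategy choose(List<Long> fileSizes, long maxFiles, long maxBytes) {
        long totalBytes = 0;
        for (long size : fileSizes) {
            totalBytes += size;
        }
        boolean fewFiles = fileSizes.size() <= maxFiles;
        boolean smallData = totalBytes <= maxBytes;
        // Either condition alone is enough to fall back to the regular copy,
        // which does not go through distcp and so cannot use "/.reserved/raw".
        return (fewFiles || smallData) ? Strategy.REGULAR_COPY : Strategy.DISTCP;
    }
}
```

Under these assumptions a single delta file takes the regular copy path regardless of its size, so it would not get the raw copy optimization.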

> Add repl load support for adding "raw" namespace for TDE with same encryption 
> keys
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-18341
>                 URL: https://issues.apache.org/jira/browse/HIVE-18341
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: anishek
>            Assignee: anishek
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18341.0.patch, HIVE-18341.1.patch
>
>
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html#Running_as_the_superuser
> "a new virtual path prefix, /.reserved/raw/, that gives superusers direct 
> access to the underlying block data in the filesystem. This allows superusers 
> to distcp data without needing having access to encryption keys, and also 
> avoids the overhead of decrypting and re-encrypting data."
> We need to introduce a new option in "Repl Load" command that will change the 
> files being copied in distcp to have this "/.reserved/raw/" namespace before 
> the file paths.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
