[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757433#comment-13757433
 ] 

Larry McCay commented on HIVE-5207:
-----------------------------------

This seems to be a duplicate of HIVE-4227. I am actually in the process of 
working on that functionality and plan to leverage HADOOP-9331 as appropriate. 
We will need to rationalize these Jiras. Are you perhaps calling out the 
difference between the Jiras as the entire table being encrypted here, rather 
than the individual columns in HIVE-4227? I think that if we need both levels 
of granularity, they need to be based on the same solution.

The key management aspect is one that we will need to sync on. The patch in 
HADOOP-9534 (CMF) is being refactored in order to support our API needs for 
acquiring keys for Hive encryption and presumably for CryptoFS. Generally 
speaking, the nonce/iv, alias and version indicator will be stored within the 
colstore in Hive for decryption. That is the current thinking anyway.
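
To make the idea concrete, here is a minimal Java sketch of a stored value that 
carries its own alias, version indicator and nonce/iv. The byte layout, the 
class name SealedValue, the keyLookup callback and the choice of AES-GCM are all 
assumptions for illustration; none of them comes from the HIVE-4227 or 
HADOOP-9331 patches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.function.BiFunction;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical layout (an assumption, not the actual Hive format):
// [alias length:2][alias:UTF-8][key version:4][iv:12][ciphertext + GCM tag]
final class SealedValue {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;

    /** Encrypts plaintext and prepends the metadata needed to decrypt it later. */
    static byte[] seal(String alias, int version, byte[] key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // fresh nonce for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] aliasBytes = alias.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer
                    .allocate(2 + aliasBytes.length + 4 + IV_BYTES + ciphertext.length)
                    .putShort((short) aliasBytes.length).put(aliasBytes)
                    .putInt(version).put(iv).put(ciphertext).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reads alias, version and iv from the header, then asks keyLookup for the key. */
    static byte[] open(byte[] sealed, BiFunction<String, Integer, byte[]> keyLookup) {
        try {
            ByteBuffer in = ByteBuffer.wrap(sealed);
            byte[] aliasBytes = new byte[in.getShort()];
            in.get(aliasBytes);
            int version = in.getInt();
            byte[] iv = new byte[IV_BYTES];
            in.get(iv);
            byte[] ciphertext = new byte[in.remaining()];
            in.get(ciphertext);
            byte[] key = keyLookup.apply(
                    new String(aliasBytes, StandardCharsets.UTF_8), version);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            return cipher.doFinal(ciphertext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key, never use in practice
        byte[] sealed = seal("demo.alias", 1, key,
                "123-45-6789".getBytes(StandardCharsets.UTF_8));
        byte[] plain = open(sealed, (alias, version) -> key);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

Because the header names the alias and version, a reader never has to guess 
which key revision was used to write a given value.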

Support for multiple key revisions per alias will allow for rotation and 
rolling of keys within the datastores.
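
As a rough illustration of multiple revisions per alias, the Java sketch below 
keeps every key version in memory and treats the highest version as current. 
VersionedKeyRing and its method names are hypothetical and are not part of the 
CMF patch; a real provider would persist and protect the keys.

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory key ring: alias -> (version -> key bytes).
final class VersionedKeyRing {
    private final Map<String, TreeMap<Integer, byte[]>> keys = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    /** Adds a fresh 128-bit key under the alias and returns its version number. */
    int rotate(String alias) {
        TreeMap<Integer, byte[]> versions =
                keys.computeIfAbsent(alias, a -> new TreeMap<>());
        int next = versions.isEmpty() ? 1 : versions.lastKey() + 1;
        byte[] key = new byte[16];
        random.nextBytes(key);
        versions.put(next, key);
        return next;
    }

    /** Version that new writes should use. */
    int currentVersion(String alias) {
        TreeMap<Integer, byte[]> versions = keys.get(alias);
        if (versions == null) {
            throw new IllegalArgumentException("unknown alias: " + alias);
        }
        return versions.lastKey();
    }

    /** Older versions stay readable so existing data can still be decrypted. */
    byte[] key(String alias, int version) {
        TreeMap<Integer, byte[]> versions = keys.get(alias);
        byte[] key = versions == null ? null : versions.get(version);
        if (key == null) {
            throw new IllegalArgumentException(
                    "no key for " + alias + " version " + version);
        }
        return key.clone();
    }

    public static void main(String[] args) {
        VersionedKeyRing ring = new VersionedKeyRing();
        ring.rotate("demo.alias");
        ring.rotate("demo.alias");
        System.out.println("current version: " + ring.currentVersion("demo.alias"));
    }
}
```

Rolling a datastore then amounts to re-encrypting old values under 
currentVersion while reads continue to resolve whichever version each value 
records.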

CMF will provide pluggability for talking to key management/data protection 
providers: initially a JCEKS keystore and eventually a central key 
management/data protection service for Hadoop. The central service will also 
provide pluggability for integrating third party providers/solutions.
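
For the initial JCEKS case, a minimal Java sketch of a pluggable provider might 
look like the following. The KeySource interface and JceksKeySource class are 
hypothetical names, not the CMF API; only the java.security.KeyStore calls are 
standard JDK. The sketch assumes every alias refers to a secret key entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical plug-in point; other backends would implement the same interface.
interface KeySource {
    SecretKey getKey(String alias);
}

final class JceksKeySource implements KeySource {
    private final KeyStore store;
    private final char[] password;

    private JceksKeySource(KeyStore store, char[] password) {
        this.store = store;
        this.password = password;
    }

    /** Loads a JCEKS keystore; a null stream creates an empty one. */
    static JceksKeySource load(InputStream in, char[] password) {
        try {
            KeyStore store = KeyStore.getInstance("JCEKS");
            store.load(in, password);
            return new JceksKeySource(store, password.clone());
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("cannot load keystore", e);
        }
    }

    void putKey(String alias, SecretKey key) {
        try {
            store.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(password));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot store key " + alias, e);
        }
    }

    /** Returns null when the alias is not present. */
    @Override
    public SecretKey getKey(String alias) {
        try {
            return (SecretKey) store.getKey(alias, password);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot read key " + alias, e);
        }
    }

    void save(OutputStream out) {
        try {
            store.store(out, password);
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException("cannot save keystore", e);
        }
    }

    public static void main(String[] args) {
        char[] password = "changeit".toCharArray(); // demo password only
        JceksKeySource source = load(null, password);
        source.putKey("demo.alias", new SecretKeySpec(new byte[16], "AES"));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        source.save(bytes);
        KeySource reloaded =
                load(new ByteArrayInputStream(bytes.toByteArray()), password);
        System.out.println(reloaded.getKey("demo.alias").getAlgorithm());
    }
}
```

Callers depend only on KeySource, so a central key management service could 
later be swapped in without touching the encryption code.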

TableProperties is one way to indicate the need for data protection, and we 
are looking at others as well. I am currently looking at column-level 
indicators too.

Let's figure out how to combine or consolidate these Jiras so that we can 
hopefully get a coherent set of patches to collaborate with in a branch.
                
> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is 
> common practice to store the data encrypted in the file system. Enabling Hive 
> to store and query encrypted data is crucial for Hive data analysis in the 
> enterprise. 
>  
> When creating a table, the user can specify whether it is an encrypted table 
> by setting a property in TBLPROPERTIES. Once an encrypted table is created, 
> queries on it are transparent as long as the corresponding key management 
> facilities are set up in the environment where the query runs. We can use the 
> Hadoop crypto support provided by HADOOP-9331 for the underlying data 
> encryption and decryption. 
>  
> As to key management, we would support several common key management use 
> cases. First, the table key (data key) can be stored in the Hive metastore, 
> associated with the table in its properties. The table key can be explicitly 
> specified or auto-generated, and will be encrypted with a master key. There 
> are cases where the data being processed is generated by other applications, 
> so we need to support externally managed or imported table keys. Also, the 
> data generated by Hive may be consumed by other applications in the system, 
> so we need a tool or command for exporting the table key to a Java keystore 
> for external use.
>  
> To handle versions of Hadoop that do not have crypto support, we can avoid 
> compilation problems by segregating crypto API usage into separate files 
> (shims) to be included only if a flag is defined on the Ant command line 
> (something like -Dcrypto=true).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
