[ 
https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13794816#comment-13794816
 ] 

Jerry Chen commented on HIVE-5207:
----------------------------------

{quote}This patch won't compile, because Hive has to work when used with Hadoop 
1.x. The shims are used to support multiple versions of Hadoop (Hadoop 0.20, 
Hadoop 1.x, Hadoop 0.23, Hadoop 2.x) depending on what is installed on the host 
system.{quote}
This patch depends on the crypto feature added by HADOOP-9331 and others. To 
compile the patch against Hadoop versions that include the crypto feature, the 
flag -Dcrypto=true needs to be added; without the flag, the crypto feature is 
disabled so that the build still works against versions that lack it. I do 
understand that this approach still does not align with the goal of supporting 
multiple versions of Hadoop with a single compile of Hive. 
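
To make that segregation concrete, here is a rough sketch of the pattern (the 
class and method names are only illustrative, not the ones in the patch): the 
crypto calls sit behind a small shim interface that is loaded by class name, so 
a build without -Dcrypto=true never references the Hadoop crypto classes.
{code:java}
// CryptoShim.java - compiled in every build; no Hadoop crypto imports here.
public interface CryptoShim {
  java.io.OutputStream wrapForEncryption(java.io.OutputStream out)
      throws java.io.IOException;
  java.io.InputStream wrapForDecryption(java.io.InputStream in)
      throws java.io.IOException;
}

// CryptoShimLoader.java - compiled in every build; the implementation class
// lives in the crypto-only source tree that the Ant flag adds to the build.
public final class CryptoShimLoader {
  private static final String IMPL =
      "org.apache.hadoop.hive.shims.CryptoShimImpl"; // illustrative name

  public static CryptoShim load() {
    try {
      return (CryptoShim) Class.forName(IMPL).newInstance();
    } catch (Exception e) {
      throw new IllegalStateException("Crypto support was not compiled in", e);
    }
  }
}
{code}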
{quote}Furthermore, this seems like the wrong direction. What is the advantage 
of this rather large patch over using the cfs work? If the user defines a table 
in cfs all of the table's data will be encrypted.{quote}
I agree that the CFS work has value in its API transparency, and it is good 
stuff. We are working on CFS, but it is not available yet. The work here is 
already in use by our users, who rely on it to protect sensitive data on their 
clusters while still being able to transparently decrypt that data in the jobs 
that process it. 

On the other hand, compression codecs are already widely used by the various 
file formats that Hive supports. The issue is that the current approach depends 
on changes to specific file formats to handle the encryption key context. One 
possible direction is to make the encryption codec follow exactly the same 
interface as a compression codec, so that Hive can use a codec for encryption 
or decryption without any changes to the file formats or to Hive itself. If we 
can do that, it adds the same value as a compression codec.
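
As a rough sketch of what that direction would look like from the caller's side 
(the codec class name below is made up; nothing like it exists in this patch), 
a writer or reader goes through the ordinary CompressionCodec interface and 
never sees the encryption details:
{code:java}
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class EncryptionCodecRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Hypothetical codec that encrypts/decrypts instead of compressing but
    // implements the standard CompressionCodec interface.
    Class<?> codecClass = conf.getClassByName("com.example.crypto.AesCodec");
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/encrypted_table/part-00000.aes");

    // Write side: the file format (or Hive) only ever sees a CompressionCodec,
    // so the bytes are encrypted on the way out with no format changes.
    OutputStream out = codec.createOutputStream(fs.create(file));
    try {
      out.write("sensitive row data".getBytes("UTF-8"));
    } finally {
      out.close();
    }

    // Read side: the same interface, so decryption is just as transparent.
    InputStream in = codec.createInputStream(fs.open(file));
    try {
      byte[] buf = new byte[4096];
      int n = in.read(buf);
      System.out.println(new String(buf, 0, n, "UTF-8"));
    } finally {
      in.close();
    }
  }
}
{code}
The key context would then have to be resolved entirely inside the codec (for 
example from the job configuration), which is what keeps the file formats and 
Hive itself unchanged.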


> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HIVE-5207.patch, HIVE-5207.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is 
> common practice to store the data encrypted in the file system. Enabling Hive 
> to store and query encrypted data is therefore crucial for enterprise data 
> analysis with Hive. 
>  
> When creating a table, the user can specify whether it is an encrypted table 
> by setting a property in TBLPROPERTIES. Once an encrypted table is created, 
> queries on it are transparent as long as the corresponding key management 
> facilities are set up in the environment where the query runs. We can use the 
> Hadoop crypto support provided by HADOOP-9331 for the underlying data 
> encryption and decryption. 
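>
> For example (the property name 'encrypted' below is only a placeholder, not a 
> name defined by this proposal), a table could be declared encrypted at 
> creation time and then queried like any other table; a minimal JDBC sketch:
> {code:java}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class EncryptedTableExample {
>   public static void main(String[] args) throws Exception {
>     // HiveServer2 JDBC connection; host and credentials are placeholders.
>     Class.forName("org.apache.hive.jdbc.HiveDriver");
>     Connection conn = DriverManager.getConnection(
>         "jdbc:hive2://localhost:10000/default", "hive", "");
>     Statement stmt = conn.createStatement();
>
>     // Mark the table as encrypted via TBLPROPERTIES at creation time.
>     stmt.execute("CREATE TABLE customer_pii (id BIGINT, ssn STRING) "
>         + "TBLPROPERTIES ('encrypted'='true')");
>
>     // Queries stay unchanged; decryption happens underneath as long as the
>     // key management facilities are available to the query's jobs.
>     ResultSet rs = stmt.executeQuery("SELECT count(*) FROM customer_pii");
>     while (rs.next()) {
>       System.out.println(rs.getLong(1));
>     }
>     conn.close();
>   }
> }
> {code}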
>  
> As to key management, we would support several common key management use 
> cases. First, the table key (data key) can be stored in the Hive metastore, 
> associated with the table through its properties. The table key can be 
> explicitly specified or auto-generated, and will be encrypted with a master 
> key. There are cases where the data being processed is generated by other 
> applications, so we need to support externally managed or imported table 
> keys. Also, the data generated by Hive may be consumed by other applications 
> in the system, so we need a tool or command for exporting the table key to a 
> Java keystore for external use.
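>
> As a rough illustration of that export step (the alias, file name, and 
> password below are placeholders, and the key bytes would really come from the 
> metastore after decryption with the master key), the tool could write the 
> table key into a standard JCEKS keystore that other applications can read:
> {code:java}
> import java.io.FileOutputStream;
> import java.security.KeyStore;
> import javax.crypto.SecretKey;
> import javax.crypto.spec.SecretKeySpec;
>
> public class ExportTableKey {
>   public static void main(String[] args) throws Exception {
>     // Placeholder key material; the real tool would read the table key
>     // from the metastore and decrypt it with the master key first.
>     byte[] tableKeyBytes = new byte[16];
>     SecretKey tableKey = new SecretKeySpec(tableKeyBytes, "AES");
>
>     char[] storePassword = "changeit".toCharArray();
>     KeyStore ks = KeyStore.getInstance("JCEKS");  // JCEKS can hold secret keys
>     ks.load(null, storePassword);                 // start with an empty store
>     ks.setEntry("hive.table.customer_pii",
>         new KeyStore.SecretKeyEntry(tableKey),
>         new KeyStore.PasswordProtection(storePassword));
>
>     FileOutputStream out = new FileOutputStream("table-keys.jceks");
>     try {
>       ks.store(out, storePassword);
>     } finally {
>       out.close();
>     }
>   }
> }
> {code}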
>  
> To handle versions of Hadoop that do not have crypto support, we can avoid 
> compilation problems by segregating crypto API usage into separate files 
> (shims) to be included only if a flag is defined on the Ant command line 
> (something like -Dcrypto=true).



--
This message was sent by Atlassian JIRA
(v6.1#6144)
