[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433401#comment-13433401
 ] 

Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

One of the goals of this feature is to encrypt files both in transit and at 
rest (when stored on disk). One way to achieve this goal is to depend on 
software or hardware that provides encryption in the local file system, and to 
rely on HDFS-3637 and MR shuffle encryption.

This jira explores an alternative approach to the problem that does not depend 
on special software to do local file system encryption. 

The key advantages of this approach over the local file system encryption 
approach are:

1) A file can be decrypted only if the user provides the correct key. So even 
if someone manages to read the file, they cannot read its contents without the 
key. The user must possess the key in addition to having read permission, so 
there are two levels of protection. 

There could be cases where a user accidentally sets "read" permissions for 
everyone, or where a superuser reads the file. This scheme still protects the 
data in those cases.

2) No dependency on local file system encryption software.  This approach 
allows encryption without such special setup.

3) A file is decrypted/encrypted only during processing and not when it is 
read. This results in fewer encryption/decryption operations.
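
To illustrate point 1, here is a minimal sketch. It is not the patch's 
implementation (the feature adds PGP as a codec); it assumes AES-GCM from the 
standard javax.crypto API as a stand-in cipher, and the class and method names 
are hypothetical. It shows that a party who can read the stored bytes still 
cannot recover the contents without the right key.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; AES-GCM stands in for whatever codec is plugged in.
class KeyGatedFile {
    private static final int IV_LEN = 12;    // bytes, the usual GCM nonce size
    private static final int TAG_BITS = 128; // authentication tag length

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Returns IV || ciphertext, i.e. the bytes that would sit on disk.
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
    }

    // Fails with a GeneralSecurityException if the key does not match.
    static byte[] decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_LEN));
        return c.doFinal(stored, IV_LEN, stored.length - IV_LEN);
    }

    // True only if the given key can decrypt the stored bytes.
    static boolean canRead(SecretKey key, byte[] stored) {
        try {
            decrypt(key, stored);
            return true;
        } catch (GeneralSecurityException e) {
            return false;
        }
    }
}
```

Here read permission corresponds to holding the stored byte array, and 
canRead() returns true only for the key that was used in encrypt().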


Other key points:

1) Encrypted and plain text files can coexist in a normal file system. 

2) Developers can plug in other encryption algorithms/standards (CMS, AES, 
custom encryption) and thus have more flexibility.

3) Allows transporting keys/passwords/tokens from JobClient to tasks for use 
cases other than encryption, such as connecting to a web service. 
MAPREDUCE-4491 adds keyProtection, and encryption uses it.

4) Keys can be managed in one central location. JobClient gets the keys on 
behalf of the user, like any other application. 
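
The issue description mentions support for Java KeyStore as one keystore type. 
As a rough sketch of point 4, and not the patch's actual framework, the 
following uses only the standard java.security.KeyStore API to put a secret key 
under an alias and fetch it back, which is roughly what a client would do 
before handing the key to tasks. The class name, alias and passwords are 
hypothetical, and the JCEKS store type is an assumption chosen because it can 
hold secret keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper around a single, centrally managed keystore.
class CentralKeyStoreSketch {
    // Builds a new JCEKS keystore holding one AES key and returns its bytes.
    static byte[] createStore(String alias, byte[] rawKey, char[] storePass, char[] keyPass)
            throws GeneralSecurityException, IOException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, storePass); // initialise an empty keystore
        SecretKey key = new SecretKeySpec(rawKey, "AES");
        ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(keyPass));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, storePass);
        return out.toByteArray();
    }

    // Loads the keystore and returns the raw key, or null if the alias is absent.
    static byte[] fetchKey(byte[] storeBytes, String alias, char[] storePass, char[] keyPass)
            throws GeneralSecurityException, IOException {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(new ByteArrayInputStream(storeBytes), storePass);
        Key key = ks.getKey(alias, keyPass);
        return key == null ? null : key.getEncoded();
    }
}
```

In a real deployment the keystore would live in a file or external service 
rather than a byte array; the in-memory form only keeps the sketch 
self-contained.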

If we look at these two approaches from a higher level, we can see that the 
local file system approach is an internal approach to encryption and the 
MAPREDUCE-4491 approach is an external approach. These two choices are also 
available in normal (non-distributed) application development, where 
developers can rely on the file system to provide encryption or do encryption 
themselves. There are tradeoffs and flexibilities in both approaches, and we 
choose between them based on our use cases and needs. So I believe we should 
provide both alternatives in Hadoop.

In addition, this feature allows key protection in general, which can be used 
for purposes other than encryption. The keys will also be encrypted when stored 
on disk and decrypted only in memory.
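
Keeping keys encrypted on disk and decrypting them only in memory can be 
pictured with standard key wrapping. This is a sketch under assumptions, not 
the MAPREDUCE-4491 implementation: it uses the JDK's "AESWrap" cipher (AES key 
wrap, RFC 3394), and the class and method names are made up for illustration.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

// Hypothetical helper: a key-encryption key (kek) protects the data key.
class KeyProtectionSketch {
    // Returns the wrapped (encrypted) form of dataKey, which is what would be stored.
    static byte[] wrapForDisk(SecretKey kek, SecretKey dataKey) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.WRAP_MODE, kek);
        return c.wrap(dataKey);
    }

    // Recovers the data key in memory; fails if the kek is wrong or the bytes were altered.
    static SecretKey unwrapInMemory(SecretKey kek, byte[] wrapped) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AESWrap");
        c.init(Cipher.UNWRAP_MODE, kek);
        return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
    }
}
```

Only the output of wrapForDisk() would ever be persisted; the plain data key 
exists only as the in-memory result of unwrapInMemory().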

                
> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted 
> wherever it is stored. Common use case is to pull encrypted data out of a 
> datasource and store in HDFS for analysis. The keys are stored in an external 
> keystore. 
> The feature adds a customizable framework to integrate different types of 
> keystores, support for Java KeyStore, read keys from keystores, and transport 
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to 
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use 
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial 
> work for further refinement.
> Update: The patches are uploaded to subtasks. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

