[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433401#comment-13433401 ]
Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

One of the goals of this feature is to encrypt files both in transit and at rest (when stored on disk). One way to achieve this goal is to depend on software or hardware that encrypts the local file system, and to rely on HDFS-3637 and MR shuffle encryption for the rest.

This jira explores an alternative approach to the problem that does not depend on special software for local file system encryption.

The key advantages of this approach over the local file system encryption approach are:

1) A file can be decrypted only if the user provides the correct key. Even if someone manages to read the file, they cannot read its contents without the key. The user must possess the key in addition to having read permission, so there are two levels of protection. A user may accidentally set "read" permission for everyone, or a superuser may read the file; in both cases this scheme still protects the data.
2) No dependency on local file system encryption software. This approach allows encryption without such special setup.
3) A file is decrypted/encrypted only during processing and not when it is read. This results in fewer encryption/decryption operations.

Other key points:

1) Encrypted and plain text files can coexist in a normal file system.
2) Developers can plug in other encryption algorithms/standards (CMS, AES, custom encryption) and thus have more flexibility.
3) Keys/passwords/tokens can be transported from JobClient to tasks for use cases other than encryption, such as connecting to a web service. MAPREDUCE-4491 adds keyProtection, and encryption uses it.
4) Keys can be managed in one central location. JobClient gets them on behalf of the user, like any other application.

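To make advantage 1) concrete, here is a minimal sketch of the "two levels of protection" idea. It is not the code from the patch: the feature itself adds PGP as a codec, and since the JDK has no PGP support this sketch swaps in AES-GCM from javax.crypto. The class and method names (KeyGatedFile, newKey, encrypt, decrypt) are hypothetical and exist only for this illustration.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;

// Hypothetical illustration only; AES-GCM stands in for the PGP codec.
final class KeyGatedFile {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Generates a fresh 128-bit AES key (in the real feature, keys come from a keystore). */
    public static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV followed by ciphertext; this is what would sit on disk. */
    public static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain);
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the plaintext, or empty if the key does not match the stored bytes. */
    public static Optional<byte[]> decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return Optional.of(cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES));
        } catch (AEADBadTagException e) {
            return Optional.empty(); // readable bytes, but the wrong key
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey owner = newKey();
        SecretKey superuser = newKey(); // someone with read access but not the key
        byte[] stored = encrypt(owner, "sensitive record".getBytes(StandardCharsets.UTF_8));
        System.out.println("owner can decrypt:     " + decrypt(owner, stored).isPresent());
        System.out.println("superuser can decrypt: " + decrypt(superuser, stored).isPresent());
    }
}
```

The point of the sketch is only that read access to the stored bytes is not enough: without the owner's key, decrypt returns nothing.
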
If we look at these two approaches from a higher level, we can see that the local file system approach is an internal approach to encryption and the MAPREDUCE-4491 approach is an external one. The same two choices are available in normal (non-distributed) application development, where developers can rely on the file system to provide encryption or do the encryption themselves. Both approaches have tradeoffs and flexibilities, and we choose between them based on our use cases and needs. So I believe we should provide both alternatives in Hadoop.

In addition, this feature allows key protection in general, which can be used for purposes other than encryption. The keys will also be encrypted when stored on disk and decrypted only in memory.

> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted
> wherever it is stored. Common use case is to pull encrypted data out of a
> datasource and store in HDFS for analysis. The keys are stored in an external
> keystore.
> The feature adds a customizable framework to integrate different types of
> keystores, support for Java KeyStore, read keys from keystores, and transport
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial
> work for further refinement.
> Update: The patches are uploaded to subtasks.
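
As a rough illustration of the Java KeyStore support mentioned in the description, the sketch below stores a secret key in a password-protected keystore and reads it back. It uses only the standard java.security.KeyStore API with the JCEKS type; it is not taken from the patch. The class name KeyStoreSketch, the alias, and the passwords are made up for the example, and a byte array stands in for the keystore file on disk.

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.UnrecoverableKeyException;
import java.util.Arrays;
import java.util.Optional;

// Hypothetical illustration only; not the keystore framework from the patch.
final class KeyStoreSketch {

    /** Generates a 128-bit AES key to have something to protect. */
    public static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes a keystore holding one key; the key entry is sealed with keyPass. */
    public static byte[] save(String alias, SecretKey key, char[] storePass, char[] keyPass) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, null); // start from an empty keystore
            ks.setEntry(alias, new KeyStore.SecretKeyEntry(key),
                    new KeyStore.PasswordProtection(keyPass));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ks.store(out, storePass);
            return out.toByteArray();
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Loads the keystore and recovers the key in memory, or empty if keyPass is wrong. */
    public static Optional<Key> read(byte[] stored, String alias,
                                     char[] storePass, char[] keyPass) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(new ByteArrayInputStream(stored), storePass);
            return Optional.ofNullable(ks.getKey(alias, keyPass));
        } catch (UnrecoverableKeyException e) {
            return Optional.empty(); // entry exists but could not be unsealed
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newAesKey();
        byte[] onDisk = save("job-key", key, "store-pass".toCharArray(), "key-pass".toCharArray());
        Optional<Key> loaded = read(onDisk, "job-key",
                "store-pass".toCharArray(), "key-pass".toCharArray());
        System.out.println("recovered same key: "
                + (loaded.isPresent() && Arrays.equals(loaded.get().getEncoded(), key.getEncoded())));
    }
}
```

How the recovered key would then travel from JobClient to the tasks is covered by the design document and is not shown here.
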