[ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451050#comment-13451050 ]
Benoy Antony commented on MAPREDUCE-4491:
-----------------------------------------

Key protection is simple to explain. The JobClient retrieves keys from a configured keystore, encrypts the keys along with the jobId using the cluster public key, and submits the encrypted blob as part of the job credentials. The TaskTracker decrypts the blob using the cluster private key during job localization and verifies that the jobId inside the blob matches the jobId of the task. During task launch, the keys are made available to the child (task) process as an environment variable. Since the jobId is part of the encrypted blob, the jobId verification prevents a replay attack. It is easy to add integrity protection as well.

The scheme was designed to be used in a secure cluster, but it is worth exploring whether it can be used in a non-secure cluster. One issue is the cluster private key: it should be accessible only to the TaskTracker process. If access is determined by the user's permissions, then tasks should run as a different user than the TaskTracker. That user need not be the job owner; it can be a fixed user.

I believe you are bringing up another issue in this regard. If a rogue task can make a TT launch another rogue task with a jobId matching the one inside the encrypted blob, then the keys are available to the newly launched rogue task. That's a good point. Basically, the rogue task would be acting as a JT/AppMaster. I am not sure whether that is possible. Even if it is possible, there should be ways to detect it.
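
To make the blob construction and the jobId check described above concrete, here is a minimal sketch in plain Java. It is not the code from the patch: the class name KeyBlobSketch, the methods seal and open, the byte layout of the blob, and the choice of RSA-OAEP are all assumptions made for illustration. It also encrypts the payload directly with RSA, which only works for short keys; a real implementation would more likely wrap a symmetric key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;

/** Hypothetical sketch of the scheme; names and blob layout are assumptions. */
final class KeyBlobSketch {
    // Assumed cipher choice; the comment in this thread does not name one.
    private static final String TRANSFORM = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Stand-in for the cluster key pair, generated here only for the demo. */
    static KeyPair newClusterKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** JobClient side: bind the key to the jobId, then encrypt both together. */
    static byte[] seal(PublicKey clusterPublic, String jobId, byte[] jobKey) {
        byte[] id = jobId.getBytes(StandardCharsets.UTF_8);
        if (id.length > 255) {
            throw new IllegalArgumentException("jobId too long for this layout");
        }
        // Assumed layout: [1-byte jobId length][jobId bytes][key bytes].
        byte[] plain = new byte[1 + id.length + jobKey.length];
        plain[0] = (byte) id.length;
        System.arraycopy(id, 0, plain, 1, id.length);
        System.arraycopy(jobKey, 0, plain, 1 + id.length, jobKey.length);
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.ENCRYPT_MODE, clusterPublic);
            return cipher.doFinal(plain);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** TaskTracker side: decrypt, then release the key only if the jobId matches. */
    static byte[] open(PrivateKey clusterPrivate, byte[] blob, String taskJobId) {
        byte[] plain;
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORM);
            cipher.init(Cipher.DECRYPT_MODE, clusterPrivate);
            plain = cipher.doFinal(blob);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        if (plain.length < 1 || plain.length < 1 + (plain[0] & 0xFF)) {
            throw new SecurityException("malformed blob");
        }
        int idLength = plain[0] & 0xFF;
        String sealedJobId = new String(plain, 1, idLength, StandardCharsets.UTF_8);
        if (!sealedJobId.equals(taskJobId)) {
            // A blob replayed under another job fails here.
            throw new SecurityException("jobId mismatch");
        }
        return Arrays.copyOfRange(plain, 1 + idLength, plain.length);
    }

    public static void main(String[] args) {
        KeyPair cluster = newClusterKeyPair();
        byte[] jobKey = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        byte[] blob = seal(cluster.getPublic(), "job_0001", jobKey);
        byte[] recovered = open(cluster.getPrivate(), blob, "job_0001");
        System.out.println("key recovered: " + Arrays.equals(jobKey, recovered));
    }
}
```

Note that this check only covers a blob replayed under a different jobId. It does nothing against the rogue-task case raised above, where the second task is launched with the matching jobId, and anyone holding the cluster public key can still build a valid blob, which is where the integrity protection mentioned earlier would come in.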

> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted
> wherever it is stored. Common use case is to pull encrypted data out of a
> datasource and store in HDFS for analysis. The keys are stored in an external
> keystore.
> The feature adds a customizable framework to integrate different types of
> keystores, support for Java KeyStore, read keys from keystores, and transport
> keys from JobClient to Tasks.
> The feature adds PGP encryption as a codec and additional utilities to
> perform encryption related steps.
> The design document is attached. It explains the requirement, design and use
> cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this for 1.1 and will upload it soon as an initial
> work for further refinement.
> Update: The patches are uploaded to subtasks.