[
https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966140#comment-13966140
]
Uma Maheswara Rao G edited comment on HADOOP-10150 at 4/11/14 6:03 AM:
-----------------------------------------------------------------------
Todd, thanks for your comments.
{quote}A few questions here...
First, let me confirm my understanding of the key structure and storage:
Client master key: this lives on the Key Management Server, and might be
different from application to application. {quote}
Yes.
{quote}In many cases there may be just one per cluster, though in a multitenant
cluster, perhaps we could have one per tenant.{quote}
It depends on the KeyProvider implementation; these details can be encapsulated
in the KeyProvider implementation, which is pluggable in CFS. Thus, users can
apply their own strategy for deploying one master key or multiple master keys,
e.g. per application or per user group.
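To illustrate the point about pluggability, here is a minimal sketch of how the master-key selection policy could hide behind an interface. The names below (MasterKeyResolver, SingleKeyResolver, PerTenantResolver) are hypothetical for illustration, not Hadoop's actual KeyProvider API:

```java
/** Hypothetical sketch: master-key selection policy lives behind a pluggable interface. */
interface MasterKeyResolver {
    /** Returns the name of the master key to use for the given context. */
    String resolveMasterKey(String tenant, String application);
}

/** Strategy 1: one master key for the whole cluster. */
class SingleKeyResolver implements MasterKeyResolver {
    public String resolveMasterKey(String tenant, String application) {
        return "cluster-master-key";
    }
}

/** Strategy 2: one master key per tenant, e.g. for a multitenant cluster. */
class PerTenantResolver implements MasterKeyResolver {
    public String resolveMasterKey(String tenant, String application) {
        return "master-key-" + tenant;
    }
}

public class ResolverDemo {
    public static void main(String[] args) {
        MasterKeyResolver single = new SingleKeyResolver();
        MasterKeyResolver perTenant = new PerTenantResolver();
        System.out.println(single.resolveMasterKey("acme", "hive"));
        System.out.println(perTenant.resolveMasterKey("acme", "hive"));
    }
}
```

CFS itself would only program against the interface, so swapping deployment strategies does not touch the encryption code path.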
{quote}Data key: this is set per encrypted directory. This key is stored in the
directory xattr on the NN, but encrypted by the client master key (which the NN
doesn't know).{quote}
Yes.
{quote}So, when a client wants to read a file, the following is the process:
1) Notices that the file is in an encrypted directory. Fetches the encrypted
data key from the NN's xattr on the directory.
2) Somehow associates this encrypted data key with the master key that was
used to encrypt it (perhaps it's tagged with some identifier). Fetches the
appropriate master key from the key store.
2a) The keystore somehow authenticates and authorizes the client's access to
this key
3) The client decrypts the data key using the master key, and is now able to
set up a decrypting stream for the file itself. (I've ignored the IV here, but
assume it's also stored in an xattr) {quote}
Yes.
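The read path confirmed above can be sketched with standard JCE primitives. This is only a sketch under assumed algorithm choices (AESWrap for wrapping the data key, AES/CTR for file data); CFS may use different transformations:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class ReadPathSketch {
    /** Steps 2-3 of the read path: unwrap the data key, then decrypt the content. */
    static byte[] unwrapAndDecrypt(SecretKey masterKey, byte[] wrappedDataKey,
                                   byte[] iv, byte[] ciphertext) throws Exception {
        // 2) unwrap the data key using the master key fetched from the KMS
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, masterKey);
        SecretKey dataKey = (SecretKey) unwrap.unwrap(wrappedDataKey, "AES", Cipher.SECRET_KEY);
        // 3) set up the decrypting stream using the IV stored in the xattr
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dataKey, new IvParameterSpec(iv));
        return dec.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        // Setup: state that would exist before the read.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey masterKey = kg.generateKey();   // lives in the KMS
        SecretKey dataKey = kg.generateKey();     // per encrypted directory
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);         // stored in an xattr

        // The NN xattr holds only the wrapped (encrypted) data key.
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, masterKey);
        byte[] wrappedDataKey = wrap.wrap(dataKey);

        // File content encrypted with the data key; CTR mode preserves length,
        // which is what makes seek/pread feasible.
        byte[] plaintext = "hello, cfs".getBytes(StandardCharsets.UTF_8);
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dataKey, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        byte[] decrypted = unwrapAndDecrypt(masterKey, wrappedDataKey, iv, ciphertext);
        System.out.println(Arrays.equals(decrypted, plaintext));
    }
}
```

Note the master key never reaches the NN in this flow; the NN only ever stores the wrapped data key and the IV.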
{quote}In terms of attack vectors:
let's say that the NN disk is stolen. The thief now has access to a bunch of
keys, but they're all encrypted by various master keys. So we're OK.{quote}
Yes.
{quote}let's say that a client is malicious. It can get whichever master keys
it has access to from the KMS. If we only have one master key per cluster, then
the combination of one malicious client plus stealing the fsimage will give up
all the keys{quote}
When a client gets access to both a master key and the fsimage, there is
nothing we can do to protect that data. The separation of the data encryption
key from the master key is for master key rotation: when the master key
changes, only the data keys need to be rewrapped; one does not need to decrypt
every data file and re-encrypt it with a new encryption key.
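A sketch of why rotation is cheap under this scheme (assuming AESWrap key wrapping as in the read-path discussion above): only the wrapped data key stored in the xattr changes; the file ciphertext is untouched.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Arrays;

public class RotateMasterKey {
    /** Rewraps a data key under a new master key; file data is never touched. */
    static byte[] rewrap(byte[] wrapped, SecretKey oldMaster, SecretKey newMaster)
            throws Exception {
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, oldMaster);
        SecretKey dataKey = (SecretKey) unwrap.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, newMaster);
        return wrap.wrap(dataKey);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey oldMaster = kg.generateKey();
        SecretKey newMaster = kg.generateKey();
        SecretKey dataKey = kg.generateKey();

        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, oldMaster);
        byte[] wrappedOld = wrap.wrap(dataKey);

        // Rotation: one small xattr rewrite per directory, no file re-encryption.
        byte[] wrappedNew = rewrap(wrappedOld, oldMaster, newMaster);

        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, newMaster);
        SecretKey recovered = (SecretKey) unwrap.unwrap(wrappedNew, "AES", Cipher.SECRET_KEY);
        System.out.println(Arrays.equals(recovered.getEncoded(), dataKey.getEncoded()));
    }
}
```

The cost of rotation is proportional to the number of encrypted directories, not to the volume of encrypted data.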
{quote}let's say that a client has escalated to root access on one of the slave
nodes in the cluster, or otherwise has malicious access to a NodeManager
process. By looking at a running MR task, it could steal whatever credentials
the task is using to access the KMS, and/or dump the memory of the client
process in order to give up the master key above.{quote}
When a client has root access, all information can be dumped from any process,
right? I remember Nicholas asked a similar question on HDFS-6134. If a client
has escalated to root access on the slave nodes, how can we assume the
NameNode and the standby/secondary NameNode in the same cluster are secure? On
the other hand, as long as data keys remain in encrypted form in the process
memory of the NameNode and DataNodes, and those daemons don't have access to
the wrapping keys, there is no attack vector there.
{quote}How does the MR task in this context get the credentials to fetch keys
from the KMS? If the KMS accepts the same authentication tokens as the
NameNode, then is there any reason that this is more secure than having the
NameNode supply the keys? Or is it just that decoupling the NameNode and the
key server allows this approach to work for non-HDFS filesystems, at the
expense of an additional daemon running a key distribution service?{quote}
It is a good question. Securely distributing secrets among the cluster nodes,
as you mentioned, will always be a hard problem to solve. Without adequate
hardware support, it could be a weak point during operations like key
unwrapping. We want to leave options to the KeyProvider implementation, to
decouple the key protection mechanism from the data encryption mechanism, and
to make both work on top of any filesystem. Could we have a KeyProvider
implementation that uses the NN as the KMS, as we already discussed, and still
leave room for other parties to plug in their own solutions?
was (Author: hitliuyi):
> Hadoop cryptographic file system
> --------------------------------
>
> Key: HADOOP-10150
> URL: https://issues.apache.org/jira/browse/HADOOP-10150
> Project: Hadoop Common
> Issue Type: New Feature
> Components: security
> Affects Versions: 3.0.0
> Reporter: Yi Liu
> Assignee: Yi Liu
> Labels: rhino
> Fix For: 3.0.0
>
> Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file
> system-V2.docx, HADOOP cryptographic file system.pdf, cfs.patch, extended
> information based on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based
> on HADOOP “FilterFileSystem” decorating DFS or other file systems, and
> transparent to upper layer applications. It’s configurable, scalable and fast.
> High level requirements:
> 1. Transparent to and no modification required for upper layer
> applications.
> 2. “Seek”, “PositionedReadable” are supported for input stream of CFS if
> the wrapped file system supports them.
> 3. Very high performance for encryption and decryption, so that they will
> not become a bottleneck.
> 4. Can decorate HDFS and all other file systems in Hadoop, and will not
> modify existing structure of file system, such as namenode and datanode
> structure if the wrapped file system is HDFS.
> 5. Admin can configure encryption policies, such as which directory will
> be encrypted.
> 6. A robust key management framework.
> 7. Support Pread and append operations if the wrapped file system supports
> them.
--
This message was sent by Atlassian JIRA
(v6.2#6252)