[
https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966140#comment-13966140
]
Uma Maheswara Rao G edited comment on HADOOP-10150 at 4/11/14 6:03 AM:
-----------------------------------------------------------------------
Todd, thanks for your comments.
{quote}A few questions here...
First, let me confirm my understanding of the key structure and storage:
Client master key: this lives on the Key Management Server, and might be
different from application to application. {quote}
Yes.
{quote}In many cases there may be just one per cluster, though in a multitenant
cluster, perhaps we could have one per tenant.{quote}
It depends on the KeyProvider implementation; these details can be encapsulated
in the KeyProvider implementation, which is pluggable in CFS. Thus, users can
apply their own strategy for deploying one master key or multiple master keys,
e.g. per application or per user group.
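To illustrate the point about pluggability, here is a minimal sketch of how the master-key selection policy could hide behind an interface. The names below (MasterKeyResolver, SingleKeyResolver, PerTenantResolver) are hypothetical for illustration, not Hadoop's actual KeyProvider API:

```java
/** Hypothetical sketch: master-key selection policy lives behind a pluggable interface. */
interface MasterKeyResolver {
    /** Returns the name of the master key to use for the given context. */
    String resolveMasterKey(String tenant, String application);
}

/** Strategy 1: one master key for the whole cluster. */
class SingleKeyResolver implements MasterKeyResolver {
    public String resolveMasterKey(String tenant, String application) {
        return "cluster-master-key";
    }
}

/** Strategy 2: one master key per tenant, e.g. for a multitenant cluster. */
class PerTenantResolver implements MasterKeyResolver {
    public String resolveMasterKey(String tenant, String application) {
        return "master-key-" + tenant;
    }
}

public class ResolverDemo {
    public static void main(String[] args) {
        MasterKeyResolver single = new SingleKeyResolver();
        MasterKeyResolver perTenant = new PerTenantResolver();
        System.out.println(single.resolveMasterKey("acme", "hive"));
        System.out.println(perTenant.resolveMasterKey("acme", "hive"));
    }
}
```

CFS itself would only program against the interface, so swapping deployment strategies does not touch the encryption code path.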
{quote}Data key: this is set per encrypted directory. This key is stored in the
directory xattr on the NN, but encrypted by the client master key (which the NN
doesn't know).{quote}
Yes.
{quote}So, when a client wants to read a file, the following is the process:
1) Notices that the file is in an encrypted directory. Fetches the encrypted
data key from the NN's xattr on the directory.
2) Somehow associates this encrypted data key with the master key that was
used to encrypt it (perhaps it's tagged with some identifier). Fetches the
appropriate master key from the key store.
2a) The keystore somehow authenticates and authorizes the client's access to
this key
3) The client decrypts the data key using the master key, and is now able to
set up a decrypting stream for the file itself. (I've ignored the IV here, but
assume it's also stored in an xattr) {quote}
Yes.
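The read path confirmed above can be sketched with standard JCE primitives. This is only a sketch under assumed algorithm choices (AESWrap for wrapping the data key, AES/CTR for file data); CFS may use different transformations:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

public class ReadPathSketch {
    /** Steps 2-3 of the read path: unwrap the data key, then decrypt the content. */
    static byte[] unwrapAndDecrypt(SecretKey masterKey, byte[] wrappedDataKey,
                                   byte[] iv, byte[] ciphertext) throws Exception {
        // 2) unwrap the data key using the master key fetched from the KMS
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, masterKey);
        SecretKey dataKey = (SecretKey) unwrap.unwrap(wrappedDataKey, "AES", Cipher.SECRET_KEY);
        // 3) set up the decrypting stream using the IV stored in the xattr
        Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, dataKey, new IvParameterSpec(iv));
        return dec.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        // Setup: state that would exist before the read.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey masterKey = kg.generateKey();   // lives in the KMS
        SecretKey dataKey = kg.generateKey();     // per encrypted directory
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);         // stored in an xattr

        // The NN xattr holds only the wrapped (encrypted) data key.
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, masterKey);
        byte[] wrappedDataKey = wrap.wrap(dataKey);

        // File content encrypted with the data key; CTR mode preserves length,
        // which is what makes seek/pread feasible.
        byte[] plaintext = "hello, cfs".getBytes(StandardCharsets.UTF_8);
        Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, dataKey, new IvParameterSpec(iv));
        byte[] ciphertext = enc.doFinal(plaintext);

        byte[] decrypted = unwrapAndDecrypt(masterKey, wrappedDataKey, iv, ciphertext);
        System.out.println(Arrays.equals(decrypted, plaintext));
    }
}
```

Note the master key never reaches the NN in this flow; the NN only ever stores the wrapped data key and the IV.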
{quote}In terms of attack vectors:
let's say that the NN disk is stolen. The thief now has access to a bunch of
keys, but they're all encrypted by various master keys. So we're OK.{quote}
Yes.
{quote}let's say that a client is malicious. It can get whichever master keys
it has access to from the KMS. If we only have one master key per cluster, then
the combination of one malicious client plus stealing the fsimage will give up
all the keys{quote}
When a client gets access to both a master key and the fsimage, there is
nothing we can do to protect that data. The separation of the data encryption
key from the master key is for master key rotation: when the master key
changes, only the data keys need to be rewrapped; one does not need to decrypt
every data file and re-encrypt it with a new encryption key.
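A sketch of why rotation is cheap under this scheme (assuming AESWrap key wrapping as in the read-path discussion above): only the wrapped data key stored in the xattr changes; the file ciphertext is untouched.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.util.Arrays;

public class RotateMasterKey {
    /** Rewraps a data key under a new master key; file data is never touched. */
    static byte[] rewrap(byte[] wrapped, SecretKey oldMaster, SecretKey newMaster)
            throws Exception {
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, oldMaster);
        SecretKey dataKey = (SecretKey) unwrap.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, newMaster);
        return wrap.wrap(dataKey);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey oldMaster = kg.generateKey();
        SecretKey newMaster = kg.generateKey();
        SecretKey dataKey = kg.generateKey();

        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, oldMaster);
        byte[] wrappedOld = wrap.wrap(dataKey);

        // Rotation: one small xattr rewrite per directory, no file re-encryption.
        byte[] wrappedNew = rewrap(wrappedOld, oldMaster, newMaster);

        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, newMaster);
        SecretKey recovered = (SecretKey) unwrap.unwrap(wrappedNew, "AES", Cipher.SECRET_KEY);
        System.out.println(Arrays.equals(recovered.getEncoded(), dataKey.getEncoded()));
    }
}
```

The cost of rotation is proportional to the number of encrypted directories, not to the volume of encrypted data.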
{quote}let's say that a client has escalated to root access on one of the slave
nodes in the cluster, or otherwise has malicious access to a NodeManager
process. By looking at a running MR task, it could steal whatever credentials
the task is using to access the KMS, and/or dump the memory of the client
process in order to give up the master key above.{quote}
When a client has root access, all information can be dumped from any process,
right? I remember Nicholas asked a similar question on HDFS-6134. If a client
has escalated to root access on the slave nodes, how can we assume the
NameNode and the standby/secondary NameNode in the same cluster are secure? On
the other hand, as long as data keys remain in encrypted form in the process
memory of the NameNode and DataNodes, and those daemons don't have access to
the wrapping keys, there is no attack vector there.
{quote}How does the MR task in this context get the credentials to fetch keys
from the KMS? If the KMS accepts the same authentication tokens as the
NameNode, then is there any reason that this is more secure than having the
NameNode supply the keys? Or is it just that decoupling the NameNode and the
key server allows this approach to work for non-HDFS filesystems, at the
expense of an additional daemon running a key distribution service?{quote}
It is a good question. Securely distributing secrets among the cluster nodes,
as you mentioned, will always be a hard problem to solve. Without adequate
hardware support, it could be a weak point during operations like key
unwrapping. We want to leave options to the KeyProvider implementation, to
decouple the key protection mechanism from the data encryption mechanism, and
to make both work on top of any filesystem. Could we have a KeyProvider
implementation that uses the NN as the KMS, as we already discussed, and still
leave room for other parties to plug in their own solutions?
was (Author: hitliuyi):
> Hadoop cryptographic file system
> --------------------------------
>
> Key: HADOOP-10150
> URL: https://issues.apache.org/jira/browse/HADOOP-10150
> Project: Hadoop Common
> Issue Type: New Feature
> Components: security
> Affects Versions: 3.0.0
> Reporter: Yi Liu
> Assignee: Yi Liu
> Labels: rhino
> Fix For: 3.0.0
>
> Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file
> system-V2.docx, HADOOP cryptographic file system.pdf, cfs.patch, extended
> information based on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based
> on HADOOP “FilterFileSystem” decorating DFS or other file systems, and
> transparent to upper layer applications. It’s configurable, scalable and fast.
> High level requirements:
> 1. Transparent to and no modification required for upper layer
> applications.
> 2. “Seek”, “PositionedReadable” are supported for input stream of CFS if
> the wrapped file system supports them.
> 3. Very high performance for encryption and decryption, so that they will
> not become a bottleneck.
> 4. Can decorate HDFS and all other file systems in Hadoop, and will not
> modify existing structure of file system, such as namenode and datanode
> structure if the wrapped file system is HDFS.
> 5. Admin can configure encryption policies, such as which directory will
> be encrypted.
> 6. A robust key management framework.
> 7. Support Pread and append operations if the wrapped file system supports
> them.
--
This message was sent by Atlassian JIRA
(v6.2#6252)