[
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945576#comment-13945576
]
Alejandro Abdelnur commented on HDFS-6134:
------------------------------------------
(Cross-posting HADOOP-10150 & HDFS-6134)
[[email protected]], I’ve just looked at the MAR/21 proposal in HADOOP-10150
(the patches uploaded on MAR/21 do not apply cleanly on trunk, so I cannot look
at them easily; they also seem to have missing pieces, like getXAttrs() and
wiring to the KeyProvider API. Would it be possible to rebase them so they
apply to trunk?)
bq. do we need a new proposal for the work already being done on HADOOP-10150?
HADOOP-10150 aims to provide encryption for any filesystem implementation via a
decorator filesystem, while HDFS-6134 aims to provide encryption natively in HDFS.
The two approaches differ in the level of transparency you get. The comparison
table in the "HDFS Data at Rest Encryption" attachment
(https://issues.apache.org/jira/secure/attachment/12635964/HDFSDataAtRestEncryption.pdf)
highlights the differences.
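To make the difference concrete, the decorator approach looks roughly like the
following (my own sketch, not code from either patch; CryptoFileSystem,
DecryptingStream and fetchKeyForPath are hypothetical names, and a real
DecryptingStream would have to implement Seekable and PositionedReadable for
FSDataInputStream to accept it):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of a decorator filesystem: encryption is layered on
// top of whatever FileSystem is wrapped, so NameNode/DataNode are untouched,
// but every client must fetch key material itself.
public class CryptoFileSystem extends FilterFileSystem {
  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    FSDataInputStream raw = fs.open(f, bufferSize); // 'fs' is the wrapped filesystem
    byte[] key = fetchKeyForPath(f); // hypothetical: client-side key store lookup
    // DecryptingStream is hypothetical; it would need to implement Seekable
    // and PositionedReadable to be wrapped by FSDataInputStream.
    return new FSDataInputStream(new DecryptingStream(raw, key));
  }
  // create(), rename(), etc. would be decorated the same way.
}
{code}
Note the key lookup in open(): that is exactly the client-to-keystore
dependency that the concerns below are about.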
In particular, the things that concern me most about HADOOP-10150 are:
* All clients (doing encryption/decryption) must have access to the key
management service.
* Secure key propagation to tasks running in the cluster (i.e. mapper and
reducer tasks).
* Use of AES-CTR instead of an authenticated encryption mode such as AES-GCM
(see the sketch after this list).
* It is not clear how hflush() will be handled, as the encryption block will be
cut short.
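On the AES-CTR point: CTR provides no integrity, so ciphertext tampering goes
undetected, while an authenticated mode produces a tag that is verified on
decryption. A minimal, self-contained JCE sketch of AES-GCM (assuming a
JRE/provider that supports AES/GCM/NoPadding; illustration only, not code from
either patch):
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
  public static void main(String[] args) throws Exception {
    // In HDFS the key would come from the KeyProvider API; here we just generate one.
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();

    // GCM needs a unique 12-byte IV per (key, message).
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);

    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ct = c.doFinal("block data".getBytes(StandardCharsets.UTF_8));

    // Decrypt: the 128-bit tag appended to the ciphertext is verified; any
    // modification of ct raises AEADBadTagException instead of silently
    // returning garbled (or attacker-controlled) plaintext.
    c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
    System.out.println(new String(c.doFinal(ct), StandardCharsets.UTF_8));
  }
}
{code}
The flip side, relevant to the flush() questions further down, is that a GCM
tag covers a whole message, so random access and partial flushes require
explicit chunking.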
bq. are there design choices in this proposal that are superior to the patch
already provided on HADOOP-10150?
IMO, consolidated access to and distribution of keys by the NN (as opposed to
by every client) improves the security of the system.
bq. do you have additional requirement listed in this JIRA that could be
incorporated in to HADOOP-10150,
They are enumerated in the "HDFS Data at Rest Encryption" attachment. The ones
I don’t see addressed in HADOOP-10150 are #6 and #8.A, and it is not clear how
#4 & #5 can be achieved.
bq. so we can collaborate and not duplicate?
Definitely, I want to work together with you guys to leverage as much as
possible, either by unifying the two proposals or by sharing common code if we
think both approaches have merit and we decide to move forward with both.
Happy to jump on a call to discuss things and then report back to the community
if you think that will speed up the discussion.
----------
Looking at the latest design doc of HADOOP-10150, I can see that things have
been modified a bit (from the original design doc), bringing it closer to
some of the HDFS-6134 requirements.
Still, it is not clear how transparency will be achieved for existing
applications: the HDFS URI changes, clients must connect to the key store to
retrieve the encryption key (so clients will need key store principals), and
the encryption key must be propagated to job tasks (i.e. Mapper/Reducer
processes).
Requirement #4, "Can decorate HDFS and all other file systems in Hadoop, and
will not modify existing structure of file system, such as namenode and
datanode structure if the wrapped file system is HDFS", is contradicted by the
design: the "Storage of IV and data key" section states "So we implement
extended information based on INode feature, and use it to store data key and
IV."
Requirement #5, "Admin can configure encryption policies, such as which
directory will be encrypted", seems driven by the HDFS client configuration
file (hdfs-site.xml). This is not really admin driven, as clients could break
it by changing their own hdfs-site.xml file.
Restrictions on move operations for files within an encrypted directory: the
original design had something about this (not entirely correct), but now it is
gone.
(Mentioned before) How will flush() operations be handled, given that the
encryption block will be cut short? How is this handled on writes? How is it
handled on reads?
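For context on why the flush() questions matter: CTR itself does permit
decrypting from an arbitrary byte offset, since the counter for any position is
computable from the initial IV. A sketch of that arithmetic (my own
illustration of standard CTR counter handling, not code from the HADOOP-10150
patch):
{code:java}
import java.math.BigInteger;

public class CtrOffset {
  static final int AES_BLOCK_SIZE = 16;

  // Byte offset N falls in keystream block N / 16, so advance the initial
  // 128-bit IV/counter by that many blocks (mod 2^128). After initializing
  // the cipher with this IV, the first (N % 16) keystream bytes are skipped.
  static byte[] ivForOffset(byte[] initIV, long byteOffset) {
    BigInteger counter = new BigInteger(1, initIV)
        .add(BigInteger.valueOf(byteOffset / AES_BLOCK_SIZE));
    byte[] raw = counter.toByteArray();
    byte[] iv = new byte[AES_BLOCK_SIZE];
    // Keep the low 16 bytes, right-aligned (BigInteger output length varies).
    int srcPos = Math.max(0, raw.length - AES_BLOCK_SIZE);
    int len = Math.min(raw.length, AES_BLOCK_SIZE);
    System.arraycopy(raw, srcPos, iv, AES_BLOCK_SIZE - len, len);
    return iv;
  }
}
{code}
So seeking on reads is solvable in principle; the unanswered part is what the
writer does with the encryption block that is cut short at a flush, and how a
concurrent reader handles it.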
Explicit auditing of access to encrypted files does not seem to be handled.
> Transparent data at rest encryption
> -----------------------------------
>
> Key: HDFS-6134
> URL: https://issues.apache.org/jira/browse/HDFS-6134
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: security
> Affects Versions: 2.3.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive
> data at rest must be in encrypted form. For example: the healthcare industry
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can
> be used transparently by any application accessing HDFS via Hadoop Filesystem
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with
> different regulation requirements.