[
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945576#comment-13945576
]
Alejandro Abdelnur commented on HDFS-6134:
------------------------------------------
(Cross-posting HADOOP-10150 & HDFS-6134)
[[email protected]], I’ve just looked at the MAR/21 proposal in HADOOP-10150
(the patches uploaded on MAR/21 do not apply cleanly on trunk, so I cannot look
at them easily; they also seem to have missing pieces, like getXAttrs() and
wiring to the KeyProvider API. Would it be possible to rebase them so they
apply to trunk?)
bq. do we need a new proposal for the work already being done on HADOOP-10150?
HADOOP-10150 aims to provide encryption for any filesystem implementation via a
decorator filesystem, while HDFS-6134 aims to provide encryption natively in HDFS.
The two approaches differ in the level of transparency you get. The comparison
table in the "HDFS Data at Rest Encryption" attachment
(https://issues.apache.org/jira/secure/attachment/12635964/HDFSDataAtRestEncryption.pdf)
highlights the differences.
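To make the difference concrete, the decorator approach looks roughly like the
following (my own sketch, not code from either patch; CryptoFileSystem,
DecryptingStream and fetchKeyForPath are hypothetical names, and a real
DecryptingStream would have to implement Seekable and PositionedReadable for
FSDataInputStream to accept it):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of a decorator filesystem: encryption is layered on
// top of whatever FileSystem is wrapped, so NameNode/DataNode are untouched,
// but every client must fetch key material itself.
public class CryptoFileSystem extends FilterFileSystem {
  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    FSDataInputStream raw = fs.open(f, bufferSize); // 'fs' is the wrapped filesystem
    byte[] key = fetchKeyForPath(f); // hypothetical: client-side key store lookup
    // DecryptingStream is hypothetical; it would need to implement Seekable
    // and PositionedReadable to be wrapped by FSDataInputStream.
    return new FSDataInputStream(new DecryptingStream(raw, key));
  }
  // create(), rename(), etc. would be decorated the same way.
}
{code}
Note the key lookup in open(): that is exactly the client-to-keystore
dependency that the concerns below are about.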
In particular, the things that concern me most about HADOOP-10150 are:
* All clients (doing encryption/decryption) must have access to the key
management service.
* Secure key propagation to tasks running in the cluster (i.e. mapper and
reducer tasks).
* Use of AES-CTR instead of an authenticated encryption mode such as AES-GCM
(see the sketch after this list).
* It is not clear how hflush() will be handled, as the encryption block will be
cut short.
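On the AES-CTR point: CTR provides no integrity, so ciphertext tampering goes
undetected, while an authenticated mode produces a tag that is verified on
decryption. A minimal, self-contained JCE sketch of AES-GCM (assuming a
JRE/provider that supports AES/GCM/NoPadding; illustration only, not code from
either patch):
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class GcmDemo {
  public static void main(String[] args) throws Exception {
    // In HDFS the key would come from the KeyProvider API; here we just generate one.
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();

    // GCM needs a unique 12-byte IV per (key, message).
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);

    Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ct = c.doFinal("block data".getBytes(StandardCharsets.UTF_8));

    // Decrypt: the 128-bit tag appended to the ciphertext is verified; any
    // modification of ct raises AEADBadTagException instead of silently
    // returning garbled (or attacker-controlled) plaintext.
    c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
    System.out.println(new String(c.doFinal(ct), StandardCharsets.UTF_8));
  }
}
{code}
The flip side, relevant to the flush() questions further down, is that a GCM
tag covers a whole message, so random access and partial flushes require
explicit chunking.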
bq. are there design choices in this proposal that are superior to the patch
already provided on HADOOP-10150?
IMO, consolidated access to and distribution of keys by the NN (as opposed to
by every client) improves the security of the system.
bq. do you have additional requirement listed in this JIRA that could be
incorporated in to HADOOP-10150,
They are enumerated in the "HDFS Data at Rest Encryption" attachment. The ones
I don’t see addressed in HADOOP-10150 are #6 and #8.A, and it is not clear how
#4 & #5 can be achieved.
bq. so we can collaborate and not duplicate?
Definitely, I want to work together with you guys to leverage as much as
possible, either by unifying the two proposals or by sharing common code if we
think both approaches have merit and we decide to move forward with both.
Happy to jump on a call to discuss things and then report back to the community
if you think that will speed up the discussion.
----------
Looking at the latest design doc of HADOOP-10150, I can see that things have
been modified a bit (from the original design doc), bringing it closer to
some of the HDFS-6134 requirements.
Still, it is not clear how transparency will be achieved for existing
applications: the HDFS URI changes, clients must connect to the key store to
retrieve the encryption key (so clients will need key store principals), and
the encryption key must be propagated to job tasks (i.e. Mapper/Reducer
processes).
Requirement #4, "Can decorate HDFS and all other file systems in Hadoop, and
will not modify existing structure of file system, such as namenode and
datanode structure if the wrapped file system is HDFS", is contradicted by the
design: the "Storage of IV and data key" section states "So we implement
extended information based on INode feature, and use it to store data key and
IV."
Requirement #5, "Admin can configure encryption policies, such as which
directory will be encrypted", seems driven by the HDFS client configuration
file (hdfs-site.xml). This is not really admin driven, as clients could break
it by changing their own hdfs-site.xml file.
Restrictions on move operations for files within an encrypted directory: the
original design had something about this (not entirely correct), but now it is
gone.
(Mentioned before) How will flush() operations be handled, given that the
encryption block will be cut short? How is this handled on writes? How is it
handled on reads?
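For context on why the flush() questions matter: CTR itself does permit
decrypting from an arbitrary byte offset, since the counter for any position is
computable from the initial IV. A sketch of that arithmetic (my own
illustration of standard CTR counter handling, not code from the HADOOP-10150
patch):
{code:java}
import java.math.BigInteger;

public class CtrOffset {
  static final int AES_BLOCK_SIZE = 16;

  // Byte offset N falls in keystream block N / 16, so advance the initial
  // 128-bit IV/counter by that many blocks (mod 2^128). After initializing
  // the cipher with this IV, the first (N % 16) keystream bytes are skipped.
  static byte[] ivForOffset(byte[] initIV, long byteOffset) {
    BigInteger counter = new BigInteger(1, initIV)
        .add(BigInteger.valueOf(byteOffset / AES_BLOCK_SIZE));
    byte[] raw = counter.toByteArray();
    byte[] iv = new byte[AES_BLOCK_SIZE];
    // Keep the low 16 bytes, right-aligned (BigInteger output length varies).
    int srcPos = Math.max(0, raw.length - AES_BLOCK_SIZE);
    int len = Math.min(raw.length, AES_BLOCK_SIZE);
    System.arraycopy(raw, srcPos, iv, AES_BLOCK_SIZE - len, len);
    return iv;
  }
}
{code}
So seeking on reads is solvable in principle; the unanswered part is what the
writer does with the encryption block that is cut short at a flush, and how a
concurrent reader handles it.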
Explicit auditing of access to encrypted files does not seem to be handled.
> Transparent data at rest encryption
> -----------------------------------
>
> Key: HDFS-6134
> URL: https://issues.apache.org/jira/browse/HDFS-6134
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: security
> Affects Versions: 2.3.0
> Reporter: Alejandro Abdelnur
> Assignee: Alejandro Abdelnur
> Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive
> data at rest must be in encrypted form. For example: the healthcare industry
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can
> be used transparently by any application accessing HDFS via Hadoop Filesystem
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with
> different regulation requirements.