[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045344#comment-14045344
 ] 

Owen O'Malley commented on HDFS-6134:
-------------------------------------

In the discussion today, we covered a lot of ground. Todd proposed that 
Alejandro add a virtual ".raw" directory to the top level of each encryption 
zone. This would give processes that need to read or write the data within 
the encryption zone an access path that doesn't require modifying the 
FileSystem API. With that change, I'm -0 on adding encryption into HDFS. I 
still think that our users would be far better served by adding 
encryption/compression layers above HDFS rather than baking them into HDFS, but 
I'm not going to block the work. By building this directly into HDFS, 
Alejandro and the others working on this are signing up for a high level of QA 
at scale before this is committed.

A couple of other points came up:
* symbolic links in conjunction with cryptofs would allow users to use HDFS 
URLs to access encrypted HDFS files.
* there must be an HDFS admin command to list the crypto zones to support 
auditing
* There are significant scalability concerns about each task requesting 
decryption of each file key. In particular, if a job has 100,000 tasks and each 
opens 1,000 files, that is 100 million key requests. The current design is 
unlikely to scale correctly.
* the KMS needs its own delegation tokens and hooks so that YARN will renew and 
cancel them.
* there are three levels of key rolling:
** leaving old data alone and writing new data with the new key
** re-writing the data with the new key 
** re-encoding the per file key (personally this seems pointless)
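
The request count in the scalability point above is plain multiplication; a
minimal sketch follows. The task and file counts come from the comment. The
KMS service rate is a purely hypothetical figure chosen for illustration, not
a number from the discussion or a measurement, and the function names are
made up for this example.

```python
# Back-of-the-envelope count of file-key decryption requests for one job.

def key_requests(tasks: int, files_per_task: int) -> int:
    """Each task requests decryption of the key for every file it opens."""
    return tasks * files_per_task


def drain_seconds(requests: int, requests_per_second: int) -> float:
    """Time needed to serve all requests at a sustained rate."""
    return requests / requests_per_second


TASKS = 100_000            # from the comment
FILES_PER_TASK = 1_000     # from the comment
ASSUMED_KMS_RATE = 10_000  # hypothetical requests/second, illustration only

total = key_requests(TASKS, FILES_PER_TASK)
print(total)                                   # 100000000
print(drain_seconds(total, ASSUMED_KMS_RATE))  # 10000.0
```

Under that assumed rate, a single job would keep the key service busy for
10,000 seconds, which is why the per-task, per-file request pattern is a
concern.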

> Transparent data at rest encryption
> -----------------------------------
>
>                 Key: HDFS-6134
>                 URL: https://issues.apache.org/jira/browse/HDFS-6134
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
> HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive 
> data at rest must be in encrypted form. For example: the healthcare industry 
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
> be used transparently by any application accessing HDFS via Hadoop Filesystem 
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with 
> different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
