[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002704#comment-14002704 ]

Todd Lipcon commented on HDFS-6134:
-----------------------------------

I think two of Owen's questions may not be addressed in the docs. I'll do my 
best to answer them here:

bq. For release in the Hadoop 2.x line, you need to preserve both forward and 
backwards wire compatibility. How do you plan to address that?

For data that has been marked encrypted, we obviously can't provide 
backwards compatibility. I think the sanest behavior is that, if an old 
client tries to access encrypted data, it should receive the ciphertext 
instead of the decrypted plaintext. Another option would be to return an 
error. Either would be achievable by having the new client set a flag in the 
OP_READ_BLOCK request which indicates "I am reading encrypted data and I am 
aware of it." If the new server sees that a client is reading encrypted data 
and does _not_ have that flag set, it could respond appropriately with either 
of the above two options.
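To make the proposal concrete, here is a minimal sketch of the server-side decision described above. All names (`FLAG_ENCRYPTION_AWARE`, `serveBlockRead`, the `Policy` enum) are illustrative assumptions, not actual HDFS APIs; in the real design the flag would be a new optional field on the OP_READ_BLOCK message, and the DataNode always ships the stored (encrypted) bytes while decryption happens client-side.

```java
// Hypothetical sketch: how a DataNode could gate reads of encrypted
// blocks on an "encryption-aware" client flag. Names are illustrative.
public class EncryptedReadGate {

    /** Illustrative stand-in for a new optional flag in OP_READ_BLOCK. */
    public static final int FLAG_ENCRYPTION_AWARE = 0x1;

    /** How to treat an old (flag-less) client reading encrypted data. */
    public enum Policy { RETURN_CIPHERTEXT, FAIL }

    public static byte[] serveBlockRead(byte[] storedBytes,
                                        boolean blockIsEncrypted,
                                        int clientFlags,
                                        Policy legacyPolicy) {
        if (blockIsEncrypted
                && (clientFlags & FLAG_ENCRYPTION_AWARE) == 0) {
            // Old client reading encrypted data: either of the two
            // options from the text -- serve raw ciphertext, or refuse.
            if (legacyPolicy == Policy.FAIL) {
                throw new IllegalStateException(
                    "Client is not encryption-aware; refusing encrypted block");
            }
            // RETURN_CIPHERTEXT: fall through and ship the stored bytes.
        }
        // The DataNode always serves the bytes as stored on disk;
        // decryption (for aware clients) happens on the client side.
        return storedBytes;
    }
}
```

Note that in this sketch the aware and ciphertext-tolerant paths return the same bytes; the flag only changes whether a flag-less read of encrypted data is rejected.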

A new client accessing an old cluster should not be problematic, as we would 
only add new fields to RPCs. The NN RPCs to set up encryption zones, etc, would 
fail with the usual "not implemented" type exceptions (same as any other new 
feature).
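As a sketch of the "new client, old cluster" case: a client issuing one of the new NN RPCs against a pre-feature NameNode would see a "no such method" style failure, which it can catch and surface (or degrade around). The interface, method, and exception names below are illustrative assumptions, not the actual HDFS client API.

```java
// Hypothetical sketch: client-side handling when a new encryption-zone
// RPC hits a NameNode that predates the feature. Names are illustrative.
public class EzClientFallback {

    /** Stand-in for the RPC layer's "unknown method" error. */
    public static class RpcNoSuchMethodException extends RuntimeException {
        public RpcNoSuchMethodException(String method) { super(method); }
    }

    /** Minimal stand-in for the NameNode RPC surface used here. */
    public interface NameNodeRpc {
        void createEncryptionZone(String path, String keyName);
    }

    /**
     * Returns true if the zone was created, false if the server does
     * not implement the RPC (i.e. it is an old, pre-feature NameNode).
     */
    public static boolean tryCreateZone(NameNodeRpc nn,
                                        String path, String keyName) {
        try {
            nn.createEncryptionZone(path, keyName);
            return true;
        } catch (RpcNoSuchMethodException e) {
            // Old server: the new RPC simply does not exist there.
            return false;
        }
    }
}
```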

bq. It seems that the additional datanode and client complexity is prohibitive. 
Making changes to the HDFS write and read pipeline is extremely touchy.

I think "prohibitive" is a strong word. Adding new features does add 
complexity, but per the design docs that Alejandro pointed to, we think the 
advantages are worth it. There are several experienced HDFS developers working 
on this branch (alongside the newer folks), so you can be sure we understand 
the areas of code being worked on and the associated risks. Having done much 
of the work required to support the checksum type changeover in Hadoop 2, I 
feel it's pretty likely the complexity of encryption is actually lower than it 
was for that project.


> Transparent data at rest encryption
> -----------------------------------
>
>                 Key: HDFS-6134
>                 URL: https://issues.apache.org/jira/browse/HDFS-6134
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive 
> data at rest must be in encrypted form. For example: the healthcare industry 
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
> be used transparently by any application accessing HDFS via Hadoop Filesystem 
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with 
> different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
