[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002704#comment-14002704 ]
Todd Lipcon commented on HDFS-6134:
-----------------------------------

I think two of Owen's questions may not be addressed in the docs. I'll do my best to answer them here:

bq. For release in the Hadoop 2.x line, you need to preserve both forward and backwards wire compatibility. How do you plan to address that?

For data which has been marked encrypted, we obviously can't provide backwards compatibility. I think the sanest behavior is probably that, if an old client tries to access encrypted data, it should receive the ciphertext instead of the decrypted plaintext. Another option would be to return an error. Either would be achievable by having the new client set a flag in the OP_READ_BLOCK request which indicates "I am reading encrypted data and I am aware of it." If the new server sees that a client is reading encrypted data and does _not_ have that flag, it can respond appropriately with either of the two options above.

A new client accessing an old cluster should not be problematic, as we would only add new fields to RPCs. The NN RPCs to set up encryption zones, etc., would fail with the usual "not implemented" type exceptions (the same as any other new feature).

bq. It seems that the additional datanode and client complexity is prohibitive. Making changes to the HDFS write and read pipeline is extremely touchy.

I think "prohibitive" is a strong word. Adding new features may add complexity, but per the design docs that Alejandro pointed to, we think the advantages are worth it. There are several experienced HDFS developers working on this branch (alongside the newer folks), so you can be sure we understand the areas of code being worked on and the associated risks. Having done much of the work required to support the checksum type changeover in Hadoop 2, I feel it's pretty likely the complexity of encryption is actually less than that of that project.
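To make the compatibility idea concrete, here is a minimal sketch of the server-side decision the comment describes. The flag field and the names used (`decide`, `ReadDecision`, `EncryptedReadCompat`) are hypothetical illustrations, not actual HDFS classes or protocol fields; the only grounded parts are the two fallback behaviors (return ciphertext, or return an error) for an old client that does not set the new OP_READ_BLOCK flag:

```java
// Hypothetical sketch (not real HDFS code): how a DataNode could decide what
// to return when serving a read, given a new optional "I am aware this data
// is encrypted" flag on OP_READ_BLOCK. Old clients never set the flag, so a
// missing field decodes as false and they fall into the compatibility path.
public class EncryptedReadCompat {
    enum ReadDecision { PLAINTEXT, CIPHERTEXT, ERROR }

    // Policy knob: hand old clients the raw ciphertext, or fail the read.
    static final boolean RETURN_ERROR_TO_OLD_CLIENTS = false;

    /**
     * @param blockEncrypted  whether the requested block holds encrypted data
     * @param clientAwareFlag the hypothetical new OP_READ_BLOCK field;
     *                        defaults to false for old clients
     */
    static ReadDecision decide(boolean blockEncrypted, boolean clientAwareFlag) {
        if (!blockEncrypted) {
            return ReadDecision.PLAINTEXT;   // unencrypted data: behavior unchanged
        }
        if (clientAwareFlag) {
            return ReadDecision.PLAINTEXT;   // new client: serve decryptable data as designed
        }
        // Old client reading encrypted data: pick one of the two options above.
        return RETURN_ERROR_TO_OLD_CLIENTS ? ReadDecision.ERROR
                                           : ReadDecision.CIPHERTEXT;
    }
}
```

Because the new field is only ever added to the request message, an old server simply ignores it, which is what keeps the new-client/old-cluster direction safe at the wire level.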
> Transparent data at rest encryption
> -----------------------------------
>
>                 Key: HDFS-6134
>                 URL: https://issues.apache.org/jira/browse/HDFS-6134
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive
> data at rest must be in encrypted form. For example: the healthcare industry
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can
> be used transparently by any application accessing HDFS via Hadoop Filesystem
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with
> different regulation requirements.

--
This message was sent by Atlassian JIRA
(v6.2#6252)