[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036044#comment-14036044 ]
Alejandro Abdelnur commented on HDFS-6134:
------------------------------------------

[~yoderme] cornered me and pointed out that, because we are using AES-CTR, we have to be extremely careful never to repeat an IV under a given encryption key. He then explained how we could run into that scenario with the implementation we are currently working on:

* 1. All files in an encryption zone that use the same keyVersion material share the same encryption key.
* 2. All files in #1 have different IVs.
* 3. In AES-CTR, the 8 lower bytes of the IV are treated as a counter that is incremented for every AES block (16 bytes).
* 4. #3 ensures that an IV is not repeated within a file (the biggest file, Long.MAX bytes, spans about 2^59 blocks and so consumes about 1/32 of the 2^64 counter domain).
* 5. IVs are public, and predictable from the initial IV and the file offset.
* 6. Because of #5, a possible attack would be to scan the files in #1 for IVs whose 8 higher bytes match, then fast-forward them to a common counter value (assuming the files are long enough). At that point you have more than one ciphertext produced with the same encryption key and the same IV. The chances of this are 1/2^64, but in cryptographic terms this is considered a high chance.

A known solution to address this is:

* A. Each file should use a unique data encryption key (DEK).
* B. The unique DEK is encrypted with the EZ keyVersion and stored as one of the file's xAttributes.
* C. The unique DEK is generated by the KeyProvider and encrypted before leaving the KeyProvider. The NN never sees the decrypted DEK.
* D. The NN gives the HDFS client the encrypted DEK and the keyVersion ID.
* E. The HDFS client sends the encrypted DEK and the keyVersion ID to the KeyProvider and gets back (if authorized to use the keyVersion) the decrypted DEK for the file.
* F. The HDFS client uses the DEK to encrypt/decrypt the file.

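As a rough illustration of steps A-F, here is a minimal, self-contained sketch using only the JDK's javax.crypto classes. It is not the Hadoop KeyProvider API: the names ToyKeyProvider, generateEncryptedDek, decryptDek and aesCtr are hypothetical, and wrapping the DEK with AES-CTR under a separate random IV, as well as the 128-bit key size, are assumptions made only to keep the example short.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

/** Hypothetical sketch of steps A-F; not the Hadoop KeyProvider API. */
public class PerFileDekSketch {

    private static final SecureRandom RNG = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RNG.nextBytes(b);
        return b;
    }

    /** AES-CTR is symmetric, so the same routine serves for encrypt and decrypt. */
    static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] data) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    /** Stand-in for the KeyProvider: the EZ key material never leaves this class. */
    static final class ToyKeyProvider {
        private final byte[] ezKey = randomBytes(16); // assumed 128-bit EZ key

        /** Steps A-C: create a fresh DEK and hand out only its encrypted form. */
        byte[] generateEncryptedDek(byte[] iv) throws Exception {
            byte[] dek = randomBytes(16);
            return aesCtr(Cipher.ENCRYPT_MODE, ezKey, iv, dek);
        }

        /** Step E: return the plaintext DEK (the authorization check is omitted). */
        byte[] decryptDek(byte[] iv, byte[] encryptedDek) throws Exception {
            return aesCtr(Cipher.DECRYPT_MODE, ezKey, iv, encryptedDek);
        }
    }

    public static void main(String[] args) throws Exception {
        ToyKeyProvider provider = new ToyKeyProvider();
        byte[] dekIv = randomBytes(16);  // IV used to wrap the DEK (assumption)
        byte[] fileIv = randomBytes(16); // IV used for the file contents

        // Steps B and D: the encrypted DEK is what would be stored and handed to the client.
        byte[] encryptedDek = provider.generateEncryptedDek(dekIv);
        // Step E: the client asks the provider to unwrap it.
        byte[] dek = provider.decryptDek(dekIv, encryptedDek);

        // Step F: the client encrypts and decrypts file data with the DEK.
        byte[] plain = "hello encryption zone".getBytes(StandardCharsets.UTF_8);
        byte[] cipherText = aesCtr(Cipher.ENCRYPT_MODE, dek, fileIv, plain);
        byte[] roundTrip = aesCtr(Cipher.DECRYPT_MODE, dek, fileIv, cipherText);
        System.out.println("round trip ok: " + Arrays.equals(plain, roundTrip));
    }
}
```

Because every file gets its own DEK in this sketch, two files whose IVs happen to line up are still encrypted under different keys, so the collision described in point 6 no longer yields two ciphertexts under the same key and IV.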
This solution requires the KeyProvider to have two new methods:

* {{KeyVersion generateEncryptedKey(String keyVersionName, byte[] iv)}}
* {{KeyVersion decryptEncryptedKey(String keyVersionName, byte[] iv, KeyVersion encryptedKey)}}

Since the IV would be the file IV, we don't have to store a new IV just for this. The implementation would apply a known transformation to the IV (e.g., XOR the original IV with 0xff).

The key materials used to encrypt the per-file encryption keys (the EZ key materials) never leave the KeyProvider and are not known to HDFS clients. This means that a compromised encryption key compromises only a single file, not all the files in an EZ that use the same key version. A side effect of this change is therefore a more secure solution.

> Transparent data at rest encryption
> -----------------------------------
>
>                 Key: HDFS-6134
>                 URL: https://issues.apache.org/jira/browse/HDFS-6134
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.3.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive
> data at rest must be in encrypted form. For example: the healthcare industry
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can
> be used transparently by any application accessing HDFS via Hadoop Filesystem
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with
> different regulation requirements.

--
This message was sent by Atlassian JIRA
(v6.2#6252)