[ https://issues.apache.org/jira/browse/HDFS-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758603#comment-13758603 ]
Yi Liu commented on HDFS-5143:
------------------------------

Thanks Dilli, your comments and questions are very good.

>>> Wait, you still have to propagate the encryption key into mapper/reducer task to let them read the file from file system. Right?

We don't need to propagate the encryption key to the mapper/reducer tasks. When upper-layer applications read or write data through the CFS interfaces, CFS obtains the key from the key management service, which authenticates the user first. The mapper/reducer task is therefore unaware of encryption; retrieving the encryption key happens entirely inside CFS.

>>> How is the client supposed to choose plain HDFS protocol versus CFS? In other words, how would the client detect whether the file is encrypted?

It is driven by configuration. The admin configures in the configuration file which files/directories are encrypted, and CFS falls back to the plain HDFS protocol for any file/directory that is not configured as encrypted.

>>> Would this play nicely with hadoop command line: "hadoop fs -cat File1", "hadoop fs -cat File2"

Yes, that plays nicely. If the user has the right to access the encrypted data, the plaintext content is shown; if the user does not have that right, only the ciphertext is shown.

>>> I am wondering whether we should consider adding metadata to filesystem namespace…

That is a very good point; we actually discussed it carefully internally. As I replied to Steve, if we put an "encryption" flag and the IV in the namenode, we would not need to store the key name (alias) there, since the key name can be derived from the file name. That would be great for HDFS, but many people may not like the idea of modifying the namenode inodes and code. Furthermore, CFS can decorate file systems other than HDFS, so we propose not to modify the namenode structure. In addition, we can wait and see other comments on whether to make some modification in the namenode; our current design requires no namenode changes, but if many people support it, we can add it too.

> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HDFS-5143
>                 URL: https://issues.apache.org/jira/browse/HDFS-5143
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: HADOOP cryptographic file system.pdf
>
>
> There is an increasing need for securing data when Hadoop customers use various upper-layer applications, such as MapReduce, Hive, Pig, HBase and so on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based on the HADOOP "FilterFileSystem" decorating DFS or other file systems, and is transparent to upper-layer applications. It is configurable, scalable and fast.
> High-level requirements:
> 1. Transparent to, and no modification required for, upper-layer applications.
> 2. "Seek" and "PositionedReadable" are supported by the CFS input stream if the wrapped file system supports them.
> 3. Very high performance for encryption and decryption; they will not become a bottleneck.
> 4. Can decorate HDFS and all other file systems in Hadoop, and will not modify the existing structure of the file system, such as the namenode and datanode structure when the wrapped file system is HDFS.
> 5. Admin can configure encryption policies, such as which directories will be encrypted.
> 6. A robust key management framework.
> 7. Support pread and append operations if the wrapped file system supports them.
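To make the decoration described above a bit more concrete, here is a minimal sketch, not the design in the attached PDF: the class name CryptographicFileSystem, the property name hadoop.cfs.encrypted.dirs, the KeyManagementClient interface and the readIvForFile() placeholder are all illustrative assumptions (the IV lookup is left unimplemented because where the IV lives is exactly the open design question discussed in the comment). What it shows is that the key lookup and the encrypted-versus-plain decision both live inside the FilterFileSystem subclass, so MapReduce tasks and command-line tools such as "hadoop fs -cat" go through the same code path without any awareness of encryption. The CtrDecryptingInputStream it returns is sketched in the second block.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.Path;

// CFS sits on top of any FileSystem and only intervenes for paths the admin has
// configured as encrypted; key material is fetched here, inside the file system
// layer, so mapper/reducer code never handles it.
public class CryptographicFileSystem extends FilterFileSystem {

  // Hypothetical property name; the real design would define its own.
  private static final String ENCRYPTED_DIRS_KEY = "hadoop.cfs.encrypted.dirs";

  private final KeyManagementClient keyClient;   // hypothetical KMS client interface
  private final String[] encryptedDirs;

  public CryptographicFileSystem(FileSystem wrapped, Configuration conf,
                                 KeyManagementClient keyClient) {
    super(wrapped);
    this.keyClient = keyClient;
    this.encryptedDirs = conf.getTrimmedStrings(ENCRYPTED_DIRS_KEY);
  }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    FSDataInputStream raw = fs.open(f, bufferSize);   // 'fs' is the wrapped file system
    if (!isEncrypted(f)) {
      return raw;                                     // not configured as encrypted: plain protocol
    }
    // The key management service authenticates the calling user before releasing
    // the key; MapReduce tasks and "hadoop fs -cat" both go through this same path.
    byte[] key = keyClient.getKeyForPath(f);
    byte[] iv = readIvForFile(f);
    return new FSDataInputStream(new CtrDecryptingInputStream(raw, key, iv));
  }

  // create() would be overridden symmetrically to return an encrypting output stream.

  private boolean isEncrypted(Path f) {
    String path = f.toUri().getPath();
    for (String dir : encryptedDirs) {
      if (path.startsWith(dir)) return true;
    }
    return false;
  }

  private byte[] readIvForFile(Path f) throws IOException {
    // Placeholder: where the per-file IV lives (file header, side metadata, or
    // namenode) is exactly the open design question discussed in the comment.
    throw new UnsupportedOperationException("design-dependent in this sketch");
  }

  /** Hypothetical stand-in for the key management service client. */
  public interface KeyManagementClient {
    byte[] getKeyForPath(Path file) throws IOException;
  }
}
{code}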
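Requirements 2 and 7 (Seek, PositionedReadable/pread) are natural to support if a counter-based cipher mode such as AES-CTR is used, because the keystream for any byte offset can be regenerated from the IV and the offset alone. The following is a rough, self-contained sketch of such a stream under that assumption, again only an illustration rather than the implementation in the attached design; the class name and the way the key/IV reach it are assumptions.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.PositionedReadable;
import org.apache.hadoop.fs.Seekable;

// Stays seekable because the AES-CTR keystream can be regenerated from (IV, byte offset) alone.
public class CtrDecryptingInputStream extends InputStream
    implements Seekable, PositionedReadable {
  private static final int AES_BLOCK = 16;
  private final FSDataInputStream in;   // underlying ciphertext stream
  private final byte[] key;
  private final byte[] iv;              // per-file IV
  private Cipher cipher;                // cipher state for sequential reads
  private long pos;

  public CtrDecryptingInputStream(FSDataInputStream in, byte[] key, byte[] iv) throws IOException {
    this.in = in;
    this.key = key.clone();
    this.iv = iv.clone();
    this.cipher = cipherAt(0);
  }

  // Build a cipher whose CTR counter is positioned at the given byte offset.
  private Cipher cipherAt(long offset) throws IOException {
    try {
      byte[] ctr = iv.clone();
      addCounter(ctr, offset / AES_BLOCK);               // jump whole 16-byte blocks
      Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
      c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(ctr));
      c.update(new byte[(int) (offset % AES_BLOCK)]);    // discard partial-block keystream
      return c;
    } catch (GeneralSecurityException e) {
      throw new IOException("cannot position cipher", e);
    }
  }

  // Big-endian add of 'delta' into the counter block, with carry.
  private static void addCounter(byte[] ctr, long delta) {
    for (int i = ctr.length - 1; i >= 0 && delta != 0; i--) {
      delta += ctr[i] & 0xFF;
      ctr[i] = (byte) delta;
      delta >>>= 8;
    }
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    int n = in.read(b, off, len);
    if (n > 0) {
      System.arraycopy(cipher.update(b, off, n), 0, b, off, n);  // decrypt in place
      pos += n;
    }
    return n;
  }
  @Override
  public int read() throws IOException {
    byte[] one = new byte[1];
    return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xFF);
  }

  // Seekable: move the wrapped stream, then re-position the cipher counter.
  @Override
  public void seek(long newPos) throws IOException {
    in.seek(newPos);
    cipher = cipherAt(newPos);
    pos = newPos;
  }
  @Override
  public long getPos() throws IOException { return pos; }
  @Override
  public boolean seekToNewSource(long targetPos) throws IOException {
    boolean moved = in.seekToNewSource(targetPos);
    if (moved) { cipher = cipherAt(targetPos); pos = targetPos; }
    return moved;
  }

  // PositionedReadable (pread): a throwaway cipher keeps the sequential state untouched.
  @Override
  public int read(long position, byte[] b, int off, int len) throws IOException {
    int n = in.read(position, b, off, len);
    if (n > 0) System.arraycopy(cipherAt(position).update(b, off, n), 0, b, off, n);
    return n;
  }
  @Override
  public void readFully(long position, byte[] b, int off, int len) throws IOException {
    in.readFully(position, b, off, len);
    if (len > 0) System.arraycopy(cipherAt(position).update(b, off, len), 0, b, off, len);
  }
  @Override
  public void readFully(long position, byte[] b) throws IOException {
    readFully(position, b, 0, b.length);
  }
}
{code}

Append fits the same pattern: the writer re-positions the cipher counter at the current file length and continues encrypting from there.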