[ https://issues.apache.org/jira/browse/HDFS-5143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758596#comment-13758596 ]

Yi Liu commented on HDFS-5143:
------------------------------

Steve, thanks for your comments.
 
>>> Is there going to be a difference between the listable length of a file 
>>> (FileSystem.listStatus()), and the user-code visible length of a file?

In our design the user will see no difference between the two; both will 
report the same length as the original file.

As you know, for most encryption modes of various encryption algorithms, the 
length of the cipher text differs from the length of the original plain text. 
In our design, however, the cipher text has the same length as the plain text 
and, more importantly, the bytes have a 1:1 correspondence.
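
To illustrate this property, here is a minimal Java sketch assuming a 
CTR-style stream mode, which is one way to obtain it (the actual algorithm 
and mode are defined in the design; this is only for illustration):

    import java.security.SecureRandom;
    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class CtrLengthDemo {
        public static void main(String[] args) throws Exception {
            byte[] key = new byte[16];
            byte[] iv = new byte[16];
            SecureRandom rng = new SecureRandom();
            rng.nextBytes(key);
            rng.nextBytes(iv);

            // Plaintext of arbitrary, non-block-aligned length.
            byte[] plain = "17 bytes of text.".getBytes("UTF-8");

            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new IvParameterSpec(iv));
            byte[] encrypted = cipher.doFinal(plain);

            // CTR is a stream mode: no padding, so the lengths match
            // byte for byte.
            System.out.println(plain.length + " == " + encrypted.length);
        }
    }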

To make the encryption more secure, we use a different IV (Initialization 
Vector) for each file; the IV has a fixed size of 16 bytes. We store the IV 
in the header of the encrypted file, so the length of the encrypted file = 
the length of the original file + 16 bytes. However, we will implement 
listStatus/getFileStatus and the other related FileSystem interfaces in CFS 
to ensure that the length returned is always the original length of the file.
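
As a rough sketch of how this length adjustment could look in a 
FilterFileSystem subclass (the class name and header constant here are 
illustrative, not taken from the design):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.Path;

    public class CryptographicFileSystem extends FilterFileSystem {
        // Hypothetical constant: size of the per-file IV header.
        private static final long HEADER_LEN = 16;

        public CryptographicFileSystem(FileSystem wrapped) {
            super(wrapped);
        }

        @Override
        public FileStatus getFileStatus(Path f) throws IOException {
            FileStatus raw = fs.getFileStatus(f);
            if (raw.isDirectory()) {
                return raw;
            }
            // Report the plaintext length: strip the 16-byte IV header.
            return new FileStatus(raw.getLen() - HEADER_LEN, false,
                    raw.getReplication(), raw.getBlockSize(),
                    raw.getModificationTime(), raw.getAccessTime(),
                    raw.getPermission(), raw.getOwner(), raw.getGroup(),
                    raw.getPath());
        }
    }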

The key point is that the length of the encrypted file equals the length of 
the plain text file + 16 bytes, the bytes have a 1:1 correspondence, and our 
design allows random access during decryption. So we can easily derive the 
length of the plain text file and easily handle the other file system 
operations.
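
To show why random access works, here is a hedged sketch, again assuming a 
CTR-style mode: the counter block for any plaintext offset can be computed 
directly from the IV, so decryption can start at an arbitrary byte without 
reading from the beginning (helper names are illustrative):

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    public class CtrSeek {
        private static final int BLOCK = 16; // AES block size in bytes

        /**
         * Returns a cipher positioned to decrypt from plaintext offset
         * 'pos'. In CTR mode the counter block for that offset is just
         * IV + (pos / 16), so no earlier bytes need to be read.
         */
        static Cipher cipherAt(byte[] key, byte[] iv, long pos)
                throws Exception {
            byte[] counter = iv.clone();
            add(counter, pos / BLOCK);

            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(counter));

            // Discard the first (pos % 16) keystream bytes so the cipher
            // lines up with the requested offset inside the block.
            int skip = (int) (pos % BLOCK);
            if (skip > 0) {
                c.update(new byte[skip]);
            }
            return c;
        }

        /** Big-endian addition of 'delta' into the 16-byte counter. */
        private static void add(byte[] counter, long delta) {
            for (int i = counter.length - 1; i >= 0 && delta != 0; i--) {
                delta += counter[i] & 0xff;
                counter[i] = (byte) delta;
                delta >>>= 8;
            }
        }
    }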

Actually, if we put the “encryption” flag and the IV in the namenode, then 
the length of the encrypted file would equal the length of the plain text 
file. That would be great for HDFS, but many people may not like the idea of 
modifying namenode inodes and code. Furthermore, CFS can decorate file 
systems other than HDFS, so we are proposing not to modify the structure of 
the namenode.

>>> Is it that the cfs:// view is consistent across all file stat operations, 
>>> seek() etc.?

Right, it’s consistent. All of these operations refer to the plain text file, 
since encryption is transparent and upper layer applications should be 
unaware of it.
 
Furthermore, for du, df and other related file system commands: since the 
length of the encrypted file = the length of the original file + 16 bytes, 
“du” will count the plain text file size, which is consistent with the file 
size listed by “ls”, while “df”, for example, will count the encrypted file 
size.
 
>>> I’m curious about how this interacts with quotas.

This is a good question. HDFS quotas include name quotas and space quotas; we 
only need to discuss space quotas here. As described above, the length of an 
encrypted file equals the length of the plain text file + 16 bytes, so an 
encrypted directory requires a bit more space than an unencrypted one. I 
don’t think this affects usage: when copying a file from an unencrypted 
directory to an encrypted one, if the space quota is not sufficient, we will 
prompt with a message like “The directory contains encrypted files; since 16 
additional bytes are required per encrypted file, the space quota for the 
target directory is insufficient”. The extra space is easy to compute, as the 
sketch below shows.
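
A hypothetical helper (not part of the proposal) that computes the space 
quota the encrypted copies would consume:

    public class QuotaCheck {
        // HDFS space quotas count bytes multiplied by replication; in
        // CFS each encrypted file additionally carries a 16-byte IV
        // header. Names here are illustrative, not from the design.
        static long requiredSpace(long[] plaintextLengths, short replication) {
            long total = 0;
            for (long len : plaintextLengths) {
                total += (len + 16) * replication;
            }
            return total;
        }
    }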
 
>>> Are all operations that are atomic today, e.g. renaming one directory under 
>>> another going to remain atomic?

It depends. If one directory is renamed under another and both the source and 
the target are unencrypted directories, then the operation remains atomic. 
However, we do not intend to allow renaming an unencrypted directory into an 
encrypted one; instead, the user should create the encrypted directory first 
and then copy files into it.
                
> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HDFS-5143
>                 URL: https://issues.apache.org/jira/browse/HDFS-5143
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: HADOOP cryptographic file system.pdf
>
>
> There is an increasing need for securing data when Hadoop customers use 
> various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so 
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data, based 
> on HADOOP “FilterFileSystem” decorating DFS or other file systems, and 
> transparent to upper layer applications. It’s configurable, scalable and fast.
> High level requirements:
> 1.    Transparent to and no modification required for upper layer 
> applications.
> 2.    “Seek”, “PositionedReadable” are supported for input stream of CFS if 
> the wrapped file system supports them.
> 3.    Very high performance for encryption and decryption, they will not 
> become bottleneck.
> 4.    Can decorate HDFS and all other file systems in Hadoop, and will not 
> modify existing structure of file system, such as namenode and datanode 
> structure if the wrapped file system is HDFS.
> 5.    Admin can configure encryption policies, such as which directory will 
> be encrypted.
> 6.    A robust key management framework.
> 7.    Support Pread and append operations if the wrapped file system supports 
> them.
