[
https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987015#comment-13987015
]
Owen O'Malley commented on HADOOP-10150:
----------------------------------------
I've been working through this. We have two metadata items that we need for
each file:
* the key name and version
* the IV
Note that the current patches only store the IV, but we really need to store
the key name and version. The version is absolutely critical: if you roll a
new key version, you don't want to rewrite all of the existing data.
It seems to me there are three reasonable places to store the small amount of
metadata:
* at the beginning of the file
* in a side file
* encoded using a filename mangling scheme
The beginning of the file creates trouble because it throws off the block
calculations that MapReduce does. (In other words, if we slide all of the
data down by 1 KB, then each input split will always cross HDFS block
boundaries.) On the other hand, it doesn't add any load to the namenode and
will always be consistent with the file.
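To make that concrete, here is a tiny illustration; the 128 MB block size and
the 1 KB header are numbers I'm assuming for the example, not anything from
the attached patches:

    // Illustration only: a per-file header shifts every logical offset, so a
    // split that used to line up with one HDFS block now straddles two.
    public class SplitShiftDemo {
      public static void main(String[] args) {
        final long BLOCK = 128L << 20; // assume 128 MB HDFS blocks
        final long HEADER = 1024;      // assume a 1 KB metadata header

        // The second input split covers logical bytes [BLOCK, 2 * BLOCK).
        long physStart = BLOCK + HEADER;
        long physEnd = 2 * BLOCK + HEADER; // exclusive

        // Without the header the split would sit entirely in block 1;
        // with it, the split touches blocks 1 and 2.
        System.out.println("first block: " + physStart / BLOCK);     // 1
        System.out.println("last block:  " + (physEnd - 1) / BLOCK); // 2
      }
    }

Every split after the first ends up spanning two blocks, which defeats the
locality that the split calculation is trying to preserve.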
A side file doesn't change the offsets into the file, but does double the
amount of traffic and storage required on the namenode.
Doing name mangling means the underlying HDFS file names are more complicated,
but it doesn't mess with the file offsets or increase the load on the
namenode.
I think we should do the name mangling. What do others think?
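For illustration, the mangling could look something like the sketch below. The
separator, the hex encoding of the IV, and the class name are placeholders I'm
making up here, not anything from the attached patches:

    // Hypothetical mangling scheme: append the key name, key version, and a
    // hex-encoded IV to the underlying HDFS file name, and strip them off
    // again when presenting names back to the user.
    public final class CryptoNameMangler {
      private static final String SEP = ".cfs.";   // placeholder separator

      static String mangle(String name, String keyName, int keyVersion,
                           byte[] iv) {
        StringBuilder hexIv = new StringBuilder();
        for (byte b : iv) {
          hexIv.append(String.format("%02x", b & 0xff));
        }
        return name + SEP + keyName + "." + keyVersion + "." + hexIv;
      }

      static String unmangle(String mangledName) {
        int idx = mangledName.indexOf(SEP);
        return idx < 0 ? mangledName : mangledName.substring(0, idx);
      }
    }

The obvious cost is that raw HDFS listings show the mangled names and every
list/rename path has to go through the mangler, but the namenode object count
and the file offsets stay untouched.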
> Hadoop cryptographic file system
> --------------------------------
>
> Key: HADOOP-10150
> URL: https://issues.apache.org/jira/browse/HADOOP-10150
> Project: Hadoop Common
> Issue Type: New Feature
> Components: security
> Affects Versions: 3.0.0
> Reporter: Yi Liu
> Assignee: Yi Liu
> Labels: rhino
> Fix For: 3.0.0
>
> Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file
> system-V2.docx, HADOOP cryptographic file system.pdf,
> HDFSDataAtRestEncryptionAlternatives.pdf,
> HDFSDataatRestEncryptionAttackVectors.pdf,
> HDFSDataatRestEncryptionProposal.pdf, cfs.patch, extended information based
> on INode feature.patch
>
>
> There is an increasing need for securing data when Hadoop customers use
> various upper-layer applications such as MapReduce, Hive, Pig, HBase, and so
> on.
> HADOOP CFS (HADOOP Cryptographic File System) is used to secure data. It is
> based on Hadoop’s “FilterFileSystem” decorating DFS or other file systems,
> and is transparent to upper-layer applications. It’s configurable, scalable,
> and fast.
> High level requirements:
> 1. Transparent to upper-layer applications; no modification is required for
> them.
> 2. “Seek” and “PositionedReadable” are supported on the CFS input stream if
> the wrapped file system supports them.
> 3. Very high performance for encryption and decryption, so they will not
> become a bottleneck.
> 4. Can decorate HDFS and all other file systems in Hadoop, and will not
> modify the existing file system structure, such as the namenode and datanode
> structure when the wrapped file system is HDFS.
> 5. Admins can configure encryption policies, such as which directories will
> be encrypted.
> 6. A robust key management framework.
> 7. Pread and append operations are supported if the wrapped file system
> supports them.
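For reference, a minimal sketch of the FilterFileSystem decoration described
above (my illustration, not the attached cfs.patch); the actual decrypting
stream wrapper is elided so the skeleton stays self-contained:

    // Sketch of the decoration idea: extend FilterFileSystem, delegate to the
    // wrapped file system, and wrap returned streams where decryption would
    // happen. The real decrypting stream (which must keep Seekable and
    // PositionedReadable working, per requirements 2 and 7 above) is omitted.
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.Path;

    public class CryptoFileSystem extends FilterFileSystem {

      public CryptoFileSystem(FileSystem wrapped) {
        super(wrapped); // 'fs' now points at the wrapped file system, e.g. HDFS
      }

      @Override
      public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        FSDataInputStream raw = fs.open(f, bufferSize);
        // A real implementation would wrap 'raw' in a decrypting stream that
        // also implements Seekable and PositionedReadable before returning it.
        return raw;
      }
    }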
--
This message was sent by Atlassian JIRA
(v6.2#6252)