Xiaoyu Yao created HDFS-8747:
--------------------------------

             Summary: Provide Better "Scratch Space" and "Soft Delete" Support 
for HDFS Encryption Zones
                 Key: HDFS-8747
                 URL: https://issues.apache.org/jira/browse/HDFS-8747
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: encryption
    Affects Versions: 2.6.0
            Reporter: Xiaoyu Yao
            Assignee: Xiaoyu Yao


HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow 
creating an encryption zone on top of a single HDFS directory. Files under the 
root directory of the encryption zone are encrypted and decrypted 
transparently when HDFS clients write or read them. 
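To make zone membership concrete, here is a minimal Python sketch (not HDFS 
code) that decides which encryption zone, if any, contains a path by matching 
it against zone root directories. The helper name `zone_of` and the example 
roots are hypothetical; in HDFS the NameNode tracks zones itself.

```python
import posixpath
from typing import Iterable, Optional


def zone_of(path: str, zone_roots: Iterable[str]) -> Optional[str]:
    """Return the root of the encryption zone containing `path`, or None.

    Hypothetical helper: each zone is modeled as a single root directory
    (today's 1:1 mapping). If several roots match, the deepest one wins.
    """
    path = posixpath.normpath(path)
    best: Optional[str] = None
    for root in zone_roots:
        root = posixpath.normpath(root)
        # Match the root itself or anything below it, but treat "/securefoo"
        # as outside "/secure" (a plain string-prefix test would not).
        if path == root or path.startswith(root.rstrip("/") + "/"):
            if best is None or len(root) > len(best):
                best = root
    return best


zones = ["/secure", "/finance"]
print(zone_of("/secure/reports/q1.csv", zones))  # /secure
print(zone_of("/securefoo/x", zones))            # None
```

Comparing on whole path components, rather than raw string prefixes, is the 
detail that keeps sibling directories with similar names out of a zone.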

In general, rename (without data copying) is not supported across encryption 
zones or between an encryption zone and a non-encryption zone, because 
different encryption zones have different security settings (for example, each 
zone has its own encryption key). However, there are certain use cases where 
efficient rename support is desired. This JIRA proposes better support for two 
such use cases, “Scratch Space” (a.k.a. staging area) and “Soft Delete” 
(a.k.a. trash), with HDFS encryption zones.

“Scratch Space” is widely used in Hadoop jobs and requires efficient rename 
support. Temporary files from MR jobs are usually stored in a staging area 
outside the encryption zone, such as the “/tmp” directory, and then renamed to 
the target directories once the data is ready for further processing. 

Below is a summary of the supported and unsupported rename cases in the latest 
Hadoop release:

* Renaming within an encryption zone is allowed.
* Renaming an entire encryption zone by moving the root directory of the zone 
is allowed.
* Renaming a sub-directory/file from an encryption zone to a non-encryption 
zone is not allowed.
* Renaming a sub-directory/file from encryption zone A to encryption zone B is 
not allowed.
* Renaming from a non-encryption zone to an encryption zone is not allowed.
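The rules above can be modeled as one predicate. This is a hypothetical Python 
sketch, not the NameNode's actual check: `rename_allowed` and `zone_of` are 
illustrative names, the zone a renamed entry lands in is assumed to be decided 
by its new parent directory, and other NameNode checks (such as those for 
nested zones) are not modeled.

```python
import posixpath
from typing import Iterable, List, Optional


def zone_of(path: str, zone_roots: Iterable[str]) -> Optional[str]:
    """Hypothetical: root of the zone containing `path` (deepest match), or None."""
    path = posixpath.normpath(path)
    best: Optional[str] = None
    for root in zone_roots:
        root = posixpath.normpath(root)
        if path == root or path.startswith(root.rstrip("/") + "/"):
            if best is None or len(root) > len(best):
                best = root
    return best


def rename_allowed(src: str, dst: str, zone_roots: Iterable[str]) -> bool:
    """Sketch of the summarized rename rules under a 1:1 zone mapping."""
    roots: List[str] = list(zone_roots)
    src = posixpath.normpath(src)
    src_zone = zone_of(src, roots)
    # Assumption: the destination zone is the zone of the new parent directory.
    dst_zone = zone_of(posixpath.dirname(posixpath.normpath(dst)), roots)

    if src_zone is None and dst_zone is None:
        return True   # ordinary rename, no encryption zone involved
    if src_zone == src:
        return True   # moving a zone's root directory moves the whole zone
    if src_zone is not None and src_zone == dst_zone:
        return True   # rename within one encryption zone
    return False      # EZ -> non-EZ, EZ A -> EZ B, or non-EZ -> EZ


zones = ["/secure", "/finance"]
print(rename_allowed("/secure/a", "/secure/b", zones))             # True
print(rename_allowed("/secure", "/archive/secure", zones))         # True
print(rename_allowed("/tmp/job/part-0", "/secure/part-0", zones))  # False
```

The last call is the “Scratch Space” pattern: a job output staged under /tmp 
cannot be renamed into the zone, so it would have to be copied instead.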

“Soft delete” (a.k.a. trash) is a client-side feature that helps prevent 
accidental deletion of files and directories. If trash is enabled and a file 
or directory is deleted using the Hadoop shell, it is moved to the .Trash 
directory in the user's home directory instead of being deleted. Deleted files 
are first moved (renamed) to the Current sub-directory of the .Trash 
directory, with their original path preserved. Files and directories in the 
trash can be restored simply by moving them to a location outside the .Trash 
directory.
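The path mapping described above can be sketched in a few lines of Python. 
This is an illustration, not Hadoop's trash implementation: it assumes the 
user's home directory is /user/<name> (the common HDFS default), the function 
names are hypothetical, and name-collision handling inside the trash is 
omitted.

```python
import posixpath


def trash_path(path: str, user: str, home_prefix: str = "/user") -> str:
    """Hypothetical: where a trash-enabled shell delete would move `path`.

    Assumes home = <home_prefix>/<user>; the original absolute path is
    preserved under .Trash/Current. Name collisions are not handled.
    """
    path = posixpath.normpath(path)
    current = posixpath.join(home_prefix, user, ".Trash", "Current")
    return current + path  # `path` is absolute, so it already starts with "/"


def original_path(trashed: str, user: str, home_prefix: str = "/user") -> str:
    """Hypothetical inverse: the location a trashed entry can be restored to."""
    current = posixpath.join(home_prefix, user, ".Trash", "Current")
    if not trashed.startswith(current + "/"):
        raise ValueError(f"{trashed} is not under {current}")
    return trashed[len(current):]


t = trash_path("/secure/reports/q1.csv", "alice")
print(t)                          # /user/alice/.Trash/Current/secure/reports/q1.csv
print(original_path(t, "alice"))  # /secure/reports/q1.csv
```

Both directions are plain renames, which is why trash depends on rename being 
allowed between the original location and the user's home directory.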

Because of these rename restrictions, deleting a sub-directory/file inside an 
encryption zone with the trash feature enabled is not allowed. Clients have to 
use the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 
improved the error message but did not provide a complete solution to the 
problem. 
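Combining the two previous ideas shows why a trash-enabled delete inside a 
zone fails: the move into .Trash/Current is a cross-zone rename. The following 
Python sketch is hypothetical (it assumes a /user/<name> home directory and 
only reasons about sub-directories and files strictly inside a zone, the case 
this JIRA describes); it reports when a delete would need -skipTrash under 
today's 1:1 mapping.

```python
import posixpath
from typing import Iterable, List, Optional


def zone_of(path: str, zone_roots: Iterable[str]) -> Optional[str]:
    """Hypothetical: root of the zone containing `path` (deepest match), or None."""
    path = posixpath.normpath(path)
    best: Optional[str] = None
    for root in zone_roots:
        root = posixpath.normpath(root)
        if path == root or path.startswith(root.rstrip("/") + "/"):
            if best is None or len(root) > len(best):
                best = root
    return best


def trash_delete_needs_skip_trash(path: str, user: str,
                                  zone_roots: Iterable[str]) -> bool:
    """True if moving `path` to the user's trash would cross a zone boundary."""
    roots: List[str] = list(zone_roots)
    path = posixpath.normpath(path)
    src_zone = zone_of(path, roots)
    # Assumed trash layout: /user/<user>/.Trash/Current + original path.
    trash = f"/user/{user}/.Trash/Current{path}"
    dst_zone = zone_of(posixpath.dirname(trash), roots)
    # Only entries strictly inside a zone are considered (not the zone root).
    return src_zone is not None and src_zone != path and src_zone != dst_zone


print(trash_delete_needs_skip_trash("/secure/reports/q1.csv", "alice", ["/secure"]))  # True
print(trash_delete_needs_skip_trash("/tmp/x", "alice", ["/secure"]))                  # False
```

In this model the check passes only when the trash location happens to lie in 
the same zone as the deleted entry, which is the situation the proposal below 
tries to make possible on purpose.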

We propose to solve the problem by generalizing the mapping between an 
encryption zone and its underlying HDFS directories from 1:1 (as today) to 
1:N. An encryption zone should allow non-overlapping directories, such as 
scratch space or soft delete "trash" locations, to be added or removed 
dynamically after the zone is created. This way, rename for "scratch space" 
and "soft delete" can be better supported without breaking the assumption 
that rename is only supported "within the zone". 
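A minimal Python model of the proposed 1:N mapping follows. It is a sketch 
under stated assumptions, not a design for the NameNode: the class and method 
names are hypothetical, the directories "/secure-staging" and "/secure-trash" 
are made-up examples of scratch and trash locations, and moving a zone's root 
directory is not modeled. The key invariant is that directories never overlap, 
so every path belongs to at most one zone.

```python
import posixpath
from typing import Dict, List, Optional


def _is_under(path: str, directory: str) -> bool:
    """True if `path` is `directory` or lies below it (component-wise)."""
    return path == directory or path.startswith(directory.rstrip("/") + "/")


class EncryptionZoneRegistry:
    """Hypothetical model: one encryption zone maps to N directories."""

    def __init__(self) -> None:
        self._dirs: Dict[str, List[str]] = {}  # zone name -> its directories

    def add_directory(self, zone: str, directory: str) -> None:
        """Attach a directory to a zone (creating the zone if needed)."""
        directory = posixpath.normpath(directory)
        for dirs in self._dirs.values():
            for existing in dirs:
                if _is_under(directory, existing) or _is_under(existing, directory):
                    raise ValueError(f"{directory} overlaps {existing}")
        self._dirs.setdefault(zone, []).append(directory)

    def remove_directory(self, zone: str, directory: str) -> None:
        self._dirs[zone].remove(posixpath.normpath(directory))

    def zone_of(self, path: str) -> Optional[str]:
        # Directories never overlap, so at most one zone can match.
        path = posixpath.normpath(path)
        for zone, dirs in self._dirs.items():
            if any(_is_under(path, d) for d in dirs):
                return zone
        return None

    def rename_allowed(self, src: str, dst: str) -> bool:
        # "Within the zone" now spans all of the zone's directories.
        dst_parent = posixpath.dirname(posixpath.normpath(dst))
        return self.zone_of(src) == self.zone_of(dst_parent)


reg = EncryptionZoneRegistry()
reg.add_directory("ez1", "/secure")
reg.add_directory("ez1", "/secure-staging")  # scratch space
reg.add_directory("ez1", "/secure-trash")    # soft delete location
print(reg.rename_allowed("/secure-staging/job1/part-0", "/secure/part-0"))  # True
print(reg.rename_allowed("/tmp/x", "/secure/x"))                            # False
```

Rejecting overlapping directories at add time keeps `zone_of` unambiguous, so 
the rename check can stay a simple same-zone comparison.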



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
