[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012551#comment-14012551 ]

Colin Patrick McCabe commented on HDFS-6382:
--------------------------------------------

bq. One approach, as you suggested, is that we implement a separate cleanup 
platform: users submit their policies to this platform, and it performs the 
actual cleanup on HDFS on their behalf (as a superuser or another powerful 
user). But the separate platform has to implement an 
authentication/authorization mechanism to make sure each user is who they 
claim to be and has the required permissions (authentication is a must; 
authorization might be optional, but it's better to have it). That duplicates 
work the NameNode already does with Kerberos and ACLs.... If it's implemented 
inside the NameNode, we could leverage the NameNode's 
authentication/authorization mechanism.

YARN / MR / etc. already have authentication frameworks that you can use.  For 
example, you can set up a YARN queue with ACLs so that only certain users or 
groups can submit to it.
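
With the Capacity Scheduler, for example, that boils down to a queue ACL in 
capacity-scheduler.xml.  A minimal sketch (the "cleanup" queue name and the 
user/group values are made up for illustration; yarn.acl.enable must also be 
true for the ACLs to be enforced):

{code:xml}
<!-- Only user "cleanup" and members of group "admins" may submit
     applications to the hypothetical root.cleanup queue. -->
<property>
  <name>yarn.scheduler.capacity.root.cleanup.acl_submit_applications</name>
  <value>cleanup admins</value>
</property>
{code}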

Another idea is to have an HDFS directory where each group (or user) puts 
files describing the cleanup policies they want, and let HDFS take care of 
permissions.
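
As a sketch of that second idea, a sweeper run periodically by a privileged 
user could read per-user policy files out of such a directory and apply them.  
The /system/cleanup-policies layout and the "<directory> <ttl-millis>" line 
format below are assumptions for illustration, not an existing tool:

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PolicySweeper {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    long now = System.currentTimeMillis();
    // Hypothetical layout: /system/cleanup-policies/<user>/policy, where
    // each line is "<directory> <ttl-millis>".  HDFS permissions on the
    // <user>/ subdirectory control who may edit that user's policy.
    for (FileStatus userDir :
        fs.listStatus(new Path("/system/cleanup-policies"))) {
      Path policy = new Path(userDir.getPath(), "policy");
      if (!fs.exists(policy)) {
        continue;
      }
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(policy), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) {
          String[] parts = line.trim().split("\\s+");
          if (parts.length != 2) {
            continue;                        // skip malformed lines
          }
          Path dir = new Path(parts[0]);
          long ttl = Long.parseLong(parts[1]);
          if (!fs.exists(dir)) {
            continue;
          }
          // A real tool should also verify that the policy's owner actually
          // owns 'dir' before deleting anything on their behalf.
          for (FileStatus st : fs.listStatus(dir)) {
            if (now - st.getModificationTime() > ttl) {
              fs.delete(st.getPath(), true); // expired: remove recursively
            }
          }
        }
      }
    }
  }
}
{code}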

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments, we often have a scenario like this: we want to 
> back up files on HDFS for some period of time and then have them deleted 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug problems, so we keep all the logs on HDFS and delete logs that are 
> older than 1 month. This is a typical TTL scenario, so we propose that HDFS 
> support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration option is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration option is needed to control whether a directory 
> with a TTL should itself be deleted once the TTL mechanism has emptied it.
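
For concreteness, rules 2-5 above can be approximated from the client side 
today.  The sketch below assumes TTLs are stored as extended attributes (which 
requires a release with xattr support, i.e. newer than 2.4.0), uses a made-up 
attribute name, and omits rule 6; it illustrates the intended semantics and is 
not the proposed NameNode implementation:

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class TtlScan {
  // Assumed convention: TTL in milliseconds stored in the "user.ttl" xattr.
  private static final String TTL_XATTR = "user.ttl";

  // Walks 'dir', carrying the inherited TTL downward (0 = no TTL).  A TTL
  // set directly on a child overrides its parent's (rule 4).  Expired
  // entries go to the trash or are deleted, per 'useTrash' (rule 5).
  static void scan(FileSystem fs, Configuration conf, Path dir,
                   long inheritedTtl, boolean useTrash) throws Exception {
    for (FileStatus st : fs.listStatus(dir)) {
      long ttl = inheritedTtl;
      byte[] raw = fs.getXAttrs(st.getPath()).get(TTL_XATTR);
      if (raw != null) {
        ttl = Long.parseLong(new String(raw, StandardCharsets.UTF_8));
      }
      boolean expired = ttl > 0
          && System.currentTimeMillis() - st.getModificationTime() > ttl;
      if (expired) {
        if (useTrash) {
          Trash.moveToAppropriateTrash(fs, st.getPath(), conf);
        } else {
          fs.delete(st.getPath(), true);
        }
      } else if (st.isDirectory()) {
        scan(fs, conf, st.getPath(), ttl, useTrash);
      }
    }
  }
}
{code}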



--
This message was sent by Atlassian JIRA
(v6.2#6252)
