[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028682#comment-14028682
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6382:
-------------------------------------------

Checked the design doc.  It looks good.  Some comments:

- "Standalone Daemon Approach ... To Implement a completely new standalone 
daemon can rarely reuse existing code, will need lots of work to do."
I don't agree.  We could refactor Balancer or other tools, if necessary, so 
that the new daemon can reuse their code.

- Using xattrs for TTL is a good idea. Do we really need ttl in milliseconds?  
Do you think that the daemon could guarantee such accuracy?  We don't want to 
waste namenode memory space to store trailing zeros/digits for each ttl.  How 
about supporting symbolic ttl notation, e.g. 10h, 5d?

- The name "Supervisor" sounds too general.  How about calling it "TtlManager" 
for the moment?  If there are more new features added to the tool, we may 
change the name later.

- For setting ttl on a directory foo, write permission on the parent 
directory of foo is not enough.  For a recursive delete, the Namenode also 
checks rwx on all subdirectories of foo.  BTW, permissions can change over 
time.  A user may be able to delete a file/dir at the time of setting the 
TTL but may no longer have permission to delete the same file/dir when the 
ttl expires.
I suggest not checking any additional permission when setting the ttl, and 
instead running the delete as that particular user.  Then we need to add the 
username to the ttl xattr.
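
To make the symbolic notation and the username suggestion concrete, here is a 
minimal sketch in plain Java.  Everything in it is an assumption for 
illustration, not something taken from the design doc or from HDFS: the class 
name TtlXAttrValue, the unit letters s/m/h/d, and the "<ttl>:<user>" value 
layout are all hypothetical.  Running the delete as the stored user is left 
out.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical xattr value holding a symbolic ttl and the user to delete as. */
final class TtlXAttrValue {
  // <digits><unit>; the unit set s, m, h, d is an assumption.
  private static final Pattern TTL = Pattern.compile("([0-9]+)([smhd])");

  final String ttl;   // symbolic form, e.g. "10h" or "5d"
  final String user;  // user the delete would later run as

  TtlXAttrValue(String ttl, String user) {
    toSeconds(ttl);  // validates the notation, throws on bad input
    if (user == null || user.isEmpty()) {
      throw new IllegalArgumentException("Missing user");
    }
    this.ttl = ttl;
    this.user = user;
  }

  /** Converts a symbolic ttl such as "10h" or "5d" to seconds. */
  static long toSeconds(String ttl) {
    Matcher m = TTL.matcher(ttl);
    if (!m.matches()) {
      throw new IllegalArgumentException("Bad ttl: " + ttl);
    }
    long count = Long.parseLong(m.group(1));
    long unitSeconds;
    switch (m.group(2).charAt(0)) {
      case 's': unitSeconds = 1L; break;
      case 'm': unitSeconds = 60L; break;
      case 'h': unitSeconds = 3600L; break;
      default:  unitSeconds = 86400L; break;  // 'd'
    }
    // Fails loudly instead of silently overflowing.
    return Math.multiplyExact(count, unitSeconds);
  }

  /** Serializes as "<ttl>:<user>" in UTF-8 (assumed layout). */
  byte[] encode() {
    return (ttl + ":" + user).getBytes(StandardCharsets.UTF_8);
  }

  /** Parses a value written by encode(); splits on the first ':'. */
  static TtlXAttrValue decode(byte[] raw) {
    String s = new String(raw, StandardCharsets.UTF_8);
    int sep = s.indexOf(':');
    if (sep < 0) {
      throw new IllegalArgumentException("Missing user: " + s);
    }
    return new TtlXAttrValue(s.substring(0, sep), s.substring(sep + 1));
  }
}
```

With this layout, a ttl of five days set by user alice is stored as the short 
string "5d:alice" rather than as a millisecond count, and the daemon would 
call toSeconds only when it evaluates expiry.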



> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-TTL-Design -2.pdf, HDFS-TTL-Design.pdf
>
>
> In production environment, we always have scenario like this, we want to 
> backup files on hdfs for some time and then hope to delete these files 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on hdfs and delete logs which are 
> older than 1 month. This is a typical scenario of HDFS TTL. So here we 
> propose that hdfs can support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL is expired
> 3. If a TTL is set on a directory, the child files and directories will be 
> deleted automatically after the TTL is expired
> 4. The child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to configure that whether the deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to configure that whether a directory 
> with TTL should be deleted when it is emptied by TTL mechanism or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
