[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010879#comment-14010879 ]

Haohui Mai commented on HDFS-6382:
----------------------------------

bq. Your suggestion is that we should have a general mechanism/framework to 
run a job (maybe periodically) over the namespace inside the NN, and the TTL 
policy is just one specific job that a user might implement?

This is correct. There are a couple of additional use cases worth keeping in 
mind:

# Archiving data. TTL is one such use case.
# Backing up or syncing data between clusters for disaster recovery, without 
running an MR job.
# Balancing data between DataNodes.

A mechanism that can support the above use cases can be quite powerful and 
improve the state of the art. I'm happy to collaborate if this is the direction 
you guys want to pursue.

bq. We are heavy users of Hadoop and also make some in-house improvements to 
meet our business requirements. We definitely want to contribute these 
improvements back to the community.

This is great to hear. Patches are welcome.

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments we often have the following scenario: we want to 
> back up files on HDFS for some time and then have them deleted 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on HDFS and delete logs that are 
> older than 1 month. This is a typical HDFS TTL scenario, so we propose that 
> HDFS support TTL.
> The details of this proposal are as follows:
> 1. HDFS can support a TTL on a specified file or directory.
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL has expired.
> 3. If a TTL is set on a directory, its child files and directories will be 
> deleted automatically after the TTL has expired.
> 4. A child file/directory's TTL configuration should override its parent 
> directory's.
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash.
> 6. A global configuration is needed to control whether a directory with a 
> TTL should be deleted once the TTL mechanism has emptied it.
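
The inheritance rules in points 2-4 can be sketched as follows. This is a minimal, hypothetical Java sketch, not HDFS code: the names TtlSketch, Node, effectiveTtl and isExpired are illustrative assumptions. It covers files only (the trash option in point 5 and the empty-directory handling in point 6 are left out), and it assumes expiry is measured from each file's own modification time, which the proposal does not specify.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of the TTL rules proposed in HDFS-6382 (points 2-4).
 * Not an HDFS API: every name here is illustrative.
 */
public class TtlSketch {

    /** A simplified namespace entry: a file or a directory. */
    public static final class Node {
        final String name;
        final Node parent;            // null for the root
        final Long ttlMillis;         // null when no TTL is set on this entry
        final long modificationTime;  // epoch millis

        public Node(String name, Node parent, Long ttlMillis, long modificationTime) {
            this.name = name;
            this.parent = parent;
            this.ttlMillis = ttlMillis;
            this.modificationTime = modificationTime;
        }
    }

    /**
     * Effective TTL of a file: its own TTL if set (point 2), otherwise the
     * TTL of the nearest ancestor directory that has one (points 3 and 4).
     */
    public static Optional<Long> effectiveTtl(Node file) {
        for (Node n = file; n != null; n = n.parent) {
            if (n.ttlMillis != null) {
                return Optional.of(n.ttlMillis);
            }
        }
        return Optional.empty();  // no TTL anywhere on the path: never expires
    }

    /**
     * Assumption: a file expires once its own modification time plus its
     * effective TTL has been reached.
     */
    public static boolean isExpired(Node file, long nowMillis) {
        return effectiveTtl(file)
                .map(ttl -> file.modificationTime + ttl <= nowMillis)
                .orElse(false);
    }
}
```

For example, with a 1-month TTL on a log directory, a log file beneath it that has no TTL of its own would expire one month after its modification time, while a file carrying its own TTL would use that value instead.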



--
This message was sent by Atlassian JIRA
(v6.2#6252)
