[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013411#comment-14013411
 ] 

Hangjun Ye commented on HDFS-6382:
----------------------------------

I think we have two discussions going on now: a TTL cleanup policy (implemented 
inside or outside the NN), and a general mechanism that would make it easy to 
implement such a policy inside the NN.

I've been convinced that a specific TTL cleanup policy implementation is 
unlikely to land directly in the core NN code, so I'm more interested in 
pursuing a mechanism that enables such a policy to be implemented.

Consider how HBase has coprocessors 
(https://blogs.apache.org/hbase/entry/coprocessor_introduction): people can 
extend its functionality easily (without touching the base classes), e.g. for 
row counting or secondary indexes. One could argue that most such use cases do 
NOT have to be implemented on the server side, but having such a mechanism 
gives users the opportunity to choose what best fits their requirements.
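
To make the idea concrete, a coprocessor-style hook on the NN might look 
roughly like the sketch below. The interface name and methods are purely 
hypothetical, just to illustrate the kind of extension point being discussed; 
nothing like this exists in HDFS today.

    import java.io.IOException;

    // Purely hypothetical extension point, sketched only to illustrate the
    // kind of coprocessor-style mechanism being discussed; this is not an
    // existing HDFS interface.
    public interface NameNodeObserver {
      // Invoked after a file or directory is created, so an extension can
      // record its own metadata (e.g. a TTL) for the path.
      void postCreate(String path, long creationTime) throws IOException;

      // Invoked periodically by the NN, so an extension can apply its own
      // policy, e.g. delete paths whose TTL has expired.
      void periodicScan() throws IOException;
    }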

If the NN had such an extensible mechanism (as Haohui suggested earlier), we 
could implement a TTL cleanup policy in the NN in an elegant way (without 
touching the base classes). Since the NN already abstracts out INode.Feature, 
we could implement a TTLFeature to hold the TTL metadata. The policy 
implementation doesn't have to go into the community's codebase if it's too 
specific; we could keep it in our private branch. But building on a general 
mechanism (without touching the base classes) makes it easy to maintain, given 
that we upgrade to new Hadoop releases regularly.
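
For example, assuming INode.Feature stays a plain marker interface (as in 
2.4) and that such a class could live in the namenode package, the feature 
holding the TTL metadata might look roughly like this (a sketch, not a 
working patch; names are illustrative):

    // Sketch only: assumes INode.Feature remains a simple marker interface
    // and that this class sits in the namenode package alongside INode.
    package org.apache.hadoop.hdfs.server.namenode;

    public class TtlFeature implements INode.Feature {
      // Absolute expiration time, in milliseconds since the epoch.
      private final long expiryTimeMs;

      public TtlFeature(long expiryTimeMs) {
        this.expiryTimeMs = expiryTimeMs;
      }

      public long getExpiryTimeMs() {
        return expiryTimeMs;
      }

      // True once the given time has passed the expiration time.
      public boolean isExpired(long nowMs) {
        return nowMs >= expiryTimeMs;
      }
    }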

If you think such a general mechanism deserves consideration, we'd be happy to 
contribute some effort toward it.

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then have them deleted 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need about 1 month's logs in order to debug 
> program bugs, so we keep all the logs on HDFS and delete logs that are 
> older than 1 month. This is a typical scenario for HDFS TTL, so here we 
> propose that HDFS support TTL.
> Following are some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, its child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories should go to the trash or not
> 6. A global configuration is needed to control whether a directory with a 
> TTL should be deleted when it is emptied by the TTL mechanism.
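
For reference, the scenario described above can already be approximated 
outside the NN with public FileSystem APIs; a minimal sketch (the /logs path 
and the 30-day TTL are just examples, not part of the proposal) could look 
like this:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch of the cleanup policy implemented outside the NN,
    // using only public FileSystem APIs.
    public class TtlCleaner {
      private static final long TTL_MS = 30L * 24 * 60 * 60 * 1000; // ~1 month

      public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        long cutoff = System.currentTimeMillis() - TTL_MS;
        // Delete top-level entries under /logs whose modification time is
        // older than the cutoff.
        for (FileStatus status : fs.listStatus(new Path("/logs"))) {
          if (status.getModificationTime() < cutoff) {
            fs.delete(status.getPath(), true); // recursive delete
          }
        }
      }
    }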



