[ 
https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015678#comment-14015678
 ] 

Haohui Mai commented on HDFS-6382:
----------------------------------

I think the objections to implementing this in the NN are legitimate. Stepping 
back one level, I'm wondering what the best approach is to meet the following 
requirements:

# Fine-tune the behavior of HDFS, which requires information from the 
internal data structures of HDFS.
# Perform the above task without MapReduce, to simplify the operation of the 
cluster.

Today it looks to me as if there is no way to meet these requirements other 
than making massive changes to HDFS.

What I'm wondering is whether it is possible to architect the system to make 
this easier. For example, is it possible to generalize the architecture of the 
balancer we have today to accomplish these types of tasks? From a very high 
level, it looks to me as if most of the code can sit outside of the NN while 
still meeting the above requirements. Since this targets advanced usage, there 
is more freedom in the design of the architecture. For instance, the 
architecture might choose to expose implementation details and not guarantee 
compatibility (like an Exokernel-type system).
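
To make the "code outside the NN" idea concrete, here is a minimal sketch of a 
TTL sweep written purely against a client-side view of the namespace. 
Everything in it is an assumption for illustration: FileStatus, 
InMemoryNamespace, and sweep are hypothetical names, not Hadoop APIs, and the 
in-memory class only stands in for whatever listing and delete calls a real 
external tool would issue against the NN.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileStatus:
    path: str
    is_dir: bool
    mtime: float  # modification time, seconds since the epoch


class InMemoryNamespace:
    """Hypothetical stand-in for a client-side view of the HDFS namespace."""

    def __init__(self, entries: List[FileStatus]):
        self._entries: Dict[str, FileStatus] = {e.path: e for e in entries}

    def list_status(self, directory: str) -> List[FileStatus]:
        # Return only the direct children of `directory`, sorted by path.
        prefix = directory.rstrip("/") + "/"
        return [
            entry
            for path, entry in sorted(self._entries.items())
            if path.startswith(prefix) and "/" not in path[len(prefix):]
        ]

    def delete(self, path: str) -> None:
        del self._entries[path]

    def exists(self, path: str) -> bool:
        return path in self._entries


def sweep(ns: InMemoryNamespace, root: str, ttl_seconds: float, now: float) -> List[str]:
    """Delete files under `root` older than `ttl_seconds`; return their paths."""
    deleted: List[str] = []
    pending = [root]  # directories still to visit
    while pending:
        directory = pending.pop()
        for status in ns.list_status(directory):
            if status.is_dir:
                pending.append(status.path)
            elif now - status.mtime > ttl_seconds:
                ns.delete(status.path)
                deleted.append(status.path)
    return deleted
```

The sweep needs only listing and delete operations, which is why it could run 
as a separate process the way the balancer does. It assumes age is measured 
from modification time, which the proposal does not specify.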

Thoughts?

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>
> In production environments we often have a scenario like this: we want to 
> back up files on HDFS for some time and then have them deleted 
> automatically. For example, we keep only 1 day's logs on local disk due to 
> limited disk space, but we need to keep about 1 month's logs in order to 
> debug program bugs, so we keep all the logs on HDFS and delete logs that are 
> older than 1 month. This is a typical scenario for HDFS TTL, so we 
> propose that HDFS support TTL.
> Following are some details of this proposal:
> 1. HDFS can support a TTL on a specified file or directory
> 2. If a TTL is set on a file, the file will be deleted automatically after 
> the TTL expires
> 3. If a TTL is set on a directory, its child files and directories will be 
> deleted automatically after the TTL expires
> 4. A child file/directory's TTL configuration should override its parent 
> directory's
> 5. A global configuration is needed to control whether deleted 
> files/directories go to the trash or not
> 6. A global configuration is needed to control whether a directory 
> with a TTL should be deleted once the TTL mechanism has emptied it.
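
Points 1 to 4 of the quoted proposal can be read as a nearest-ancestor lookup. 
The sketch below is one possible interpretation, not the proposed 
implementation: effective_ttl and is_expired are hypothetical helper names, 
the TTL table is a plain dict, and age is assumed to be measured from 
modification time, which the proposal leaves open.

```python
import posixpath
from typing import Dict, Optional


def effective_ttl(path: str, ttls: Dict[str, float]) -> Optional[float]:
    """Return the TTL set on `path` or on its nearest ancestor, else None."""
    current = path
    while True:
        if current in ttls:
            # A TTL on the path itself, or on a closer ancestor, wins over
            # any TTL set higher up the tree (point 4).
            return ttls[current]
        if current in ("/", ""):
            return None  # reached the root without finding a TTL
        current = posixpath.dirname(current)


def is_expired(path: str, mtime: float, now: float, ttls: Dict[str, float]) -> bool:
    """True if `path` has an effective TTL and its age exceeds it."""
    ttl = effective_ttl(path, ttls)
    return ttl is not None and now - mtime > ttl
```

With a 30-day TTL on /logs and a 7-day TTL on /logs/debug, a file under 
/logs/debug resolves to 7 days, any other file under /logs resolves to 30 
days, and a path outside /logs has no TTL and never expires.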



--
This message was sent by Atlassian JIRA
(v6.2#6252)
