[ 
https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623481#action_12623481
 ] 

dhruba borthakur commented on HADOOP-1869:
------------------------------------------

I agree that HADOOP-3336 does this from an auditing perspective.

I am interested in building some form of archival store in HDFS. Files that 
have not been used for a long time can automatically be moved to slower and/or 
denser storage. Given the rate at which cluster sizes increase, and given that 
the cost of storing data indefinitely is very low, it makes sense for the file 
system to make intelligent storage decisions based on how and when data was 
accessed. This argues for storing the "access time" in the file system itself.

HADOOP-3336 can be used to accomplish this to some extent: the separate log 
that it generates can be periodically merged with the file system image. But I 
feel that design is a little awkward and not very elegant.


> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> HDFS should support some type of statistics that allows an administrator to 
> determine when a file was last accessed. 
> Since HDFS does not have quotas yet, it is likely that users keep on 
> accumulating files in their home directories without much regard to the 
> amount of space they are occupying. This causes memory-related problems with 
> the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I 
> think DCE-DFS does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit". 
> 1. This access-bit is set when a file is accessed. If the access bit is 
> already set, then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() call indicates that the access bits of all 
> files need to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron 
> job) to determine which files were recently used.
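
The access-bit proposal quoted above can be sketched roughly as follows. This 
is only an in-memory model written for illustration, not HDFS code: the class 
AccessBitSketch, its methods recordAccess() and isAccessed(), and the 
transaction counter are all hypothetical names, and counting a clear as a 
single transaction is an assumption. Only the name clearAccessBits() comes 
from the proposal.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory model of the proposed "access bit"; not HDFS code. */
class AccessBitSketch {
    // Paths whose access bit is currently set.
    private final Set<String> accessed = new HashSet<>();
    // Number of transactions that would have been logged (assumed model).
    private long transactions = 0;

    /** Step 1: set the bit on access; log a transaction only if it was unset. */
    void recordAccess(String path) {
        if (accessed.add(path)) {
            transactions++;
        }
    }

    /** Step 2: clear the access bits of all files (assumed: one transaction). */
    void clearAccessBits() {
        accessed.clear();
        transactions++;
    }

    /** True if the file was accessed since the last clear. */
    boolean isAccessed(String path) {
        return accessed.contains(path);
    }

    long transactionCount() {
        return transactions;
    }
}
```

In this model, a daily job would list the paths for which isAccessed() returns 
true and then call clearAccessBits(), so repeated reads of the same file 
within a day cost at most one transaction.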

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
