[ https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526274 ]
Doug Cutting commented on HADOOP-1869:
--------------------------------------
> if we have to write a transaction for every file access, that could be a
> performance killer, do you agree?
I don't know. Logging file opens should be good enough, right? How much of a
bottleneck is transaction logging currently? How much worse would this make it?
If files average ten or more blocks, and we're reading files not much more
often than we're writing them, then the impact might be small.
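This estimate can be made concrete with a small calculation. The sketch below assumes a deliberately simple, hypothetical cost model that has not been measured: writing a file costs one logged namenode operation per block, and each open would add exactly one logged operation. Under that model, the extra logging load is the number of opens per write divided by the number of blocks per file.

```java
// Back-of-envelope estimate of the extra namenode log traffic from logging opens.
// Hypothetical cost model (an assumption, not a measurement): one logged
// operation per block written, plus one logged operation per file open.
final class OpenLoggingOverhead {

    /**
     * Returns the log operations added by opens, as a fraction of the
     * operations already generated by writes.
     *
     * @param blocksPerFile average number of blocks per file (must be > 0)
     * @param opensPerWrite how many times a file is opened per time it is written
     */
    static double extraFraction(double blocksPerFile, double opensPerWrite) {
        if (blocksPerFile <= 0) {
            throw new IllegalArgumentException("blocksPerFile must be positive");
        }
        double writeOps = blocksPerFile; // one op per block written (assumed)
        double openOps = opensPerWrite;  // one op per open (assumed)
        return openOps / writeOps;
    }

    public static void main(String[] args) {
        // Ten blocks per file, each file read about as often as it is written.
        System.out.printf("%.0f%% extra%n", 100 * extraFraction(10, 1)); // prints "10% extra"
    }
}
```

With ten blocks per file and one read per write, the model gives 10% more logged operations. The overhead grows in proportion to how much more often files are read than written, which is why the read/write ratio matters as much as the block count.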
Another option to consider is making this a separate log that's buffered, since
its data is not as critical to filesystem function. We could flush the buffer
every minute or so, so that when the namenode crashes we'd lose only the last
minute of access time updates. Might that be acceptable?
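A separate, buffered access-time log might look roughly like the sketch below. It is hypothetical and is not HDFS code: the class and method names (`BufferedAccessTimeLog`, `recordAccess`, `flush`, `startPeriodicFlush`) and the tab-separated record format are illustrative assumptions. Updates stay in memory and are written out on a timer (60 seconds matches "every minute or so"), so a crash loses at most the updates recorded since the last flush. As an added design choice, repeated accesses to the same path within one interval are coalesced into a single record.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not HDFS code): access-time updates go to a separate,
// buffered log instead of the main transaction log. They are held in memory
// and written out periodically, so a crash loses at most the updates recorded
// since the last flush.
final class BufferedAccessTimeLog {
    private final Writer sink;                                       // destination of flushed records
    private final Map<String, Long> pending = new LinkedHashMap<>(); // path -> latest access time

    BufferedAccessTimeLog(Writer sink) {
        this.sink = sink;
    }

    /** Buffers an access in memory; repeated accesses to one path keep only the latest time. */
    synchronized void recordAccess(String path, long accessTimeMillis) {
        pending.merge(path, accessTimeMillis, Long::max);
    }

    /**
     * Writes buffered updates as tab-separated "path time" lines and returns how many
     * were written. If a write fails, the buffer is kept so a later flush retries it
     * (possibly repeating records that already reached the sink).
     */
    synchronized int flush() throws IOException {
        for (Map.Entry<String, Long> e : pending.entrySet()) {
            sink.write(e.getKey() + "\t" + e.getValue() + "\n");
        }
        sink.flush();
        int written = pending.size();
        pending.clear();
        return written;
    }

    /** Flushes every intervalSeconds (e.g. 60); the caller must shut the executor down. */
    ScheduledExecutorService startPeriodicFlush(long intervalSeconds) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                flush();
            } catch (IOException e) {
                // Buffered updates are retained; the next scheduled run retries.
                System.err.println("access-time flush failed: " + e);
            }
        }, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return timer;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        BufferedAccessTimeLog log = new BufferedAccessTimeLog(out);
        log.recordAccess("/user/a/part-0", 1000L);
        log.recordAccess("/user/a/part-0", 2000L); // coalesced with the previous update
        log.recordAccess("/user/b/data", 1500L);
        System.out.println("nothing written yet: " + out.toString().isEmpty());
        log.flush(); // a long-running server would call startPeriodicFlush(60) instead
        System.out.print(out);
    }
}
```

Coalescing keeps the buffer's size bounded by the number of distinct files accessed during one interval, not by the raw number of opens. The trade-off is the one described above: a crash can lose up to one interval of access-time updates.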
> access times of HDFS files
> --------------------------
>
> Key: HADOOP-1869
> URL: https://issues.apache.org/jira/browse/HADOOP-1869
> Project: Hadoop
> Issue Type: New Feature
> Components: dfs
> Reporter: dhruba borthakur
>
> HDFS should support some type of statistics that allows an administrator to
> determine when a file was last accessed.
> Since HDFS does not have quotas yet, users are likely to keep accumulating
> files in their home directories without much regard to the
> amount of space they are occupying. This causes memory-related problems with
> the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I
> think DCE-DFS does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit".
> 1. This access-bit is set when a file is accessed. If the access bit is
> already set, then this call does not result in a transaction.
> 2. A call to FileSystem.clearAccessBits() indicates that the access bits of
> all files should be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron
> job) to determine files that are recently used.
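The access-bit proposal quoted above, together with the daily administrator job, can be illustrated with a small in-memory sketch. Only the name `clearAccessBits` comes from the proposal. The class and the other method names (`AccessBitTable`, `markAccessed`, `unaccessedFiles`) are hypothetical, and the counter only stands in for namenode transactions. Whether the bulk clear is itself logged is left open, so the sketch does not count it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed "access bit"; not an existing HDFS API.
final class AccessBitTable {
    private final Set<String> files = new TreeSet<>();    // all known paths, sorted for stable reports
    private final Set<String> accessed = new HashSet<>(); // paths whose access bit is set
    private long transactions = 0;                        // stand-in for namenode edit-log writes

    synchronized void addFile(String path) {
        files.add(path);
    }

    /** Sets the access bit. Only a clear-to-set transition counts as a transaction. */
    synchronized boolean markAccessed(String path) {
        if (!files.contains(path)) {
            throw new IllegalArgumentException("unknown file: " + path);
        }
        boolean newlySet = accessed.add(path); // false if the bit was already set
        if (newlySet) {
            transactions++; // a real namenode would log an edit here
        }
        return newlySet;
    }

    /** Counterpart of the proposed FileSystem.clearAccessBits(): resets every bit. */
    synchronized void clearAccessBits() {
        accessed.clear();
    }

    /** Files whose access bit is still clear, i.e. not accessed since the last clear. */
    synchronized List<String> unaccessedFiles() {
        List<String> result = new ArrayList<>();
        for (String f : files) {
            if (!accessed.contains(f)) {
                result.add(f);
            }
        }
        return result;
    }

    synchronized long transactions() {
        return transactions;
    }

    public static void main(String[] args) {
        AccessBitTable table = new AccessBitTable();
        table.addFile("/user/a/old.log");
        table.addFile("/user/a/current.dat");
        table.markAccessed("/user/a/current.dat");
        table.markAccessed("/user/a/current.dat"); // bit already set: no new transaction
        // Daily cron job: report files untouched since the last run, then reset the bits.
        System.out.println("not accessed: " + table.unaccessedFiles());
        table.clearAccessBits();
    }
}
```

The cost of this scheme is at most one transaction per file per clearing period, no matter how often each file is opened. The price is resolution: the administrator learns only whether a file was used since the last clear, not when.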
--
This message is automatically generated by JIRA.