[ 
https://issues.apache.org/jira/browse/HADOOP-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626781#action_12626781
 ] 

dhruba edited comment on HADOOP-1869 at 8/28/08 4:11 PM:
-------------------------------------------------------------------

I like Raghu's proposal that FileSystem.setAccessTime() can be renamed as 
FileSystem.utimes(FileStatus). But it creates some other issues:

1. The FileStatus object has the blockSize of the file. The blockSize cannot be
changed. Similarly, the FileStatus object has a field called 'isdir'. What
happens to this one?

2. Similarly, the FileStatus has the length of the file. Are we going to 
truncate the file (or create a sparse file with holes if the user sets a longer 
length)?

3. There are existing APIs FileSystem.setReplication(), FileSystem.setOwner(),
setGroup(), setPermissions(), etc. Will these be deprecated or coexist with the
new API?

I prefer adding a setAccessTime because it allows an application to set the 
access time to an arbitrary value. If we want to merge all the above APIs into 
FileSystem.utimes(), I can do it as part of a separate JIRA.
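
To make the trade-off concrete, here is a minimal sketch of the two API shapes. The classes UtimesSketch and Status are hypothetical stand-ins, not the Hadoop FileSystem or FileStatus classes, and rejecting changes to length, blockSize, and isdir is only one possible answer to questions 1 and 2, not a decision.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; not the Hadoop FileSystem/FileStatus classes.
class UtimesSketch {
    // Mirrors only the fields named in the comment, plus the access time.
    static final class Status {
        final String path;
        final long length;
        final boolean isDir;
        final long blockSize;
        final long accessTime;

        Status(String path, long length, boolean isDir, long blockSize, long accessTime) {
            this.path = path;
            this.length = length;
            this.isDir = isDir;
            this.blockSize = blockSize;
            this.accessTime = accessTime;
        }

        Status withAccessTime(long t) {
            return new Status(path, length, isDir, blockSize, t);
        }
    }

    private final Map<String, Status> files = new HashMap<>();

    void create(String path, long length, long blockSize) {
        files.put(path, new Status(path, length, false, blockSize, 0L));
    }

    Status getStatus(String path) {
        Status s = files.get(path);
        if (s == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return s;
    }

    // Narrow setter: only the access time can change, so no ambiguity arises.
    void setAccessTime(String path, long accessTime) {
        files.put(path, getStatus(path).withAccessTime(accessTime));
    }

    // Whole-object update: the caller may hand back a Status whose other fields
    // differ. ASSUMPTION: this sketch rejects such calls; the thread leaves it open.
    void utimes(Status wanted) {
        Status current = getStatus(wanted.path);
        if (wanted.length != current.length
                || wanted.blockSize != current.blockSize
                || wanted.isDir != current.isDir) {
            throw new UnsupportedOperationException("only accessTime may change in this sketch");
        }
        files.put(wanted.path, wanted);
    }
}
```

With the narrow setter the caller cannot express an invalid request at all, while the whole-object form has to define a policy for every field it accepts.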

Raghu, Konstantin: does it sound ok?

> access times of HDFS files
> --------------------------
>
>                 Key: HADOOP-1869
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1869
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.19.0
>
>         Attachments: accessTime1.patch, accessTime4.patch, accessTime5.patch
>
>
> HDFS should support some type of statistics that allows an administrator to 
> determine when a file was last accessed. 
> Since HDFS does not have quotas yet, it is likely that users keep on 
> accumulating files in their home directories without much regard to the 
> amount of space they are occupying. This causes memory-related problems with 
> the namenode.
> Access times are costly to maintain. AFS does not maintain access times. I 
> think DCE-DFS does maintain access times with a coarse granularity.
> One proposal for HDFS would be to implement something like an "access bit". 
> 1. This access-bit is set when a file is accessed. If the access bit is 
> already set, then this call does not result in a transaction.
> 2. A FileSystem.clearAccessBits() indicates that the access bits of all files 
> need to be cleared.
> An administrator can effectively use the above mechanism (maybe a daily cron 
> job) to determine files that are recently used.
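
The access-bit proposal quoted above can be sketched as a small in-memory model. This is an illustration only, not HDFS code: the class AccessBitSketch, its method names other than clearAccessBits, and the transaction counter are all hypothetical, and the sketch counts only transactions caused by accesses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the "access bit" proposal; not HDFS code.
class AccessBitSketch {
    private final Map<String, Boolean> accessBits = new HashMap<>();
    private int transactions = 0; // simulated log entries caused by accesses

    void createFile(String path) {
        accessBits.put(path, false);
    }

    // Step 1: set the bit on access. A transaction is recorded only when the
    // bit flips from clear to set; repeated accesses cost nothing extra.
    void recordAccess(String path) {
        Boolean bit = accessBits.get(path);
        if (bit == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (!bit) {
            accessBits.put(path, true);
            transactions++;
        }
    }

    // Step 2: clear the access bits of all files (e.g. from a daily cron job).
    void clearAccessBits() {
        accessBits.replaceAll((path, bit) -> false);
    }

    boolean wasAccessed(String path) {
        return Boolean.TRUE.equals(accessBits.get(path));
    }

    int transactionCount() {
        return transactions;
    }
}
```

An administrator's job would read the bits just before clearing them: any file whose bit is still clear has not been accessed since the previous clear.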

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
