[ http://issues.apache.org/jira/browse/HADOOP-432?page=comments#action_12459441 ]

Doug Cutting commented on HADOOP-432:
-------------------------------------

> doing this every few minutes is expensive

Yes, walking the entire trash too frequently would be a problem.  So we can 
walk it less frequently, avoid walking the whole thing, or both.  I've 
proposed bucketing the trash into sub-directories that each cover ten or more 
minutes, so that only the root trash directory need be listed, and even that 
should only be listed every 30 or more minutes.
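
To illustrate the bucketing idea, here is a minimal sketch in Python against a local directory rather than the Hadoop FileSystem API.  The function names, the epoch-seconds bucket naming, and the six-hour retention are all assumptions made for the example; none of them come from the patches attached to this issue.  The point it shows is that expiry can be decided from the bucket names alone, so only the trash root is ever listed.

```python
from pathlib import Path
from typing import List

BUCKET_SECONDS = 10 * 60       # assumed bucket width: ten minutes
EXPIRY_SECONDS = 6 * 60 * 60   # assumed retention period (hypothetical value)


def bucket_name(ts: float) -> str:
    """Name the bucket covering ts by the bucket's start time in epoch seconds."""
    return str(int(ts) // BUCKET_SECONDS * BUCKET_SECONDS)


def move_to_trash(trash_root: Path, path: Path, now: float) -> Path:
    """Move path into the bucket for the current time and return its new location."""
    bucket = trash_root / bucket_name(now)
    bucket.mkdir(parents=True, exist_ok=True)
    target = bucket / path.name
    path.rename(target)
    return target


def expired_buckets(trash_root: Path, now: float) -> List[Path]:
    """List only the trash root; a bucket is expired once its whole
    time window is older than the retention period."""
    cutoff = now - EXPIRY_SECONDS
    return sorted(
        p for p in trash_root.iterdir()
        if p.is_dir() and p.name.isdigit()
        and int(p.name) + BUCKET_SECONDS <= cutoff
    )
```

A cleanup pass would call expired_buckets() every 30 or more minutes and delete each returned directory recursively, without ever walking the contents of the buckets that are still live.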

> Creating a folder for every X minutes (how many?) will make restoring a file 
> harder.

But, with globbing, it won't be too hard.  The primary point is not to make 
restoring files ultra-simple but rather to make it possible.
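
As a rough sketch of what restoring with a glob could look like, the Python below searches every bucket for a file name and restores the most recent copy.  It runs against a local directory, and it assumes (hypothetically) that each bucket is a sub-directory of the trash root named with a numeric timestamp; find_in_trash and restore_latest are illustrative names, not part of any Hadoop API.

```python
from pathlib import Path
from typing import List


def find_in_trash(trash_root: Path, name: str) -> List[Path]:
    """Return every trashed copy of name, oldest bucket first."""
    # "*" matches any bucket directory, so the caller does not need
    # to know when the file was deleted.
    matches = [p for p in trash_root.glob(f"*/{name}")
               if p.parent.name.isdigit()]
    return sorted(matches, key=lambda p: int(p.parent.name))


def restore_latest(trash_root: Path, name: str, dest_dir: Path) -> Path:
    """Move the most recently trashed copy of name into dest_dir."""
    copies = find_in_trash(trash_root, name)
    if not copies:
        raise FileNotFoundError(name)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / name
    copies[-1].rename(target)
    return target
```

Older copies stay in their buckets after a restore, so a user who picked the wrong version can still reach the others until they expire.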

> an external process reclaiming space needs to be monitored, otherwise files 
> will accumulate in the trash and the dfs will fill up

If the trash is full then folks can empty the trash.  I'm not arguing that we 
shouldn't start a thread in the namenode that empties the trash, just that this 
thread should be reusable code, written using the public FileSystem API.

Adding a trash can isn't going to magically resolve space issues.  It will 
primarily permit folks who accidentally delete things using the command line to 
recover their files.  With a per-user trash can, folks can easily monitor their 
trash usage manually if they like, and admins can email users whose trash is 
large, or even empty it for them.  The cleanup thread is an added feature that 
reduces the need for manual monitoring.
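
For the manual-monitoring case, something like the following Python sketch would be enough to find users whose trash is large.  It works on a local directory tree and assumes, purely for illustration, that each user's trash lives at <home_root>/<user>/.Trash; that layout, the function names, and the byte limit are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def trash_usage(home_root: Path, trash_dirname: str = ".Trash") -> Dict[str, int]:
    """Map each user name to the total size in bytes of that user's trash."""
    usage = {}
    for home in home_root.iterdir():
        trash = home / trash_dirname
        if trash.is_dir():
            usage[home.name] = sum(
                f.stat().st_size for f in trash.rglob("*") if f.is_file()
            )
    return usage


def over_limit(usage: Dict[str, int], limit_bytes: int) -> List[str]:
    """Return, in sorted order, the users whose trash exceeds limit_bytes."""
    return sorted(user for user, size in usage.items() if size > limit_bytes)
```

An admin could run over_limit(trash_usage(home_root), limit) periodically and mail the users it returns.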


> support undelete, snapshots, or other mechanism to recover lost files
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-432
>                 URL: http://issues.apache.org/jira/browse/HADOOP-432
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Yoram Arnon
>         Assigned To: Wendy Chien
>         Attachments: undelete12.patch, undelete16.patch, undelete17.patch
>
>
> currently, once you delete a file it's gone forever.
> most file systems allow some form of recovery of deleted files.
> a simple solution would be an 'undelete' command.
> a more comprehensive solution would include snapshots, manual and automatic, 
> with scheduling options.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
