[ https://issues.apache.org/jira/browse/HADOOP-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866724#action_12866724 ]
Ravi Phulari commented on HADOOP-6761:
--------------------------------------
Dhruba, I am fine with adding a completely new section titled "Trash behavior"
and describing HDFS trash there.
However, the Space Reclamation section of the HDFS Architecture Guide
(hdfs_design.html) already covers trash behavior and its user-facing details,
along with the data retention policy.
I think adding a trash section to the user guide would duplicate that existing
section of the HDFS design document.
> Improve Trash Emptier
> ---------------------
>
> Key: HADOOP-6761
> URL: https://issues.apache.org/jira/browse/HADOOP-6761
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Attachments: HADOOP-6761.patch
>
>
> There are two inefficiencies in the Trash functionality right now that have
> caused some problems for us.
> First, if you configure your trash interval to be one day (24 hours), you
> eventually store two days' worth of data: the Current directory and the
> previous timestamped checkpoint, which is not deleted until the end of the
> interval.
> The second problem is that a lot of data accumulates in Trash before the
> Emptier wakes up. If a couple of million files have been trashed and the
> Emptier deletes them on HDFS, the NameNode will freeze until everything is
> removed. (This particular problem will hopefully be addressed by HDFS-1143.)
> My proposal is to have two configuration intervals: one for deleting the
> trashed data and another for checkpointing. This way, for example, with
> intervals of one day and one hour we would store only 25 hours of data
> instead of the current 48, and deletions would happen in smaller chunks
> every hour instead of as one huge deletion at the end of the day.
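
The 48-hour and 25-hour figures in the description can be checked with a small calculation. This is only a rough sketch of the arithmetic, not Hadoop code: the function name and its parameters are hypothetical. It assumes trashed data can sit in Current for up to one checkpoint interval and is then kept for one full deletion interval.

```python
def worst_case_retention_hours(deletion_interval_h, checkpoint_interval_h=None):
    """Upper bound, in hours, on how long trashed data stays on disk.

    Hypothetical helper for illustration only. With a single interval the
    Emptier checkpoints and deletes on the same schedule, so the checkpoint
    interval defaults to the deletion interval.
    """
    if checkpoint_interval_h is None:
        checkpoint_interval_h = deletion_interval_h
    # Data can wait up to one checkpoint interval in Current, then ages
    # one deletion interval as a checkpoint before it is removed.
    return deletion_interval_h + checkpoint_interval_h


current = worst_case_retention_hours(24)      # single one-day interval
proposed = worst_case_retention_hours(24, 1)  # daily deletion, hourly checkpoint
print(current, proposed)  # 48 25
```

Under the same assumptions, a shorter checkpoint interval moves the bound closer to the deletion interval itself, at the cost of the Emptier waking up more often.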