[
https://issues.apache.org/jira/browse/HADOOP-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmytro Molkov updated HADOOP-6761:
----------------------------------
Attachment: HADOOP-6761.2.patch
This patch adds a unit test and the modifications to core-default.
The test runs the Emptier and keeps deleting files until enough checkpoints
have accumulated that the oldest ones start being deleted.
This verifies that multiple checkpoints are created for the configured
interval values and that the older checkpoints are deleted correctly.
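The behaviour the test checks can be illustrated with a toy model. This is only
a sketch and not the code from the patch: simulate_emptier and its parameters
are hypothetical names, time is counted in abstract units, and it assumes a
checkpoint is removed once its age reaches the deletion interval.

```python
def simulate_emptier(deletion_interval, checkpoint_interval, total_time):
    """Toy model of a trash emptier (hypothetical, not Hadoop's implementation).

    Returns, for each wake-up, the list of checkpoint timestamps still alive.
    """
    checkpoints = []
    history = []
    now = checkpoint_interval
    while now <= total_time:
        # Assumption: drop checkpoints whose age has reached the deletion interval.
        checkpoints = [t for t in checkpoints if now - t < deletion_interval]
        # Roll the current trash contents into a new checkpoint stamped "now".
        checkpoints.append(now)
        history.append(list(checkpoints))
        now += checkpoint_interval
    return history


# Deletion interval of 3 units, checkpoint every 1 unit, run for 6 units.
history = simulate_emptier(3, 1, 6)
print(history[-1])  # [4, 5, 6]
```

In this model several checkpoints coexist at any time, and the oldest one drops
out as soon as it is a full deletion interval old.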
I am not including documentation changes here, since the documentation lives
in a different project (HDFS) and will require a separate JIRA.
> Improve Trash Emptier
> ---------------------
>
> Key: HADOOP-6761
> URL: https://issues.apache.org/jira/browse/HADOOP-6761
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Attachments: HADOOP-6761.2.patch, HADOOP-6761.patch
>
>
> There are two inefficiencies in the Trash functionality right now that have
> caused some problems for us.
> First, if you configure your trash interval to be one day (24 hours), you
> eventually store two days' worth of data: the Current directory and the
> previous checkpoint, which is not deleted until the end of the interval.
> The second problem is that a lot of data accumulates in Trash before the
> Emptier wakes up. If a couple of million files have been trashed and the
> Emptier deletes them on HDFS, the NameNode freezes until everything is
> removed. (This particular problem will hopefully be addressed by HDFS-1143.)
> My proposal is to have two configuration intervals: one for deleting the
> trashed data and another for checkpointing. This way, for example, with
> intervals of one day and one hour we will store only 25 hours of data instead
> of the current 48, and deletions will happen in smaller chunks every hour of
> the day instead of one huge deletion at the end of the day.
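The retention arithmetic in the proposal above can be sketched as follows. The
function name max_retained_hours is hypothetical. The reasoning it assumes is
that data may sit in Current for up to one checkpoint interval before it is
checkpointed, and the checkpoint then lives for one deletion interval.

```python
def max_retained_hours(deletion_interval_hours, checkpoint_interval_hours):
    """Worst-case age of trashed data, in hours (illustrative model only).

    Data waits up to one checkpoint interval in Current, then the checkpoint
    holding it is kept for one deletion interval.
    """
    return deletion_interval_hours + checkpoint_interval_hours


# Current behaviour: a single interval serves as both, so 24 + 24.
current = max_retained_hours(24, 24)
# Proposed: delete after one day, checkpoint every hour, so 24 + 1.
proposed = max_retained_hours(24, 1)
print(current, proposed)  # 48 25
```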