Improve Trash Emptier
---------------------
Key: HADOOP-6761
URL: https://issues.apache.org/jira/browse/HADOOP-6761
Project: Hadoop Common
Issue Type: Improvement
Reporter: Dmytro Molkov
There are currently two inefficiencies in the Trash functionality that have
caused problems for us.
First, if you configure your trash interval to be one day (24 hours), you
eventually end up storing two days' worth of data: the Current directory plus
the previous timestamped checkpoint, which is not deleted until the end of the
interval.
The second problem is that a lot of data accumulates in Trash before the Emptier
wakes up. If a couple of million files have been trashed and the Emptier deletes
them on HDFS, the NameNode freezes until everything is removed (this particular
problem will hopefully be addressed by HDFS-1143).
My proposal is to have two configurable intervals: one for deleting trashed data
and another for checkpointing. For example, with a deletion interval of one day
and a checkpoint interval of one hour, we would store at most 25 hours of data
instead of the current 48, and deletions would happen in smaller chunks every
hour of the day instead of in one huge deletion at the end of the day.
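A toy model can make the 48-hour versus 25-hour arithmetic concrete. The sketch below is not Hadoop's Trash or Emptier code, and every class and method name in it is hypothetical. It assumes the Emptier wakes once per checkpoint interval, first deletes every checkpoint that is at least one deletion interval old, then turns Current into a new timestamped checkpoint, and that files are trashed at a constant hourly rate. It needs Java 16+ for records.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical toy model of a trash emptier with separate deletion and
 * checkpoint intervals. Not Hadoop's Trash API; time is in whole hours.
 */
public class TrashEmptierModel {

    /** A checkpoint: creation time, first hour of trash it holds, file count. */
    private record Checkpoint(int createdAt, int firstTrashHour, int files) {}

    /** Worst-case retention (upper bound, hours) and largest single deletion. */
    public record Result(int maxRetentionHours, int maxFilesDeletedPerRun) {}

    /**
     * Simulates simHours hours. During every hour, filesPerHour files are
     * trashed into Current. Every checkpointHours hours the emptier wakes,
     * deletes checkpoints at least deletionHours old, then checkpoints Current.
     */
    public static Result simulate(int deletionHours, int checkpointHours,
                                  int filesPerHour, int simHours) {
        Deque<Checkpoint> checkpoints = new ArrayDeque<>(); // oldest at head
        int currentFiles = 0;
        int currentFirstHour = -1; // first hour with files in Current, -1 if empty
        int maxRetention = 0;
        int maxDeleted = 0;

        for (int t = 0; t <= simHours; t++) {
            if (t > 0 && t % checkpointHours == 0) {
                // Emptier wakes at the start of hour t: delete expired checkpoints.
                int deleted = 0;
                while (!checkpoints.isEmpty()
                        && t - checkpoints.peekFirst().createdAt() >= deletionHours) {
                    Checkpoint old = checkpoints.pollFirst();
                    deleted += old.files();
                    // Its oldest file was trashed during hour firstTrashHour,
                    // so no file in it lived longer than this.
                    maxRetention = Math.max(maxRetention, t - old.firstTrashHour());
                }
                maxDeleted = Math.max(maxDeleted, deleted);
                // Turn Current into a new timestamped checkpoint.
                if (currentFiles > 0) {
                    checkpoints.addLast(new Checkpoint(t, currentFirstHour, currentFiles));
                    currentFiles = 0;
                    currentFirstHour = -1;
                }
            }
            // Files trashed during hour t land in Current after any wake-up at t.
            if (currentFirstHour < 0) {
                currentFirstHour = t;
            }
            currentFiles += filesPerHour;
        }
        return new Result(maxRetention, maxDeleted);
    }

    public static void main(String[] args) {
        // Current behaviour: one 24-hour interval for both deletion and checkpoints.
        System.out.println(simulate(24, 24, 1000, 240));
        // Proposal: delete after 24 hours, checkpoint every hour.
        System.out.println(simulate(24, 1, 1000, 240));
    }
}
```

With 1000 files trashed per hour, simulate(24, 24, 1000, 240) reports a worst-case retention of 48 hours and 24000 files removed in a single Emptier run, while simulate(24, 1, 1000, 240) reports 25 hours and 1000 files per run, which matches both claims in the proposal under this model's assumptions.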