[ 
https://issues.apache.org/jira/browse/HDFS-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789982#action_12789982
 ] 

Todd Lipcon commented on HDFS-821:
----------------------------------

Two options:
# A time-based heuristic, as Mike suggested in the comment: anything in "tmp" 
older than 48 hours should be safe to remove since, as far as I recall, client 
writes now go into the rbw/ directory.
# I believe replicas in progress now track the writing thread as well. So we 
can check whether that thread is still running and, if it's not, remove the 
replica.
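
To make the two options concrete, here is a rough sketch in plain Java. It is 
not the datanode code: the class TmpDirGc, both method names, and the idea of 
passing in the directory and the writer thread directly are assumptions made 
for illustration, and it uses file modification time as a stand-in for "idle".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HDFS.
public class TmpDirGc {

    // Option 1: delete regular files in tmpDir whose last-modified time is
    // older than maxIdle relative to 'now'. Returns the paths it deleted.
    static List<Path> removeStale(Path tmpDir, Duration maxIdle, Instant now)
            throws IOException {
        Instant cutoff = now.minus(maxIdle);
        List<Path> stale = new ArrayList<>();
        // Collect candidates first, then delete once the stream is closed.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(tmpDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)) {
                    stale.add(p);
                }
            }
        }
        for (Path p : stale) {
            Files.delete(p);
        }
        return stale;
    }

    // Option 2: treat a replica as abandoned if its writer thread is unknown
    // or no longer alive. Note that Thread.isAlive() is also false for a
    // thread that was never started.
    static boolean writerIsGone(Thread writer) {
        return writer == null || !writer.isAlive();
    }
}
```

Something like TmpDirGc.removeStale(tmpDir, Duration.ofHours(48), Instant.now()) 
would match the 48 hour figure from the old comment. Passing "now" in 
explicitly is only there to make the cutoff easy to test.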

> Garbage collect datanode tmp dirs
> ---------------------------------
>
>                 Key: HDFS-821
>                 URL: https://issues.apache.org/jira/browse/HDFS-821
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>
> I've seen in practice (and it's been reported on the list) cases where the 
> datanode's tmp dir can become quite full with abandoned blocks. There's an 
> ancient comment from April 07:
> {code}
>   // REMIND - mjc - eventually we should have a timeout system
>   // in place to clean up block files left by abandoned clients.
>   // We should have some timer in place, so that if a blockfile
>   // is created but non-valid, and has been idle for >48 hours,
>   // we can GC it safely.
> {code}
> Well, we can consider ourselves reminded, so let's do it!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
