[ 
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521887#comment-14521887
 ] 

Suresh Srinivas commented on HDFS-8193:
---------------------------------------

[~zhz], I am not clear on what use case this is solving.
We now have a mechanism to delay block deletion after namenode startup. It 
precisely targets the case of an administrator copying the wrong (older) 
fsimage, which could result in deletion of blocks and loss of data.

bq. First, NN bugs could cause block replicas to be deleted without deleting 
the file. Second, it's rather easy to back up NN metadata before performing 
maintenance, but extremely difficult to back up actual DN data. This JIRA aims 
to address that deficiency / discrepancy.

Second use case: the NN deleted a file and the admin wants to restore it (the 
case of NN metadata backup). Going back to an older fsimage is not that 
straightforward and is a solution to be used only in a desperate situation. It 
can cause corruption for other applications running on HDFS. It also results 
in loss of newly created data across the file system. Snapshots and trash are 
solutions for this.

First use case: the NN deletes blocks without deleting files. Have you seen an 
instance of this? It would be great to get a one-pager on how one handles this 
condition. Does the NN keep deleting blocks until it is hot-fixed? Also, 
completing deletion of blocks in a timely manner is important for a running 
cluster. Not all files require the same reliability. Intermediate data and 
tmp files need to be deleted immediately to free up cluster storage and avoid 
the risk of running out of storage space. At the datanode level, there is no 
notion of whether files are temporary or important ones that need to be 
preserved. So a trash such as this can result in retaining a lot of tmp files, 
with deletes unable to free up storage within the cluster fast enough.
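
To make the storage concern concrete, here is a minimal Python sketch of a 
datanode-side delayed-deletion queue. It is a toy model, not HDFS code: the 
class DelayedDeletionQueue, its methods, the one-hour delay and the 1 GiB 
replica size are all hypothetical and chosen only for illustration.

```python
from collections import deque


class DelayedDeletionQueue:
    """Toy model (hypothetical): hold deleted replicas for `delay_sec`."""

    def __init__(self, delay_sec):
        self.delay_sec = delay_sec
        self._pending = deque()  # (purge_time, size_bytes), in deletion order
        self.retained_bytes = 0  # space still occupied by "deleted" replicas

    def delete(self, now, size_bytes):
        # The replica is only marked as deleted; its bytes stay on disk.
        self._pending.append((now + self.delay_sec, size_bytes))
        self.retained_bytes += size_bytes

    def purge(self, now):
        # Physically remove replicas whose delay has expired.
        freed = 0
        while self._pending and self._pending[0][0] <= now:
            _, size = self._pending.popleft()
            freed += size
        self.retained_bytes -= freed
        return freed


# Assumed workload: a job deletes 1 GiB of tmp data every minute for ten minutes.
GIB = 1024 ** 3
queue = DelayedDeletionQueue(delay_sec=3600)
for minute in range(10):
    queue.delete(now=minute * 60, size_bytes=GIB)

freed_early = queue.purge(now=600)         # still inside the delay window
retained_early = queue.retained_bytes      # space held by deleted replicas
freed_late = queue.purge(now=3600 + 540)   # delay of the last delete has expired
```

In this model nothing is reclaimed while the delay is running, so every tmp 
file deleted during that window keeps occupying datanode storage, because the 
queue cannot tell temporary replicas from important ones.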

Can you please talk about any other administrative mistakes that you are 
targeting with this functionality?


> Add the ability to delay replica deletion for a period of time
> --------------------------------------------------------------
>
>                 Key: HDFS-8193
>                 URL: https://issues.apache.org/jira/browse/HDFS-8193
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: namenode
>    Affects Versions: 2.7.0
>            Reporter: Aaron T. Myers
>            Assignee: Zhe Zhang
>
> When doing maintenance on an HDFS cluster, users may be concerned about the 
> possibility of administrative mistakes or software bugs deleting replicas of 
> blocks that cannot easily be restored. It would be handy if HDFS could be 
> made to optionally not delete any replicas for a configurable period of time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)