[
https://issues.apache.org/jira/browse/HDFS-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522819#comment-14522819
]
Zhe Zhang commented on HDFS-8193:
---------------------------------
Thanks [~sureshms] for the helpful comments!
bq. Second use case, NN deleted file and admin wants to restore it (the case of
NN metadata backup). Going back to an older fsimage is not that straightforward
and is a solution to be used only in desperate situations. It can cause
corruption for other applications running on HDFS. It also results in loss of
newly created data across the file system. Snapshots and trash are solutions
for this.
You are absolutely right that it's always preferable to protect data at the
file level instead of the block level. This JIRA is indeed aimed at being a
last resort for desperate situations, similar to recovering data directly from
hard disk drives when the file system is corrupt beyond recovery. It is fully
controlled by the DN and serves as the last layer of protection when all the
layers above have failed (trash mistakenly emptied, snapshots not correctly set
up, etc.).
bq. First use case, NN deletes blocks without deleting files. Have you seen an
instance of this? It would be great to get one pager on how one handles this
condition.
One possible situation (recently fixed by HDFS-7960) is the NN mistakenly
considering some blocks over-replicated because of zombie storages. Even though
HDFS-7960 is already fixed, we should do something to protect against possible
future NN bugs. This is the crux of why file-level protections, although always
desirable, are not always sufficient: the NN itself can get something wrong,
and then we're left with irrecoverable data loss.
bq. Does NN keep deleting the blocks until it is hot fixed?
In the above case, the NN will keep deleting all replicas it considers
over-replicated until it is hot fixed.
bq. Also completing deletion of blocks in a timely manner is important for a
running cluster.
Yes, this is a valid concern. Empirically, most customer clusters run well
below full disk capacity, so adding a reasonable grace period shouldn't delay
allocating new blocks. The configured delay window should also be enforced
under the constraint of available space (e.g., don't delay deletion when
available disk space is below 10%). We will also add Web UI and metrics support
to clearly show the space consumed by deletion-delayed replicas.
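To make the space constraint concrete, here is a minimal, self-contained sketch
(plain Java, not actual DataNode code) of the kind of check the DN could run
before delaying a deletion; the grace period, the 10% free-space threshold, and
all names are illustrative assumptions, not part of any patch:
{code:java}
import java.io.File;
import java.util.concurrent.TimeUnit;

// Hypothetical policy: delay replica deletion for a grace period, but fall
// back to immediate deletion when the volume is low on free space.
public class DelayedDeletionPolicy {
  // Illustrative defaults; real values would come from configuration.
  private final long graceMillis = TimeUnit.HOURS.toMillis(6);
  private final double minFreeSpaceRatio = 0.10; // don't delay below 10% free

  /** Returns true if deletion of this replica file should be delayed for now. */
  public boolean shouldDelay(File replicaFile, long deletionRequestedAtMillis) {
    File volume = replicaFile.getParentFile();
    double freeRatio = (double) volume.getUsableSpace() / volume.getTotalSpace();
    if (freeRatio < minFreeSpaceRatio) {
      return false; // disk pressure: honor the deletion immediately
    }
    long age = System.currentTimeMillis() - deletionRequestedAtMillis;
    return age < graceMillis; // still inside the grace window: keep the replica
  }
}
{code}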
bq. All files don't require the same reliability. Intermediate data and tmp
files need to be deleted immediately to free up cluster storage to avoid the
risk of running out of storage space. At datanode level, there is no notion of
whether files are temporary or important ones that need to be preserved. So a
trash such as this can result in retaining a lot of tmp files and deletes not
being able to free up storage within the cluster fast enough.
This is a great point. The proposed work (at least in the first phase) is
intended as a best-effort optimization and will always yield to foreground
workloads. The target is to statistically reduce the chance and severity of
data loss under typical storage consumption conditions. It's certainly still
possible for a wave of tmp data to flush out more important data in DN trashes.
We can design smarter eviction algorithms as future work.
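As a rough illustration of what such an eviction policy might look like, here
is a sketch of a simple oldest-first evictor over deletion-delayed replica
files; the class and method names are purely hypothetical:
{code:java}
import java.io.File;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical "DN trash" evictor: when space is needed, drop the delayed
// replicas whose deletion was requested longest ago.
public class TrashEvictor {
  // Delayed replica files ordered by last-modified time (a stand-in for the
  // time their deletion was requested).
  private final PriorityQueue<File> delayed =
      new PriorityQueue<>(Comparator.comparingLong(File::lastModified));

  public synchronized void add(File delayedReplica) {
    delayed.add(delayedReplica);
  }

  /** Evict oldest delayed replicas until at least bytesNeeded are reclaimed. */
  public synchronized long reclaim(long bytesNeeded) {
    long freed = 0;
    while (freed < bytesNeeded && !delayed.isEmpty()) {
      File f = delayed.poll();
      long len = f.length();
      if (f.delete()) {
        freed += len;
      }
    }
    return freed;
  }
}
{code}
A smarter policy could, for example, weight eviction by replica age or size,
but oldest-first keeps the first phase simple.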
As I [commented |
https://issues.apache.org/jira/browse/HDFS-8193?focusedCommentId=14505336&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14505336]
above, we are considering a more radical approach as a potential next phase of
this work, where deletion-delayed replicas will just be overwritten by incoming
replicas. In that case we might not even need to count deletion-delayed
replicas against the space quota, making the feature more transparent to admins.
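For that phase, the allocation path might look roughly like the following
sketch, which treats space held by deletion-delayed replicas as reclaimable on
demand (building on the hypothetical TrashEvictor above; again, nothing here is
actual DataNode code):
{code:java}
import java.io.File;
import java.io.IOException;

// Hypothetical allocator: before writing an incoming replica, reclaim space
// from deletion-delayed replicas so they never have to be charged to quota.
public class OverwritingAllocator {
  private final TrashEvictor trash;
  private final File volumeDir;

  public OverwritingAllocator(TrashEvictor trash, File volumeDir) {
    this.trash = trash;
    this.volumeDir = volumeDir;
  }

  /** Make room for an incoming replica, evicting delayed replicas if needed. */
  public File allocate(String blockFileName, long expectedBytes) throws IOException {
    long shortfall = expectedBytes - volumeDir.getUsableSpace();
    if (shortfall > 0) {
      trash.reclaim(shortfall);
    }
    File dst = new File(volumeDir, blockFileName);
    if (!dst.createNewFile()) {
      throw new IOException("Replica file already exists: " + dst);
    }
    return dst;
  }
}
{code}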
> Add the ability to delay replica deletion for a period of time
> --------------------------------------------------------------
>
> Key: HDFS-8193
> URL: https://issues.apache.org/jira/browse/HDFS-8193
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Affects Versions: 2.7.0
> Reporter: Aaron T. Myers
> Assignee: Zhe Zhang
>
> When doing maintenance on an HDFS cluster, users may be concerned about the
> possibility of administrative mistakes or software bugs deleting replicas of
> blocks that cannot easily be restored. It would be handy if HDFS could be
> made to optionally not delete any replicas for a configurable period of time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)