[
https://issues.apache.org/jira/browse/HDFS-13031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574593#comment-16574593
]
Adam Antal commented on HDFS-13031:
-----------------------------------
Looking thoroughly through the cases, there are several aspects of the problem
that need to be addressed carefully (and to recap what has been said so far):
1. There may be multiple bugs that can cause corruption in the FSImage. Beyond
knowing that snapshotting always plays a part, the exact cause is still unknown.
2. Usually by the time the corruption surfaces, the last good fsimage has
already been deleted, so we cannot replay the preceding actions and thus cannot
get closer to the root cause.
3. If there were at least one case where we could obtain and examine an
audit log from around the corruption event (provided the customer realized the
corruption and shut down the cluster in time), we would have an idea of what
causes the corruption, ultimately leading to a solution.
4. We could provide a patch for HDFS that simply ignores the NPE, but that
would cause data loss for the customer, so it should be avoided.
5. Apart from an NN startup throwing the error, there is nothing in the
customer's hands that could confirm whether the FSImage is good or corrupted.
6. Solutions like putting up a tertiary NN and running it to verify the
correctness of the FSImage are a bit of an overkill. The goal is to detect the
corruption on the spot, preferably without starting a tertiary NN.
7. Patching HDFS with "if you detect something nasty while writing out the
FSImage, stop" is basically equivalent to finding and fixing the bug itself,
so instead the FSImage must be loaded again after it is written out.
8. Since an additional NN would be overkill, a separate program is advisable.
The OIV would be a handy choice, while building an independent tool for this
sole purpose is probably too much effort.
9. The OIV already has functionality for loading the fsimage and reconstructing
the folder structure; we just have to add an option for detecting null INodes.
10. For example, the Delimited OIV processor can already use an on-disk
MetadataMap, which reduces memory consumption. There may also be room for
parallelization: iterating through the INodes, for example, could be done in a
distributed fashion, increasing efficiency, so we would not need a high-memory,
high-CPU setup just to check the FSImage.
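The null-INode check at the heart of points 9-10 could look roughly like this. Note this is a minimal, hypothetical Python sketch, not the actual OIV code: the `inodes` table and `children` map stand in for data the OIV would decode from the fsimage's protobuf sections, and all names here are assumptions.

```python
# Hypothetical sketch of a null-INode scan; NOT the actual OIV code.
# Assumes the fsimage has already been decoded into:
#   inodes:   {inode_id: record_or_None}  - None models a "null" INode entry
#   children: {parent_id: [child_ids]}    - the directory tree section

def find_corrupt_inodes(inodes, children):
    """Return IDs of null INodes and of children that reference no INode."""
    corrupt = set()
    # Case 1: a null entry in the INode section itself.
    for inode_id, record in inodes.items():
        if record is None:
            corrupt.add(inode_id)
    # Case 2: a directory entry pointing at a missing INode
    # (this is what would blow up with an NPE when the NN loads the image).
    for parent_id, child_ids in children.items():
        for child_id in child_ids:
            if child_id not in inodes:
                corrupt.add(child_id)
    return corrupt

if __name__ == "__main__":
    inodes = {1: "/", 2: "dirA", 3: None}   # 3 is a null INode
    children = {1: [2, 3], 2: [4]}          # 4 is a dangling reference
    print(sorted(find_corrupt_inodes(inodes, children)))  # -> [3, 4]
```

The point of the sketch is that the check is a pure read-only pass over data the OIV already loads, so it could be bolted onto an existing processor rather than built as a new tool.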
For the above-mentioned reasons I suggest a modified OIV processor as a
solution for detecting fsimage corruption on the spot.
As a side note: this is another approach to HDFS-13314. The OIV could be run as
a guarantee: if corruption is detected, the NN exits.
What do you think?
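The parallelization idea in point 10 can be sketched similarly: each INode record can be validated independently, so the scan is embarrassingly parallel. Again a hypothetical Python illustration rather than OIV code; the chunk size and worker count are arbitrary.

```python
# Hypothetical illustration of parallelizing the per-INode validation.
# Each chunk of the INode table is checked independently, so the work
# could be spread over threads, processes, or even machines.
from concurrent.futures import ThreadPoolExecutor

def null_ids_in_chunk(chunk):
    """Return IDs whose record is None within one chunk of the table."""
    return [inode_id for inode_id, record in chunk if record is None]

def parallel_null_scan(inodes, chunk_size=2, workers=4):
    items = list(inodes.items())
    chunks = [items[i:i + chunk_size]
              for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(null_ids_in_chunk, chunks)
    # Merge and sort the per-chunk findings.
    return sorted(cid for chunk_result in results for cid in chunk_result)

if __name__ == "__main__":
    inodes = {1: "/", 2: "a", 3: None, 4: "b", 5: None}
    print(parallel_null_scan(inodes))  # -> [3, 5]
```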
> To detect fsimage corruption on the spot
> ----------------------------------------
>
> Key: HDFS-13031
> URL: https://issues.apache.org/jira/browse/HDFS-13031
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Environment:
> Reporter: Yongjun Zhang
> Assignee: Adam Antal
> Priority: Major
>
> Since we fixed HDFS-9406, there are new cases reported from the field that
> similar fsimage corruption happens. We need good fsimage + editlogs to replay
> to reproduce the corruption. However, usually when the corruption is detected
> (at later NN restart), the good fsimage is already deleted.
> We need to have a way to detect fsimage corruption on the spot. Currently
> what I think we could do is:
> # after the SNN creates a new fsimage, it spawns a new modified NN process
> (NN with some new command line args) to just load the fsimage and do nothing
> else.
> # If the process fails, the currently running SNN will either a) back up
> the fsimage + editlogs or b) no longer do checkpointing. And it needs to
> somehow raise a flag to the user that the fsimage is corrupt.
> In step 2, if we do a, we need to introduce a new NN->JN API to back up the
> editlogs; if we do b, it changes the SNN's behavior and is kind of
> incompatible.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)