[
https://issues.apache.org/jira/browse/HDFS-13818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604571#comment-16604571
]
Gabor Bota commented on HDFS-13818:
-----------------------------------
Thanks for working on this [~adam.antal]. This feature is starting to look
great.
I've noticed the following while looking into HDFS-13818.003.patch:
* The ASF license header (asflicense) is missing from the {{Corruption}} class
* Please consider a better name for the {{Corruption}} class - like
{{PbImageCorruption}}.
* For {{Preconditions.checkState}} in {{Corruption}}: please add an error
message describing what failed. We could also consider using {{assert}} for
this purpose.
* It seems like {{CorruptionType}} could be an enum. Maybe we could even use a
Set of those enums to represent the different kinds of corruption.
* Code structuring: {{OutputEntryBuilder}} could live in
{{PBImageCorruptionDetector}}, since it is part of that logic, and we
could use {{Corruption}} just for storing data
* Please extend the docs in
{{hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsImageViewer.md}} with
the description of this feature
* Please fix the checkstyle issue. There's a [link for it in the Hadoop QA's
comment|https://builds.apache.org/job/PreCommit-HDFS-Build/24941/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt]
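To illustrate the enum, Set and error-message suggestions above, here is a
minimal sketch. It is not taken from the patch: the enum constants, the field
names and the local {{checkState}} helper are assumptions made for
illustration. The helper only mimics the shape of Guava's
{{Preconditions.checkState(boolean, Object)}} so that the example compiles
without extra dependencies.

```java
import java.util.EnumSet;
import java.util.Set;

public class PbImageCorruptionSketch {

  /** Hypothetical corruption kinds; the real patch may define others. */
  enum CorruptionType {
    CORRUPT_NODE,
    MISSING_CHILD
  }

  /** Plain data holder for one corrupt entry, with no output logic. */
  static final class PbImageCorruption {
    private final long id;
    private final Set<CorruptionType> types;

    PbImageCorruption(long id, CorruptionType first) {
      checkState(first != null,
          "At least one corruption type is required for inode " + id);
      this.id = id;
      this.types = EnumSet.of(first);
    }

    /** Records one more kind of corruption for the same entry. */
    void addCorruption(CorruptionType type) {
      checkState(type != null,
          "Corruption type must not be null for inode " + id);
      types.add(type);
    }

    long getId() {
      return id;
    }

    /** Returns a defensive copy of the recorded corruption kinds. */
    Set<CorruptionType> getTypes() {
      return EnumSet.copyOf(types);
    }
  }

  /** Stand-in for Guava's Preconditions.checkState with a message. */
  static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  public static void main(String[] args) {
    PbImageCorruption c =
        new PbImageCorruption(16386L, CorruptionType.CORRUPT_NODE);
    c.addCorruption(CorruptionType.MISSING_CHILD);
    System.out.println(c.getId() + " " + c.getTypes());
  }
}
```

With a Set of enum values, one entry can carry several kinds of corruption at
once, and a failed precondition reports which inode was affected instead of
throwing a bare IllegalStateException.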
> Extend OIV to detect FSImage corruption
> ---------------------------------------
>
> Key: HDFS-13818
> URL: https://issues.apache.org/jira/browse/HDFS-13818
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Adam Antal
> Assignee: Adam Antal
> Priority: Major
> Attachments: HDFS-13818.001.patch, HDFS-13818.002.patch,
> HDFS-13818.003.patch, HDFS-13818.003.patch,
> OIV_CorruptionDetector_processor.001.pdf,
> OIV_CorruptionDetector_processor.002.pdf
>
>
> A follow-up Jira for HDFS-13031: an improvement of the OIV is suggested for
> detecting corruptions like HDFS-13101 in an offline way.
> The reasoning is as follows. Apart from an NN startup throwing the error,
> the customer has no way to tell whether the FSImage is good or corrupted.
> Although a truly full check of the FSImage is only possible in the NN,
> setting up a tertiary NN is overkill for the stack traces associated with
> the observed corruption cases. The OIV would be a handy choice: it already
> has functionality such as loading the fsimage and constructing the folder
> structure, so we only have to add the option of detecting the null INodes.
> For example, the Delimited OIV processor can already use an on-disk
> MetadataMap, which reduces memory consumption. There may also be room for
> parallelizing: iterating through the INodes, for example, could be done in
> a distributed way, increasing efficiency, and we wouldn't need a
> high-memory, high-CPU setup just for checking the FSImage.
> The suggestion is to add a --detectCorruption option to the OIV which would
> check the FSImage for consistency.
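
The kind of consistency check described above can be sketched as follows.
This is a simplified illustration, not the OIV code: it assumes the inode ids
and the directory-to-children references have already been loaded into plain
Java collections, and the class and method names are hypothetical. It reports
every child reference that points to an inode id with no inode entry, which
is the "null INode" case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DanglingInodeCheckSketch {

  /**
   * Returns {parentId, childId} pairs for every child reference whose
   * target id is absent from the set of known inode ids.
   */
  static List<long[]> findMissingChildren(
      Set<Long> inodeIds, Map<Long, List<Long>> dirChildren) {
    List<long[]> missing = new ArrayList<>();
    for (Map.Entry<Long, List<Long>> entry : dirChildren.entrySet()) {
      for (long childId : entry.getValue()) {
        if (!inodeIds.contains(childId)) {
          missing.add(new long[] {entry.getKey(), childId});
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Made-up ids: inode 4 is referenced by directory 2 but has no entry.
    Set<Long> inodeIds = new HashSet<>(Arrays.asList(1L, 2L, 3L));
    Map<Long, List<Long>> dirChildren = new TreeMap<>();
    dirChildren.put(1L, Arrays.asList(2L, 3L));
    dirChildren.put(2L, Arrays.asList(4L));

    for (long[] pair : findMissingChildren(inodeIds, dirChildren)) {
      System.out.println(
          "Directory " + pair[0] + " references missing inode " + pair[1]);
    }
  }
}
```

Each directory is checked independently of the others, so a loop of this
shape could be split across workers, in line with the parallelizing idea
mentioned in the description.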