[
https://issues.apache.org/jira/browse/HDFS-13314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420973#comment-16420973
]
Yongjun Zhang commented on HDFS-13314:
--------------------------------------
I had a couple of email exchanges with [~arpitagarwal]:
{quote}
Hi Arpit,
Sorry I'm late catching up on:
https://issues.apache.org/jira/browse/HDFS-13314
Good work you did there!
I have some questions and thoughts:
1. Did you observe duplicate entries in the deleted list below? I assume so,
but would like to confirm just in case.
    List<INode> deleted = diff.getChildrenDiff().getDeletedUnmodifiable();
2. The fsimage loading code could crash here:
    for (int refId : e.getRefChildrenList()) {
      INodeReference ref = refList.get(refId);
      addToParent(p, ref);
    }
due to fsimage corruption. Does your checking cover that?
3. The code below: if the list is sorted, how could misordering happen? Did
you observe misordering at all, or is the check just there for completeness?
Also, very minor: misordering seems to be reported only once because of the
!misordered check.
    INode previousNode = null;
    boolean misordered = false;
    for (INode d : deleted) {
      // getBytes() may return null below, and that is okay.
      final int result = previousNode == null ? -1 :
          previousNode.compareTo(d.getLocalNameBytes());
      if (result == 0) {
        FSImage.LOG.error(
            "Name '" + d.getLocalName() + "' is repeated in the " +
            "'deleted' difflist of directory " +
            dir.getFullPathName() + ", INodeId=" + dir.getId());
        ++numImageErrors;
      } else if (result > 0 && !misordered) {
        misordered = true;
        ++numImageErrors;
      }
      previousNode = d;
4. I assume the code above only detects duplicate entries. Can we have the
check in the place where new entries are added to the deleted list? That way
we would know exactly which stack trace caused the duplicate entries to be
added, and probably get additional information.
Specifically, in the following code, can we change the AssertionError into a
real exception?
    private void insert(final ListType type, final E element, final int i) {
      List<E> list = type == ListType.CREATED? created: deleted;
      if (i >= 0) {
        throw new AssertionError("Element already exists: element=" + element
            + ", " + type + "=" + list);
      }
      if (list == null) {
        list = new ArrayList<E>(DEFAULT_ARRAY_INITIAL_CAPACITY);
        if (type == ListType.CREATED) {
          created = list;
        } else if (type == ListType.DELETED){
          deleted = list;
        }
      }
      list.add(-i - 1, element);
    }
Thanks a lot.
--Yongjun
{quote}
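To make question 3 concrete, here is a minimal standalone sketch of a scan over
a list that is expected to be strictly ascending. It is not the HDFS-13314
patch: the class DeletedListCheck, its Result type, and the compareBytes helper
are hypothetical names, and plain byte arrays stand in for INode local names.
Unlike the quoted loop, it counts every out-of-order adjacent pair rather than
only the first one. Note that comparing neighbours only finds duplicates that
end up adjacent, which holds when the list is otherwise sorted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: scans names that should be
// strictly ascending and counts duplicates and out-of-order neighbours.
final class DeletedListCheck {

  /** Counters produced by one scan. */
  static final class Result {
    int duplicates;
    int misordered;
  }

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** Compares each name with its predecessor and tallies the anomalies. */
  static Result check(List<byte[]> names) {
    Result r = new Result();
    byte[] previous = null;
    for (byte[] name : names) {
      if (previous != null) {
        int cmp = compareBytes(previous, name);
        if (cmp == 0) {
          r.duplicates++;      // same name twice in a row
        } else if (cmp > 0) {
          r.misordered++;      // predecessor sorts after this name
        }
      }
      previous = name;
    }
    return r;
  }

  public static void main(String[] args) {
    List<byte[]> names = new ArrayList<>();
    for (String s : new String[] {"a", "b", "b", "a", "c"}) {
      names.add(s.getBytes(StandardCharsets.UTF_8));
    }
    Result r = check(names);
    System.out.println(
        "duplicates=" + r.duplicates + ", misordered=" + r.misordered);
  }
}
```

Keeping the two counters separate is a design choice for the sketch only; the
quoted code folds both kinds of anomaly into numImageErrors.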
> NameNode should optionally exit if it detects FsImage corruption
> ----------------------------------------------------------------
>
> Key: HDFS-13314
> URL: https://issues.apache.org/jira/browse/HDFS-13314
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.2
>
> Attachments: HDFS-13314.01.patch, HDFS-13314.02.patch,
> HDFS-13314.03.patch, HDFS-13314.04.patch, HDFS-13314.05.patch
>
>
> The NameNode should optionally exit after writing an FsImage if it detects
> the following kinds of corruptions:
> # INodeReference pointing to non-existent INode
> # Duplicate entries in snapshot deleted diff list.
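
For the first kind of corruption (and question 2 in the email above), a lookup
such as refList.get(refId) throws IndexOutOfBoundsException when refList is a
java.util.List and the id is out of range. The following is a minimal sketch of
validating the ids before using them. RefListCheck and countDanglingRefs are
hypothetical names, and strings stand in for INodeReference objects; this is an
illustration under those assumptions, not the code in the attached patches.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: counts reference ids that do not
// resolve to an existing entry instead of failing on the first bad lookup.
final class RefListCheck {

  /** Returns how many ids are out of range or point at a null entry. */
  static int countDanglingRefs(List<String> refList, int[] refIds) {
    int dangling = 0;
    for (int refId : refIds) {
      boolean inRange = refId >= 0 && refId < refList.size();
      if (!inRange || refList.get(refId) == null) {
        dangling++;
      }
    }
    return dangling;
  }

  public static void main(String[] args) {
    // Index 1 is deliberately null to model a reference with no target.
    List<String> refList = Arrays.asList("x", null, "z");
    int[] refIds = {0, 1, 2, 5, -1};
    System.out.println("dangling=" + countDanglingRefs(refList, refIds));
  }
}
```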
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)