[
https://issues.apache.org/jira/browse/HDFS-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053304#comment-17053304
]
Stephen O'Donnell commented on HDFS-15205:
------------------------------------------
Thanks for adding the details. The sorting logic does indeed seem to be wrong,
but *only when there are nulls*, which occur when the image contains section
names that are not defined in the SectionName enum. Normally this should never
happen, because this change is not strictly compatible and such an image is not
expected to be loaded by an older version. You created the image in a version
with the patch and then attempted to load it in a version without the patch,
and that has highlighted the null sorting issue.
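To illustrate the null case, here is a minimal sketch of a null-safe ordering.
It is not the Hadoop code: the KnownSection enum, its fromString helper, and the
sortSections method are hypothetical stand-ins, and the only assumption carried
over from the issue is that an undefined section name maps to null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SectionSortSketch {
  // Hypothetical subset of the section names an older reader defines.
  enum KnownSection {
    NS_INFO, STRING_TABLE, INODE, INODE_DIR;

    // Returns null for a name this reader does not define.
    static KnownSection fromString(String name) {
      for (KnownSection s : values()) {
        if (s.name().equals(name)) {
          return s;
        }
      }
      return null;
    }
  }

  // Undefined names (null) sort first; defined names sort by enum order.
  // nullsFirst keeps the comparison consistent whichever side is null.
  static final Comparator<String> BY_SECTION = Comparator.comparing(
      KnownSection::fromString,
      Comparator.nullsFirst(Comparator.<KnownSection>naturalOrder()));

  static List<String> sortSections(List<String> names) {
    List<String> sorted = new ArrayList<>(names);
    sorted.sort(BY_SECTION); // stable: undefined names keep their file order
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(sortSections(Arrays.asList(
        "INODE", "INODE_SUB", "NS_INFO", "INODE_DIR_SUB",
        "STRING_TABLE", "INODE_DIR")));
  }
}
```

With this ordering the defined sections always come out in enum order, so
NS_INFO and STRING_TABLE are placed before INODE regardless of how many
undefined names are mixed in.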
Looking at the 2.6 branch code, fixing the sort may allow your workflow to
work, but I know for sure that there are later versions of Hadoop that will
notice the undefined section and abort instead of just skipping it. This is the
main reason we made this change off by default: if you turn it on, your ability
to roll back is affected.
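The difference between skipping and aborting can be sketched as follows. This
is only an illustration of the two behaviours described above: the KNOWN set,
the sectionsToLoad method, and the abortOnUnknown flag are hypothetical and do
not correspond to any Hadoop class or configuration key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnknownSectionPolicySketch {
  // Hypothetical set of section names this reader understands.
  static final Set<String> KNOWN = new HashSet<>(
      Arrays.asList("NS_INFO", "STRING_TABLE", "INODE", "INODE_DIR"));

  // Returns the sections that would be loaded, in input order.
  // abortOnUnknown is an illustrative switch, not a real Hadoop option.
  static List<String> sectionsToLoad(List<String> names, boolean abortOnUnknown) {
    List<String> loadable = new ArrayList<>();
    for (String name : names) {
      if (KNOWN.contains(name)) {
        loadable.add(name);
      } else if (abortOnUnknown) {
        // Strict reader: refuse the whole image.
        throw new IllegalStateException("Unrecognized section " + name);
      }
      // Lenient reader: silently skip the undefined section.
    }
    return loadable;
  }

  public static void main(String[] args) {
    List<String> image = Arrays.asList("NS_INFO", "INODE", "INODE_SUB");
    System.out.println(sectionsToLoad(image, false));
    try {
      sectionsToLoad(image, true);
    } catch (IllegalStateException e) {
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
```

A lenient reader can still load such an image once the ordering is correct,
whereas a strict reader rejects it outright, which is why rollback cannot be
relied on after the feature is enabled.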
> FSImage sort section logic is wrong
> -----------------------------------
>
> Key: HDFS-15205
> URL: https://issues.apache.org/jira/browse/HDFS-15205
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: angerszhu
> Priority: Blocker
> Attachments: HDFS-15205.001.patch
>
>
> When loading an FSImage, the loader sorts the sections in the FileSummary and
> loads them in SectionName enum order. But the sort method is wrong: when I use
> branch-2.6.0 to load an fsimage written by branch-2 with the patch from
> https://issues.apache.org/jira/browse/HDFS-14771, it throws an NPE because
> it loads INODE first.
> {code:java}
> 2020-03-03 14:33:26,618 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadPermission(FSImageFormatPBINode.java:101)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:148)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadRootINode(FSImageFormatPBINode.java:332)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:218)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>     at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1036)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1020)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:741)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:677)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1092)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:780)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
> {code}
> I printed the load order:
> {code:java}
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE,
> offset = 37, length = 11790829 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 37, length = 826591 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 826628, length = 828192 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 1654820, length = 835240 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 2490060, length = 833630 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 3323690, length = 909445 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 4233135, length = 866147 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 5099282, length = 1272751 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 6372033, length = 1311876 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 7683909, length = 1251510 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 8935419, length = 1296120 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 10231539, length = 770082 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_SUB, offset = 11001621, length = 789245 ]
> 2020-03-03 15:49:36,424 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 11790866, length = 67038 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 11857904, length = 84692 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 11942596, length = 71759 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> NS_INFO, offset = 8, length = 29 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> STRING_TABLE, offset = 12567596, length = 440 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_REFERENCE, offset = 12566380, length = 0 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> SNAPSHOT, offset = 12566191, length = 83 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR, offset = 11790866, length = 774068 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> FILES_UNDERCONSTRUCTION, offset = 12564934, length = 1257 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> SNAPSHOT_DIFF, offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> SECRET_MANAGER, offset = 12566380, length = 1209 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> CACHE_MANAGER, offset = 12567589, length = 7 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12014355, length = 84629 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12098984, length = 65215 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12164199, length = 64496 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12228695, length = 68122 ]
> 2020-03-03 15:49:36,425 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12296817, length = 53417 ]
> 2020-03-03 15:49:36,426 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12350234, length = 51455 ]
> 2020-03-03 15:49:36,426 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12401689, length = 80305 ]
> 2020-03-03 15:49:36,426 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> INODE_DIR_SUB, offset = 12481994, length = 82940 ]
> 2020-03-03 15:49:36,426 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name =
> SNAPSHOT_DIFF_SUB, offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,426 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Current loadin
> {code}
> The order is wrong.