[ https://issues.apache.org/jira/browse/HDFS-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052673#comment-17052673 ]
angerszhu commented on HDFS-15205:
----------------------------------
[~sodonnell] [~hexiaoqiao] [~weichiu]
When I started the test, nn01 was the active NN and nn02 was the SBN (standby NameNode).
1. I backported [HDFS-14771|https://issues.apache.org/jira/browse/HDFS-14771] to our branch-2.6.
2. With the patch applied, I restarted nn02. It loaded the original FSImage correctly and then wrote a new FSImage with sub-sections.
3. The FSImage with sub-sections was synchronized to the active nn01.
4. (Just as an experiment) I failed over from nn01 to nn02 and restarted nn01 with a jar that did not include [HDFS-14771|https://issues.apache.org/jira/browse/HDFS-14771].
5. Then the NPE occurred.
So this error happens when a NN without [HDFS-14771|https://issues.apache.org/jira/browse/HDFS-14771] loads an FSImage whose FileSummary contains sub-sections.
> FSImage sort section logic is wrong
> -----------------------------------
>
> Key: HDFS-15205
> URL: https://issues.apache.org/jira/browse/HDFS-15205
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: angerszhu
> Priority: Blocker
> Attachments: HDFS-15205.001.patch
>
>
> When loading an FSImage, the loader sorts the sections listed in FileSummary and loads them in
> SectionName enum order. But the sort logic is wrong: when I use branch-2.6.0 to load an
> fsimage written by branch-2 with patch https://issues.apache.org/jira/browse/HDFS-14771,
> it throws an NPE because it loads INODE first.
> {code:java}
> 2020-03-03 14:33:26,618 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadPermission(FSImageFormatPBINode.java:101)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:148)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadRootINode(FSImageFormatPBINode.java:332)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:218)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
> at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1036)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1020)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:741)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:677)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1092)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:780)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
> {code}
> I printed the load order:
> {code:java}
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE, offset = 37, length = 11790829 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 37, length = 826591 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 826628, length = 828192 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 1654820, length = 835240 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 2490060, length = 833630 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 3323690, length = 909445 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 4233135, length = 866147 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 5099282, length = 1272751 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 6372033, length = 1311876 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 7683909, length = 1251510 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 8935419, length = 1296120 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 10231539, length = 770082 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_SUB, offset = 11001621, length = 789245 ]
> 2020-03-03 15:49:36,424 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 11790866, length = 67038 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 11857904, length = 84692 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 11942596, length = 71759 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = NS_INFO, offset = 8, length = 29 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = STRING_TABLE, offset = 12567596, length = 440 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_REFERENCE, offset = 12566380, length = 0 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = SNAPSHOT, offset = 12566191, length = 83 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR, offset = 11790866, length = 774068 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = FILES_UNDERCONSTRUCTION, offset = 12564934, length = 1257 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = SNAPSHOT_DIFF, offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = SECRET_MANAGER, offset = 12566380, length = 1209 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = CACHE_MANAGER, offset = 12567589, length = 7 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12014355, length = 84629 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12098984, length = 65215 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12164199, length = 64496 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12228695, length = 68122 ]
> 2020-03-03 15:49:36,425 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12296817, length = 53417 ]
> 2020-03-03 15:49:36,426 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12350234, length = 51455 ]
> 2020-03-03 15:49:36,426 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12401689, length = 80305 ]
> 2020-03-03 15:49:36,426 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE_DIR_SUB, offset = 12481994, length = 82940 ]
> 2020-03-03 15:49:36,426 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = SNAPSHOT_DIFF_SUB, offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,426 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Current loadin
> {code}
> The order is wrong: INODE is loaded before NS_INFO and STRING_TABLE.
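The sorting problem described above can be modeled in a few lines of Java. This is a minimal, self-contained sketch, not the actual `FSImageFormatProtobuf` loader: `KnownSection` is a hypothetical subset of the section names an older NameNode knows, and the sketch assumes that names the older release does not recognize (such as `INODE_SUB`) are mapped to `null` before sections are compared by enum ordinal. `BROKEN` and `FIXED` are illustrative names. A comparator that returns -1 whenever either side is unknown violates the `Comparator` contract, because `compare(a, b)` and `compare(b, a)` are then both negative; a sort driven by it may produce an arbitrary order or throw `IllegalArgumentException` ("Comparison method violates its general contract"). A consistent comparator puts all unknown sections first and the known ones in enum order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch only: a simplified model of sorting FileSummary sections by enum
// order. Not the actual FSImageFormatProtobuf code.
final class SectionOrderSketch {

  // Hypothetical subset of section names known to an older NameNode.
  // It has no INODE_SUB / INODE_DIR_SUB constants.
  enum KnownSection { NS_INFO, STRING_TABLE, INODE, SNAPSHOT, INODE_DIR }

  // Assumption: names this release does not recognize map to null.
  static KnownSection fromString(String name) {
    for (KnownSection s : KnownSection.values()) {
      if (s.name().equals(name)) {
        return s;
      }
    }
    return null;
  }

  // Inconsistent: answers -1 whenever either side is unknown, so
  // compare(a, b) and compare(b, a) can both be negative.
  static final Comparator<String> BROKEN = (a, b) -> {
    KnownSection n1 = fromString(a);
    KnownSection n2 = fromString(b);
    if (n1 == null) {
      return n2 == null ? 0 : -1;
    } else if (n2 == null) {
      return -1;
    }
    return Integer.compare(n1.ordinal(), n2.ordinal());
  };

  // Consistent: unknown names sort first, known names by enum ordinal.
  static final Comparator<String> FIXED = (a, b) -> {
    KnownSection n1 = fromString(a);
    KnownSection n2 = fromString(b);
    if (n1 == null) {
      return n2 == null ? 0 : -1;
    } else if (n2 == null) {
      return 1;
    }
    return Integer.compare(n1.ordinal(), n2.ordinal());
  };

  // Checks sgn(compare(a, b)) == -sgn(compare(b, a)), which Comparator requires.
  static boolean isAntisymmetric(Comparator<String> c, String a, String b) {
    return Integer.signum(c.compare(a, b)) == -Integer.signum(c.compare(b, a));
  }

  static List<String> sortSections(List<String> names, Comparator<String> c) {
    List<String> sorted = new ArrayList<>(names);
    sorted.sort(c); // stable: equal elements keep their relative order
    return sorted;
  }

  public static void main(String[] args) {
    List<String> summary = Arrays.asList(
        "INODE", "INODE_SUB", "INODE_DIR_SUB", "NS_INFO",
        "STRING_TABLE", "SNAPSHOT", "INODE_DIR");
    System.out.println(isAntisymmetric(BROKEN, "INODE", "INODE_SUB")); // false
    System.out.println(isAntisymmetric(FIXED, "INODE", "INODE_SUB"));  // true
    System.out.println(sortSections(summary, FIXED));
    // [INODE_SUB, INODE_DIR_SUB, NS_INFO, STRING_TABLE, INODE, SNAPSHOT, INODE_DIR]
  }
}
```

With the consistent comparator, NS_INFO and STRING_TABLE come before INODE no matter where the unknown sub-sections appear in FileSummary, assuming the older loader simply ignores section names it does not recognize.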
--
This message was sent by Atlassian Jira
(v8.3.4#803005)