[ 
https://issues.apache.org/jira/browse/HDFS-15205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051017#comment-17051017
 ] 

Stephen O'Donnell commented on HDFS-15205:
------------------------------------------

Hi [~angerszhuuu] With the patch you have uploaded, does this fix the load 
order problem you are seeing on branch-2?

Looking at the logic, your change should only have an impact when one of the 
section names is null, and I think that should never be the case. Therefore I 
am wondering if there is something else going wrong when you backported the 
patch to branch 2.6. We have also seen this patch used in a few places and this 
has not been reported before.

It does look like this is a bug in the sorting, but it has probably gone 
unnoticed as there should never be any null section names.

Can I ask:

1. Did the patch backport cleanly to 2.6 or did you need to make some changes?
2. With and without your fix, could you print out the order of the items in the 
"sections" list after sorting it so we can see what it is doing, and also log a 
message if any of the sections are null, eg:

{code}
      ...
      ArrayList<FileSummary.Section> sections = Lists.newArrayList(summary
          .getSectionsList());
      Collections.sort(sections, new Comparator<FileSummary.Section>() {
        @Override
        public int compare(FileSummary.Section s1, FileSummary.Section s2) {
          SectionName n1 = SectionName.fromString(s1.getName());
          SectionName n2 = SectionName.fromString(s2.getName());
          if (n1 == null) {
            LOG.warn("n1 is null");
            return n2 == null ? 0 : -1;
          } else if (n2 == null) {
             LOG.warn("n2 is null but n1 is not");
            return -1;
          } else {
            return n1.ordinal() - n2.ordinal();
          }
        }
      });

     for (FileSummary.Section s : sections) {
         log.warn("Section: " + s.getName());
     }
{code}

3. If you check the SectionName enum in FSImageFormatProtobuf.java, are all the 
sections present in your image also defined in the enum?



> FSImage sort section logic is wrong
> -----------------------------------
>
>                 Key: HDFS-15205
>                 URL: https://issues.apache.org/jira/browse/HDFS-15205
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: angerszhu
>            Priority: Blocker
>         Attachments: HDFS-15205.001.patch
>
>
> When load FSImage, it will sort sections in FileSummary and load Section's in 
> SectionName enum sequence. But the sort method is wrong , when I use 
> branch-2.6.0 to load fsimage write by branch-2 with patch  
> https://issues.apache.org/jira/browse/HDFS-14771, it will throw NPE because 
> it load INODE first 
> {code:java}
> 2020-03-03 14:33:26,618 ERROR 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadPermission(FSImageFormatPBINode.java:101)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:148)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadRootINode(FSImageFormatPBINode.java:332)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:218)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:254)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:180)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:226)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1036)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1020)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:741)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:677)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:290)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1092)
>     at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:780)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:609)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:666)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:838)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:817)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1538)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1606)
> {code}
> I print the load  order:
> {code:java}
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = INODE,  
> offset = 37, length = 11790829 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 37, length = 826591 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 826628, length = 828192 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 1654820, length = 835240 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 2490060, length = 833630 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 3323690, length = 909445 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 4233135, length = 866147 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 5099282, length = 1272751 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 6372033, length = 1311876 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 7683909, length = 1251510 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 8935419, length = 1296120 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 10231539, length = 770082 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_SUB,  offset = 11001621, length = 789245 ]
> 2020-03-03 15:49:36,424 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 11790866, length = 67038 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 11857904, length = 84692 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 11942596, length = 71759 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> NS_INFO,  offset = 8, length = 29 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> STRING_TABLE,  offset = 12567596, length = 440 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_REFERENCE,  offset = 12566380, length = 0 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> SNAPSHOT,  offset = 12566191, length = 83 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR,  offset = 11790866, length = 774068 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> FILES_UNDERCONSTRUCTION,  offset = 12564934, length = 1257 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> SNAPSHOT_DIFF,  offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> SECRET_MANAGER,  offset = 12566380, length = 1209 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> CACHE_MANAGER,  offset = 12567589, length = 7 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12014355, length = 84629 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12098984, length = 65215 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12164199, length = 64496 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12228695, length = 68122 ]
> 2020-03-03 15:49:36,425 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12296817, length = 53417 ]
> 2020-03-03 15:49:36,426 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12350234, length = 51455 ]
> 2020-03-03 15:49:36,426 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12401689, length = 80305 ]
> 2020-03-03 15:49:36,426 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> INODE_DIR_SUB,  offset = 12481994, length = 82940 ]
> 2020-03-03 15:49:36,426 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: [name = 
> SNAPSHOT_DIFF_SUB,  offset = 12566274, length = 106 ]
> 2020-03-03 15:49:36,426 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Current loadin
> {code}
> The order is wrong



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to