[ 
https://issues.apache.org/jira/browse/HDFS-15792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275901#comment-17275901
 ] 

Stephen O'Donnell commented on HDFS-15792:
------------------------------------------

[~prasad-acit] I am not able to get the line numbers in the stack trace to line 
up with the 3.1.1 release branch. Have you some other image loading patches 
applied to this release (eg HDFS-14617)? For example any of the parallel image 
loading patches or patches which are internal and just on your build?

The error occurred in the class:

org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes

But I don't see that method anywhere in FSImageFormatPBINode.java on 3.1.1 or 
trunk. Grepping the source for `grep -R "readPBINodes" *` gives no results 
either.

While moving to a concurrent hashMap may make sense, all the structures in the 
NN are generally protected by the global NN lock when concurrent operations are 
happening. Therefore the NN tends not to use concurrent hash maps etc. 

If this structure is getting modified by two treads at once to cause this bug, 
I would like to understand how it is happening - ie have we missed 
synchronising around this structure somehow during image loading? Prior to 
HDFS-14617 the image is loaded by a single thread which makes concurrent access 
less likely too.

> ClasscastException while loading FSImage
> ----------------------------------------
>
>                 Key: HDFS-15792
>                 URL: https://issues.apache.org/jira/browse/HDFS-15792
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nn
>            Reporter: Renukaprasad C
>            Assignee: Renukaprasad C
>            Priority: Major
>         Attachments: HDFS-15792.001.patch, HDFS-15792.002.patch, 
> image-2021-01-27-12-00-34-846.png
>
>
> FSImage loading has failed with ClasscastException - 
> java.lang.ClassCastException: java.util.HashMap$Node cannot be cast to 
> java.util.HashMap$TreeNode.
> This is the usage issue with Hashmap in concurrent scenarios.
> Same issue has been reported on Java & closed as usage issue.  - 
> https://bugs.openjdk.java.net/browse/JDK-8173671
> 2020-12-28 11:36:26,127 | ERROR | main | An exception occurred when loading 
> INODE from fsiamge. | FSImageFormatProtobuf.java:442
> java.lang.
> : java.util.HashMap$Node cannot be cast to java.util.HashMap$TreeNode
>       at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835)
>       at java.util.HashMap$TreeNode.treeify(HashMap.java:1951)
>       at java.util.HashMap.treeifyBin(HashMap.java:772)
>       at java.util.HashMap.putVal(HashMap.java:644)
>       at java.util.HashMap.put(HashMap.java:612)
>       at 
> org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53)
>       at 
> org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391)
>       at 
> org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> 2020-12-28 11:36:26,130 | ERROR | main | Failed to load image from 
> FSImageFile(file=/srv/BigData/namenode/current/fsimage_0000000000198227480, 
> cpktTxId=0000000000198227480) | FSImage.java:738
> java.io.IOException: java.lang.ClassCastException: java.util.HashMap$Node 
> cannot be cast to java.util.HashMap$TreeNode
>       at 
> org.apache.hadoop.io.MultipleIOException$Builder.add(MultipleIOException.java:68)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.runLoaderTasks(FSImageFormatProtobuf.java:444)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:360)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:263)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:227)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:971)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:955)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:820)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:733)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1113)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:730)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:926)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1665)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1735)
> Caused by: java.lang.ClassCastException: java.util.HashMap$Node cannot be 
> cast to java.util.HashMap$TreeNode
>       at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1835)
>       at java.util.HashMap$TreeNode.treeify(HashMap.java:1951)
>       at java.util.HashMap.treeifyBin(HashMap.java:772)
>       at java.util.HashMap.putVal(HashMap.java:644)
>       at java.util.HashMap.put(HashMap.java:612)
>       at 
> org.apache.hadoop.hdfs.util.ReferenceCountMap.put(ReferenceCountMap.java:53)
>       at 
> org.apache.hadoop.hdfs.server.namenode.AclStorage.addAclFeature(AclStorage.java:391)
>       at 
> org.apache.hadoop.hdfs.server.namenode.INodeWithAdditionalFields.addAclFeature(INodeWithAdditionalFields.java:349)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectory(FSImageFormatPBINode.java:225)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINode(FSImageFormatPBINode.java:406)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.readPBINodes(FSImageFormatPBINode.java:367)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeSection(FSImageFormatPBINode.java:342)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader$2.call(FSImageFormatProtobuf.java:469)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to