[ https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373706#comment-17373706 ]
Wei-Chiu Chuang commented on HDFS-14529:
----------------------------------------
We encountered this bug again, and it is reproducible with this set of
fsimage/edit logs.
We added debug logs and found that the IIP (INodesInPath) is missing components.
The path should have 8 components, but only 6 were resolved; the last two were
null. This is likely caused by files that were already deleted from snapshots.
Somehow the active NN keeps the file in memory, so the standby NameNode crashes
when loading edits.
Comparing this method with other similar methods, I think we should check
whether iip.getLastINode() is null and, if so, throw FileNotFoundException.
There are other places in the code where we could add the same null check. I
also hit failures several times on other edit log ops (mkdir, rename,
renameSnapshot).
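To make the proposal concrete, here is a minimal, self-contained sketch of the kind of null check I have in mind. It is not the actual Hadoop code and not a patch: ResolvedPath is a hypothetical stand-in for INodesInPath, and setTimes/tryApply are illustrative names only. The point is that a missing last inode becomes a FileNotFoundException the caller can handle, rather than an NPE or AssertionError.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

public class NullInodeCheckSketch {

    /** Hypothetical stand-in for a resolved path; NOT Hadoop's INodesInPath. */
    static final class ResolvedPath {
        private final String path;
        // One entry per path component; null marks a component that did not resolve.
        private final List<String> inodes;

        ResolvedPath(String path, String... inodes) {
            this.path = path;
            this.inodes = Arrays.asList(inodes);
        }

        String getPath() {
            return path;
        }

        /** Returns the last component's inode, or null if it did not resolve. */
        String getLastINode() {
            return inodes.isEmpty() ? null : inodes.get(inodes.size() - 1);
        }
    }

    /** Illustrative operation: rejects a missing target up front instead of dereferencing null. */
    static String setTimes(ResolvedPath iip, long mtime, long atime)
            throws FileNotFoundException {
        String last = iip.getLastINode();
        if (last == null) {
            throw new FileNotFoundException(
                "File/Directory " + iip.getPath() + " does not exist.");
        }
        return last + " mtime=" + mtime + " atime=" + atime;
    }

    /** Returns true if the op was applied, false if the target was missing. */
    static boolean tryApply(ResolvedPath iip, long mtime, long atime) {
        try {
            setTimes(iip, mtime, atime);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Fully resolved path: root (""), then "apps", then "hive".
        ResolvedPath ok = new ResolvedPath("/apps/hive", "", "apps", "hive");
        // Same shape as the failure above: the trailing components are null.
        ResolvedPath missing =
            new ResolvedPath("/apps/hive/part/file", "", "apps", "hive", null, null);

        System.out.println("ok applied: " + tryApply(ok, -1L, 1624825911021L));
        System.out.println("missing applied: " + tryApply(missing, -1L, 1624825911021L));
    }
}
```

Whether the edit log loader should skip such an op, log it, or abort after catching the exception is a separate decision that this sketch does not settle.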
{noformat}
21/07/02 11:39:39 ERROR namenode.FSEditLogLoader: AssertionError caught in
unprotectedSetTimes: iip=INodesInPath: path =
/apps/hive/warehouse/ea_common.db/sls_blng_rw/ins_gmt_dt=2021-06-22/part-00001-087de2ec-7888-4f2b-bea6-3702c69cf953.c000
inodes = [, apps, hive, warehouse, ea_common.db, sls_blng_rw, null, null],
length=8
isSnapshot = false
snapshotId = 8014, lastINode=null, mtime=-1, atime=1624825911021,
force? true
java.lang.AssertionError: i = 6 != 8, this=INodesInPath: path =
/apps/hive/warehouse/ea_common.db/sls_blng_rw/ins_gmt_dt=2021-06-22/part-00001-087de2ec-7888-4f2b-bea6-3702c69cf953.c000
inodes = [, apps, hive, warehouse, ea_common.db, sls_blng_rw, null, null],
length=8
isSnapshot = false
snapshotId = 8014
at org.apache.hadoop.hdfs.server.namenode.INodesInPath.validate(INodesInPath.java:488)
at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:355)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:631)
{noformat}
> NPE while Loading the Editlogs
> ------------------------------
>
> Key: HDFS-14529
> URL: https://issues.apache.org/jira/browse/HDFS-14529
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.1
> Reporter: Harshakiran Reddy
> Assignee: Ayush Saxena
> Priority: Major
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception
> on operation TimesOp [length=0,
> path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1,
> atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725)
> {noformat}