[ https://issues.apache.org/jira/browse/HDFS-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cao, Lionel updated HDFS-14787:
-------------------------------
Summary: NameNode error (was: [Help] NameNode error )
> NameNode error
> ---------------
>
> Key: HDFS-14787
> URL: https://issues.apache.org/jira/browse/HDFS-14787
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Cao, Lionel
> Priority: Major
> Attachments: core-site.xml,
> hadoop-cmf-hdfs-NAMENODE-smc-nn02.jq.log.out.20190827, hdfs-site.xml,
> move&concat.java, rt-Append.txt
>
>
> Hi committee,
> We encountered the NameNode error shown below.
> The primary NN shut down last Thursday, and we recovered it by removing some
> ops from the edit log. However, the standby NN shut down again yesterday with
> the same error.
> Could you please help identify the possible root cause?
>
> Part of the error log is included below. For the full log and the NameNode
> configuration, please refer to the attachments.
> I have also attached some Java code that could have triggered the error:
> # An append operation in a Spark Streaming program (rt-Append.txt), which
> caused the primary NN shutdown last Thursday.
> # A move & concat operation in a data-conversion program (move&concat.java),
> which caused the standby NN shutdown yesterday.
> 2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
> 2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
> 2019-08-27 09:51:12,858 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_libs__2381992047634476351.zip
> 2019-08-27 09:51:12,870 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/oozietest2-0.0.1-SNAPSHOT.jar
> 2019-08-27 09:51:12,898 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_conf__.zip
> 2019-08-27 09:51:12,910 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_libs__8875310030853528804.zip
> 2019-08-27 09:51:12,927 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_conf__.zip
> 2019-08-27 09:51:13,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 857745/953617 transactions completed. (90%)
> 2019-08-27 09:51:14,035 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20425/__spark_libs__7422229681005558653.zip
> 2019-08-27 09:51:14,067 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20426/__spark_libs__7479542421029947753.zip
> 2019-08-27 09:51:14,070 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20428/__spark_libs__7647933078788028649.zip
> 2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/******/v2-data-20190826.mayfly.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /******/v2-data-20190826.mayfly.data
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:218)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1630)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:309)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
>     Number of suppressed write-lock reports: 0
>     Longest write-lock held interval: 11714
> 2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block report queue is full
> 2019-08-27 09:51:14,077 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*******/v2-data-20190826.mayfly.data
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,105 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*******/v2-data-20190826.mayfly.data
> 2019-08-27 09:51:14,118 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at xxx-nn02.jq/10.129.148.13
> ************************************************************/
> 2019-08-27 10:43:15,713 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = xxx-nn02.jq/10.129.148.13
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 3.0.0-cdh6.0.1
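
The fatal message can be decoded mechanically. NameNode logs print blocks as blk_<blockId>_<generationStamp>. The standby therefore held block 1270602446 with generation stamp 759027503, while the OP_CLOSE record (txid=4359520942) carried the same block ID with generation stamp 759061086. The Java sketch below is a simplified model built from the message text and from my reading of FSEditLogLoader.updateBlocks in Hadoop 3.0. It is not the Hadoop source. It assumes that blocks are compared position by position, and that a generation-stamp difference is tolerated only on the last block, and only when the old and new block lists have the same length.

The names BlockMismatchSketch, Blk, parseAll and firstMismatch are hypothetical. The standby's block list is also partly assumed. Blocks 0 to 3 must have matched, because the check stops at the first mismatch. Block 4 comes from the error message. The standby's sixth block, if it had one, is unknown and is left out.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified, hypothetical model (not the Hadoop source) of the positional
 * check behind "Mismatched block IDs or generation stamps".
 */
public class BlockMismatchSketch {

    /** A block as printed in NameNode logs: blk_<blockId>_<generationStamp>. */
    static final class Blk {
        final long id;
        final long genStamp;

        Blk(long id, long genStamp) {
            this.id = id;
            this.genStamp = genStamp;
        }

        /** Parses e.g. "blk_1270602446_759027503" into ID and generation stamp. */
        static Blk parse(String name) {
            String[] parts = name.split("_");
            if (parts.length != 3 || !parts[0].equals("blk")) {
                throw new IllegalArgumentException("unexpected block name: " + name);
            }
            return new Blk(Long.parseLong(parts[1]), Long.parseLong(parts[2]));
        }

        @Override
        public String toString() {
            return "blk_" + id + "_" + genStamp;
        }
    }

    /** Parses several block names into a fixed-size list. */
    static List<Blk> parseAll(String... names) {
        Blk[] out = new Blk[names.length];
        for (int i = 0; i < names.length; i++) {
            out[i] = Blk.parse(names[i]);
        }
        return Arrays.asList(out);
    }

    /**
     * Returns the first index where the existing block and the edit-log block
     * disagree, or -1 if all common positions agree. Assumption: a differing
     * generation stamp is accepted only on the last block when both lists
     * have the same length.
     */
    static int firstMismatch(List<Blk> existing, List<Blk> fromOp) {
        boolean sameLength = existing.size() == fromOp.size();
        for (int i = 0; i < existing.size() && i < fromOp.size(); i++) {
            Blk oldB = existing.get(i);
            Blk newB = fromOp.get(i);
            boolean isLast = i == fromOp.size() - 1;
            if (oldB.id != newB.id
                    || (oldB.genStamp != newB.genStamp && !(sameLength && isLast))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Block list carried by the OP_CLOSE record in the log above.
        List<Blk> fromOp = parseAll(
                "blk_1270599798_758966421", "blk_1270599852_758967928",
                "blk_1270601282_759026903", "blk_1270602443_759027052",
                "blk_1270602446_759061086", "blk_1270603081_759050235");
        // Hypothetical standby view: blocks 0-3 inferred to match, block 4
        // taken from the error message, block 5 unknown and omitted.
        List<Blk> existing = parseAll(
                "blk_1270599798_758966421", "blk_1270599852_758967928",
                "blk_1270601282_759026903", "blk_1270602443_759027052",
                "blk_1270602446_759027503");
        int i = firstMismatch(existing, fromOp);
        Blk oldB = existing.get(i);
        Blk newB = fromOp.get(i);
        // Prints: block # 4/6: blk_1270602446_759027503 vs blk_1270602446_759061086
        //         (same ID: true, same generation stamp: false)
        System.out.println("block # " + i + "/" + fromOp.size() + ": " + oldB + " vs " + newB
                + " (same ID: " + (oldB.id == newB.id)
                + ", same generation stamp: " + (oldB.genStamp == newB.genStamp) + ")");
    }
}
```

Running main reports index 4 of 6. At that position the block ID matches and only the generation stamp differs. Because index 4 is not the last block, the sketch rejects it rather than treating it as a last-block generation-stamp update. If my reading of updateBlocks is right, the real check behaves the same way.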
--
This message was sent by Atlassian Jira
(v8.3.2#803003)