[ https://issues.apache.org/jira/browse/HDFS-14787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao, Lionel updated HDFS-14787:
-------------------------------
    Summary: NameNode error   (was: [Help] NameNode error )

> NameNode error 
> ---------------
>
>                 Key: HDFS-14787
>                 URL: https://issues.apache.org/jira/browse/HDFS-14787
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Cao, Lionel
>            Priority: Major
>         Attachments: core-site.xml, 
> hadoop-cmf-hdfs-NAMENODE-smc-nn02.jq.log.out.20190827, hdfs-site.xml, 
> move&concat.java, rt-Append.txt
>
>
> Hi committee,
> We encountered the NameNode error shown below.
> The primary NN shut down last Thursday, and we recovered it by removing some 
> operations from the edit log. The standby NN then shut down yesterday with 
> the same error.
> Could you please help identify the possible root cause?
>  
> Part of the error log is included below. For the full log and the NameNode 
> configuration, please refer to the attachments.
> I have also attached some Java code that could have caused the error:
>  # We perform append operations in a Spark Streaming program (rt-Append.txt), 
> which caused the primary NN shutdown last Thursday.
>  # We perform move and concat operations in a data conversion program 
> (move&concat.java), which caused the standby NN shutdown yesterday.
>
> 2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
> 2019-08-27 09:51:12,409 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 766146/953617 transactions completed. (80%)
> 2019-08-27 09:51:12,858 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_libs__2381992047634476351.zip
> 2019-08-27 09:51:12,870 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/oozietest2-0.0.1-SNAPSHOT.jar
> 2019-08-27 09:51:12,898 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smcjob/.sparkStaging/application_1561429828507_20423/__spark_conf__.zip
> 2019-08-27 09:51:12,910 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_libs__8875310030853528804.zip
> 2019-08-27 09:51:12,927 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20424/__spark_conf__.zip
> 2019-08-27 09:51:13,777 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 857745/953617 transactions completed. (90%)
> 2019-08-27 09:51:14,035 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20425/__spark_libs__7422229681005558653.zip
> 2019-08-27 09:51:14,067 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smc_ss/.sparkStaging/application_1561429828507_20426/__spark_libs__7479542421029947753.zip
> 2019-08-27 09:51:14,070 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Increasing replication from 2 to 2 for /user/smctest/.sparkStaging/application_1561429828507_20428/__spark_libs__7647933078788028649.zip
> 2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/******/v2-data-20190826.mayfly.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /******/v2-data-20190826.mayfly.data
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
>     java.lang.Thread.getStackTrace(Thread.java:1559)
>     org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
>     org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:261)
>     org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:218)
>     org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1630)
>     org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:309)
>     org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
>     Number of suppressed write-lock reports: 0
>     Longest write-lock held interval: 11714
> 2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Block report queue is full
> 2019-08-27 09:51:14,077 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*******/v2-data-20190826.mayfly.data
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,105 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /*******/v2-data-20190826.mayfly.data
> 2019-08-27 09:51:14,118 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at xxx-nn02.jq/10.129.148.13
> ************************************************************/
> 2019-08-27 10:43:15,713 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: 
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = xxx-nn02.jq/10.129.148.13
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 3.0.0-cdh6.0.1
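
For readers triaging the exception in the quoted log, the following is a minimal sketch that pulls apart the block names in the error message. It assumes the usual HDFS naming convention blk_<blockId>_<generationStamp>. The helper names (Block, parse_blocks, describe_mismatch) are hypothetical and exist only for this illustration; the code does not reproduce the check inside FSEditLogLoader.updateBlocks, it only compares the values copied from the log.

```python
import re
from typing import List, NamedTuple


class Block(NamedTuple):
    """One HDFS block name, split as blk_<block_id>_<gen_stamp> (assumed convention)."""
    block_id: int
    gen_stamp: int


BLOCK_RE = re.compile(r"blk_(\d+)_(\d+)")


def parse_blocks(text: str) -> List[Block]:
    """Return every block name found in a piece of log text, in order."""
    return [Block(int(b), int(g)) for b, g in BLOCK_RE.findall(text)]


def describe_mismatch(old: Block, new: Block) -> str:
    """Say which part of the two block names differs."""
    if old.block_id != new.block_id:
        return "block ID differs"
    if old.gen_stamp != new.gen_stamp:
        return "same block ID, generation stamp differs"
    return "identical"


# Block list copied from the CloseOp record in the log (txid=4359520942).
close_op_blocks = parse_blocks(
    "blk_1270599798_758966421, blk_1270599852_758967928, "
    "blk_1270601282_759026903, blk_1270602443_759027052, "
    "blk_1270602446_759061086, blk_1270603081_759050235"
)

# Old and new block copied from the IOException message.
old, new = parse_blocks(
    "replace block blk_1270602446_759027503 with blk_1270602446_759061086"
)

index = close_op_blocks.index(new)            # zero-based position in the CloseOp
is_last = index == len(close_op_blocks) - 1   # is it the final block of the file?

print(describe_mismatch(old, new))
print(f"position {index} of {len(close_op_blocks)}, last block: {is_last}")
print(f"generation stamp delta: {new.gen_stamp - old.gen_stamp}")
```

With the values from the log, the two names share block ID 1270602446 and differ only in generation stamp (759027503 on the standby versus 759061086 in the edit log), and the block sits at zero-based position 4 of 6, so it is not the last block of the file. That the position matters is an assumption of this note, not something the log states: generation-stamp changes are normally expected on the block currently being written, which may be why the append and concat workloads mentioned in the report are worth examining.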



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
