That's not supposed to happen. What version of Hadoop are you using?
Please file a JIRA with details, including how the namenodes are configured.

For the recovery:
First and foremost, do not shut down the active namenode. Put it into safe
mode and issue a saveNamespace command to create a checkpoint.  Then use
the bootstrapStandby command to re-initialize the standby.
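
If it helps, a rough sketch of the sequence (assuming a typical HA setup;
adjust for your environment, and note the standby NameNode process should be
stopped before bootstrapping it):

  # on the active NN: checkpoint the current namespace
  hdfs dfsadmin -safemode enter
  hdfs dfsadmin -saveNamespace
  hdfs dfsadmin -safemode leave

  # on the standby NN host, with its NameNode stopped:
  hdfs namenode -bootstrapStandby
  # then restart the standby NameNode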

Hope it helps.

Kihwal


On Tue, Aug 27, 2019 at 7:12 AM Lionel CL <whuca...@outlook.com> wrote:

> Hi committee,
> We encountered an NN error, shown below.
> The primary NN was shut down last Thursday and we recovered it by removing
> some OPs from the edit log. But today the standby NN was shut down again by
> the same error...
> Could you please help address the possible root cause?
>
> 2019-08-27 09:51:14,075 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation CloseOp [length=0, inodeId=0,
> path=/******/v2-data-20190826.data, replication=2, mtime=1566870616821,
> atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421,
> blk_1270599852_758967928, blk_1270601282_759026903,
> blk_1270602443_759027052, blk_1270602446_759061086,
> blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--,
> aclEntries=null, clientName=, clientMachine=, overwrite=false,
> storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE,
> txid=4359520942]
> java.io.IOException: Mismatched block IDs or generation stamps, attempting
> to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as
> block # 4/6 of /******/v2-data-20190826.mayfly.data
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
> at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> at
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write
> lock held for 11714 ms via
>
>
> Thanks & Best Regards,
> Lionel Cao
>
