[ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896818#comment-16896818
 ] 

Chao Sun commented on HDFS-14674:
---------------------------------

Thanks [~wangzhaohui] and [~wuweiwei]. Now I see the issue: suppose the 
{{editStreams}} contains two edits: {{[0,1000)}}, and {{[1000,2000]}}, and the 
config is set to {{500}}, then in the loop it will load the first 500 edits of 
the first edit stream and the continue to load the first 500 of the second 
stream, skipping the rest 500 edits in the first stream. 

The patch looks good to me. It would be great if you can add a test case for 
this though. Also cc [~vagarychen] and [~shv].

> Got an unexpected txid when tail editlog
> ----------------------------------------
>
>                 Key: HDFS-14674
>                 URL: https://issues.apache.org/jira/browse/HDFS-14674
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: wangzhaohui
>            Assignee: wangzhaohui
>            Priority: Major
>         Attachments: HDFS-14674-001.patch, image-2019-07-26-11-34-23-405.png
>
>
> Add the following configuration
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> //代码占位符
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> /************************************************************ SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> ************************************************************/
> {code}
>  
> if dfs.ha.tail-edits.max-txns-per-lock value is 500,when the namenode load 
> the editlog util 500,the current namenode will load the next editlog,but 
> editlog more than 500.So,namenode got an unexpected txid when tail editlog.
>  
>  
> {code:java}
> //代码占位符[2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Start loading 
> edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnsToRead = 500
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Fast-forwarding 
> stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Fast-forwarding 
> stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662
> [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit log tailer] : Unknown error 
> encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: There appears to be a gap in the edit log.  We expected 
> txid 232056752162, but got txid 232077264498.
> {code}
> Read data from JN twice in the same second,changed segmentTxId,finally quit 
> because of txid mismatch.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to