[ 
https://issues.apache.org/jira/browse/HDFS-10943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16445363#comment-16445363
 ] 

He Xiaoqiao edited comment on HDFS-10943 at 4/20/18 6:22 AM:
-------------------------------------------------------------

I would like to provide more detail information:
 A. NameNode logs before fatal which is same as [~yzhangal] mentioned above:
{code:java}
2018-04-19 13:10:24,222 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Rolling edit logs
2018-04-19 13:10:24,222 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Ending log segment 15221708217
2018-04-19 13:10:24,595 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Number of transactions: 206713 Total time for transactions(ms): 2575 Number of 
transactions batched in Syncs: 8218 Number of syncs: 138767 SyncTimes(ms): 
42337 29016
2018-04-19 13:10:25,264 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Error: finalize log segment 15221708217, 15221914929 failed for required 
journal (JournalAndStream(mgr=QJM to [0.0.0.0:8485, 0.0.0.0:8485, 
0.0.0.0:8485], stream=QuorumOutputStream starting at txid 15221708217))
java.io.IOException: FSEditStream has 160370 bytes still to be flushed and 
cannot be closed.
        at 
org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1274)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1293)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6084)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1295)
        at 
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at 
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2018-04-19 13:10:25,390 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
QuorumOutputStream starting at txid 15221708217
{code}
B. JournalNode logs when Standby NameNode transition To Active and recovery 
EditLog:
{code:java}
2018-04-19 13:10:47,137 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: 
Accepted recovery for segment 15221708217: segmentState { startTxId: 
15221708217 endTxId: 15221913880 isInProgress: true } acceptedInEpoch: 24
{code}
C. CallQueue of NameNode is full and lasted no more than three seconds.
{code:java}
Date   Time         RpcProcessingTime  CallQueueLength RpcQueueTime
04-19 13:10:22         0.120               0              0.031 
04-19 13:10:24         0.505            3200             10.619 
04-19 13:10:25      1038.286            3200            265.548
{code}
Based on logs, we can come to conclusions:
 1. only two JournalNodes active at that time since logs show {{SyncTimes(ms): 
42337 29016}}
 2. 1049 {{FSEditLogOp}} transations (finalize TxId of NameNode before fatal: 
15221914929 - endTxId of JournalNode after recovery: 15221913880) are still in 
{{bufCurrent}} of EditsDoubleBuffer and do not flushed out.
 3. All write of NameNode from DFSClient request still not return at that time, 
and FsEditLogOps of DFSClient's write op. still continue to put into 
{{bufCurrent}}.


was (Author: hexiaoqiao):
I would like to provide more detail information:
A. NameNode logs before fatal which is same as [~yzhangal] mentioned above:
{code:java}
2018-04-19 13:10:24,222 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Rolling edit logs
2018-04-19 13:10:24,222 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Ending log segment 15221708217
2018-04-19 13:10:24,595 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Number of transactions: 206713 Total time for transactions(ms): 2575 Number of 
transactions batched in Syncs: 8218 Number of syncs: 138767 SyncTimes(ms): 
42337 29016
2018-04-19 13:10:25,264 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Error: finalize log segment 15221708217, 15221914929 failed for required 
journal (JournalAndStream(mgr=QJM to [0.0.0.0:8485, 0.0.0.0:8485, 
0.0.0.0:8485], stream=QuorumOutputStream starting at txid 15221708217))
java.io.IOException: FSEditStream has 160370 bytes still to be flushed and 
cannot be closed.
        at 
org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
        at 
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
        at 
org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1274)
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1203)
        at 
org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1293)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6084)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1295)
        at 
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
        at 
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
2018-04-19 13:10:25,390 WARN 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
QuorumOutputStream starting at txid 15221708217
{code}
B. JournalNode logs when Standby NameNode transition To Active and recovery 
EditLog:
{code:java}
2018-04-19 13:10:47,137 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: 
Accepted recovery for segment 15221708217: segmentState { startTxId: 
15221708217 endTxId: 15221913880 isInProgress: true } acceptedInEpoch: 24
{code}
C. CallQueue of NameNode is full and lasted no more than three seconds.
{code:java}
Date   Time         RpcProcessingTimeAvgTime  CallQueueLength 
RpcQueueTimeAvgTime
04-19 13:10:22                                          0.12                    
        0                              0.031 
04-19 13:10:24                                        0.505                     
 3200                            10.619 
04-19 13:10:25                                  1038.286                      
3200                          265.548
{code}

Based on logs, we can come to conclusions:
1. only two JournalNodes active at that time since logs show {{SyncTimes(ms): 
42337 29016}}
2. 1049 {{FSEditLogOp}} transations (finalize TxId of NameNode before fatal: 
15221914929 - endTxId of JournalNode after recovery: 15221913880)  are still in 
{{bufCurrent}} of EditsDoubleBuffer and do not flushed out.
3. All write of NameNode from DFSClient request still not return at that time, 
and FsEditLogOps of DFSClient's write op. still continue to put into 
{{bufCurrent}}.

> rollEditLog expects empty EditsDoubleBuffer.bufCurrent which is not guaranteed
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-10943
>                 URL: https://issues.apache.org/jira/browse/HDFS-10943
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Priority: Major
>
> Per the following trace stack:
> {code}
> FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: finalize log 
> segment 10562075963, 10562174157 failed for required journal 
> (JournalAndStream(mgr=QJM to [0.0.0.1:8485, 0.0.0.2:8485, 0.0.0.3:8485, 
> 0.0.0.4:8485, 0.0.0.5:8485], stream=QuorumOutputStream starting at txid 
> 10562075963))
> java.io.IOException: FSEditStream has 49708 bytes still to be flushed and 
> cannot be closed.
>         at 
> org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:66)
>         at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.close(QuorumOutputStream.java:65)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
>         at 
> org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1243)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1172)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1243)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:6437)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:1002)
>         at 
> org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:142)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:12025)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> 2016-09-23 21:40:59,618 WARN 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting 
> QuorumOutputStream starting at txid 10562075963
> {code}
> The exception is from  EditsDoubleBuffer
> {code}
>  public void close() throws IOException {
>     Preconditions.checkNotNull(bufCurrent);
>     Preconditions.checkNotNull(bufReady);
>     int bufSize = bufCurrent.size();
>     if (bufSize != 0) {
>       throw new IOException("FSEditStream has " + bufSize
>           + " bytes still to be flushed and cannot be closed.");
>     }
>     IOUtils.cleanup(null, bufCurrent, bufReady);
>     bufCurrent = bufReady = null;
>   }
> {code}
> We can see that FSNamesystem.rollEditLog expects  
> EditsDoubleBuffer.bufCurrent to be empty.
> Edits are recorded via FSEditLog$logSync, which does:
> {code}
>    * The data is double-buffered within each edit log implementation so that
>    * in-memory writing can occur in parallel with the on-disk writing.
>    *
>    * Each sync occurs in three steps:
>    *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>    *      flag.
>    *   2. unsynchronized, it flushes the data to storage
>    *   3. synchronized, it resets the flag and notifies anyone waiting on the
>    *      sync.
>    *
>    * The lack of synchronization on step 2 allows other threads to continue
>    * to write into the memory buffer while the sync is in progress.
>    * Because this step is unsynchronized, actions that need to avoid
>    * concurrency with sync() should be synchronized and also call
>    * waitForSyncToFinish() before assuming they are running alone.
>    */
> {code}
> We can see that step 2 is on-purposely not synchronized to let other threads 
> to write into the memory buffer, presumbaly EditsDoubleBuffer.bufCurrent. 
> This means that the EditsDoubleBuffer.bufCurrent  can be non-empty when 
> logSync is done.
> Now if rollEditLog happens, the above exception happens.
> Another interesting observation is, the size of the EditsDoubleBuffer can be 
> as large as "private int outputBufferCapacity = 512 * 1024;", which means a 
> lot of edits could get buffered before they are flushed to JNs. 
> How can rollEdit operation expect the EditsDoubleBuffer.bufCurrent to be 
> empty? Or ask in another way, ollEdit operation want to close the stream
> {code}
> org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
> {code}
> where the close operation expects EditsDoubleBuffer.bufCurrent to be empty. 
> It  seems rollEdit should make sure the EditsDoubleBuffer is flushed in a 
> synchronized way before it tries to close the stream. Why it doesn't do that?
> If my above theory is correct, wonder why this issue doesn't show up so often.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to