tomscut commented on a change in pull request #4082:
URL: https://github.com/apache/hadoop/pull/4082#discussion_r834896270



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
##########
@@ -1509,13 +1509,18 @@ synchronized void abortCurrentLogSegment() {
    * effect.
    */
   @Override
-  public synchronized void purgeLogsOlderThan(final long minTxIdToKeep) {
+  public synchronized void purgeLogsOlderThan(long minTxIdToKeep) {
     // Should not purge logs unless they are open for write.
     // This prevents the SBN from purging logs on shared storage, for example.
     if (!isOpenForWrite()) {
       return;
     }
-    
+
+    // Reset purgeLogsFrom to avoid purging edit log which is in progress.
+    if (isSegmentOpen()) {
+      minTxIdToKeep = minTxIdToKeep > curSegmentTxId ? curSegmentTxId : minTxIdToKeep;

Review comment:
       Hi @jojochuang @Hexiaoqiao @ayushtkn , please also take a look. Thank 
you very much.
   
   This problem originates from in-progress edits tailing. 
[HDFS-14317](https://issues.apache.org/jira/browse/HDFS-14317) does a good job 
of avoiding it.
   
   However, if the SNN's edit log rolling is accidentally disabled by 
configuration and the ANN's automatic roll period is very long, the 
in-progress edit log may still be purged.
   
   Although we add assertions, assertions are generally disabled in 
production (we don't normally add `-ea` to the JVM parameters). This problem 
and the logs also show that we do not strictly ensure 
`(minTxIdToKeep <= curSegmentTxId)`. This is dangerous for the NameNode: it 
may crash the active NameNode with "No log file to finalize at transaction ID 
xxx".
   ```
   2022-03-15 17:28:52,867 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(393)) - Error: finalize log segment 24207987, 27990692 failed for required journal (JournalAndStream(mgr=QJM to [xxx:8485, xxx:8485, xxx:8485, xxx:8485, xxx:8485], stream=null))
   org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 5 exceptions thrown:
   10.152.124.157:8485: No log file to finalize at transaction ID 24207987 ; journal id: ambari-test
           at org.apache.hadoop.hdfs.qjournal.server.Journal.finalizeLogSegment(Journal.java:656)
           at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.finalizeLogSegment(JournalNodeRpcServer.java:210)
           at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.finalizeLogSegment(QJournalProtocolServerSideTranslatorPB.java:205)
           at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:28890)
           at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:550)
           at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1094)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1066)
           at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
           at java.security.AccessController.doPrivileged(Native Method)
           at javax.security.auth.Subject.doAs(Subject.java:422)
           at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
           at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2989)
   ```
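
To illustrate the point about assertions, here is a minimal standalone sketch (not code from `FSEditLog`; the class and method names are hypothetical). It contrasts an `assert`-based check, which the JVM skips unless `-ea` is passed, with an explicit check that always runs. The transaction IDs are borrowed from the log above purely as sample values.

```java
/** Hypothetical sketch: assert-based vs. explicit invariant checks. */
public class InvariantCheckSketch {

    /** Returns true only when the JVM was started with -ea. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert only executes when assertions are on.
        assert enabled = true;
        return enabled;
    }

    /** Guards the invariant with assert; a no-op when assertions are disabled. */
    static void checkWithAssert(long minTxIdToKeep, long curSegmentTxId) {
        assert minTxIdToKeep <= curSegmentTxId
            : "minTxIdToKeep " + minTxIdToKeep + " > curSegmentTxId " + curSegmentTxId;
    }

    /** Guards the invariant unconditionally, regardless of JVM flags. */
    static void checkExplicitly(long minTxIdToKeep, long curSegmentTxId) {
        if (minTxIdToKeep > curSegmentTxId) {
            throw new IllegalStateException(
                "minTxIdToKeep " + minTxIdToKeep + " > curSegmentTxId " + curSegmentTxId);
        }
    }

    public static void main(String[] args) {
        long curSegmentTxId = 24207987L;  // sample value taken from the log above
        long minTxIdToKeep = 25000000L;   // hypothetical value that violates the invariant

        System.out.println("assertions enabled: " + assertionsEnabled());
        try {
            checkWithAssert(minTxIdToKeep, curSegmentTxId);
            System.out.println("assert check passed silently");
        } catch (AssertionError e) {
            System.out.println("assert check caught: " + e.getMessage());
        }
        try {
            checkExplicitly(minTxIdToKeep, curSegmentTxId);
        } catch (IllegalStateException e) {
            System.out.println("explicit check caught: " + e.getMessage());
        }
    }
}
```

Run without `-ea`, the assert-based check lets the violating value through, which is the situation described above for production NameNodes.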
   
   We should reset `minTxIdToKeep` so that the in-progress edit log is 
strictly never purged, and then wait for the ANN's automatic roll to finalize 
the edit log. After the next checkpoint, the ANN automatically purges the 
finalized edit log (see the stack trace above).
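
The clamp can be sketched in isolation as follows. This is a toy model, not the real `FSEditLog` or journal code: `clampMinTxIdToKeep` and `survivingSegments` are hypothetical helpers, and the purge rule is simplified to "a segment is purged when its first transaction ID is below the threshold". The requested threshold `25000000` is a made-up value; `24207987` is the segment start from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of clamping the purge threshold to the open segment's start. */
public class PurgeClampSketch {

    /** Same effect as the ternary in the diff, written with Math.min. */
    static long clampMinTxIdToKeep(long minTxIdToKeep, boolean segmentOpen,
                                   long curSegmentTxId) {
        return segmentOpen ? Math.min(minTxIdToKeep, curSegmentTxId) : minTxIdToKeep;
    }

    /** Simplified purge rule: keep segments whose first txid is >= the threshold. */
    static List<Long> survivingSegments(List<Long> segmentStartTxIds, long minTxIdToKeep) {
        List<Long> kept = new ArrayList<>();
        for (long start : segmentStartTxIds) {
            if (start >= minTxIdToKeep) {
                kept.add(start);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long curSegmentTxId = 24207987L;  // start of the in-progress segment
        List<Long> segments = Arrays.asList(1L, 1000L, curSegmentTxId);
        long requested = 25000000L;       // hypothetical threshold past the open segment

        // Without the clamp, the in-progress segment falls below the threshold.
        System.out.println("unclamped keeps: " + survivingSegments(segments, requested));

        // With the clamp, the threshold never exceeds the open segment's start.
        long clamped = clampMinTxIdToKeep(requested, true, curSegmentTxId);
        System.out.println("clamped keeps:   " + survivingSegments(segments, clamped));
    }
}
```

When the requested threshold is already at or below `curSegmentTxId`, the clamp leaves it unchanged, so normal purging of older finalized segments is unaffected.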




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
