Xinhao GU created RATIS-2271:
--------------------------------

             Summary: Leadership Loss Causes ClosedByInterruptException and 
NullPointerException in LogAppender Thread
                 Key: RATIS-2271
                 URL: https://issues.apache.org/jira/browse/RATIS-2271
             Project: Ratis
          Issue Type: Improvement
          Components: gRPC, Leader
            Reporter: Xinhao GU
         Attachments: image-2025-03-25-14-40-32-711.png, 
image-2025-03-25-14-49-11-998.png, image-2025-03-25-15-05-41-424.png, 
image-2025-03-25-15-06-43-276.png

After a leader steps down because heartbeats to a majority of followers have 
timed out, it forcibly interrupts the {{GrpcLogAppender}} thread.

If the interrupt arrives while the thread is reading a log segment file, the 
read fails with two exceptions:

1. {{ClosedByInterruptException}} when initializing 
{{SegmentedRaftLogInputStream}}.
{code:java}
2025-01-18 00:29:31,472 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] ERROR o.a.r.s.r.s.SegmentedRaftLogInputStream:107 - caught exception initializing log_455-480
java.nio.channels.ClosedByInterruptException: null
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader$LimitedInputStream.read(SegmentedRaftLogReader.java:96)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.verifyHeader(SegmentedRaftLogReader.java:172)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.init(SegmentedRaftLogInputStream.java:86)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:105)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:132)
    at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:238)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
    at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
    at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
    at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
    at java.lang.Thread.run(Thread.java:748)
{code}
2. A cascading {{NullPointerException}} in {{LogSegment.loadCache()}} due to 
incomplete log loading.

{code:java}
2025-01-18 00:29:32,055 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] WARN  o.a.r.s.l.LogAppenderDaemon:89 - 13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon failed
org.apache.ratis.server.raftlog.RaftLogIOException: java.lang.NullPointerException
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:350)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
    at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
    at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
    at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
    at java.util.Objects.requireNonNull(Objects.java:203)
    at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:247)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
    ... 7 common frames omitted
{code}
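
The first exception can be reproduced outside Ratis with plain JDK classes. The sketch below is only an illustration: {{InterruptedReadDemo}} and {{interruptClosesChannel}} are made-up names, and a temporary file stands in for the log segment. It relies on the documented behavior of interruptible channels: a read on a {{FileChannel}} by a thread whose interrupt status is set closes the channel and throws {{ClosedByInterruptException}}.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptedReadDemo {

  /** Returns true if a read on an interrupted thread failed and left the channel closed. */
  static boolean interruptClosesChannel() {
    Path file = null;
    try {
      // A temporary file stands in for a log segment (assumption, not Ratis code).
      file = Files.createTempFile("log_demo", ".bin");
      Files.write(file, new byte[] {1, 2, 3, 4});
      try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
        // Stand-in for the forced interrupt of the appender thread.
        Thread.currentThread().interrupt();
        try {
          channel.read(ByteBuffer.allocate(4));
          return false; // the read was not affected by the interrupt
        } catch (ClosedByInterruptException e) {
          // The channel itself is closed, so a later read on it would fail too.
          return !channel.isOpen();
        } finally {
          Thread.interrupted(); // clear the interrupt status before cleanup
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      if (file != null) {
        file.toFile().delete();
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("interrupt closed the channel: " + interruptClosesChannel());
  }
}
```

Because the channel is closed as a side effect, the interrupted read cannot simply be retried on the same stream, which is consistent with the incomplete load reported in the second trace.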
 

The relevant code is: 

!image-2025-03-25-14-40-32-711.png!

!image-2025-03-25-14-49-28-802.png!

 

{*}Expected behavior{*}:
 * Graceful termination of {{GrpcLogAppender}} thread without interrupting 
in-progress I/O operations.
 * Proper resource cleanup (e.g., file handles) before thread termination.
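
One possible shape for such a shutdown, shown only as a sketch and not as a proposed Ratis patch: the worker checks a stop flag between I/O steps instead of being interrupted, and releases its resources in a {{finally}} block. {{GracefulStopSketch}}, {{Worker}}, {{requestStop}} and {{readOneEntry}} are hypothetical names, and the read is simulated with a short pause.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class GracefulStopSketch {

  /** Hypothetical stand-in for an appender loop; not the Ratis GrpcLogAppender. */
  static final class Worker implements Runnable {
    private volatile boolean running = true;
    private volatile boolean cleanedUp = false;
    private final CountDownLatch firstReadDone = new CountDownLatch(1);

    @Override
    public void run() {
      try {
        // The flag is checked only between reads, so a read is never cut short.
        while (running) {
          readOneEntry();
          firstReadDone.countDown();
        }
      } finally {
        // Place for releasing file handles and other resources.
        cleanedUp = true;
      }
    }

    /** Simulated I/O step (assumption): pauses for about one millisecond. */
    private void readOneEntry() {
      LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(1));
    }

    /** Asks the loop to exit after the current read instead of interrupting it. */
    void requestStop() {
      running = false;
    }
  }

  /** Starts a worker, stops it cooperatively, and reports whether it exited cleanly. */
  static boolean runAndStop() {
    Worker worker = new Worker();
    Thread thread = new Thread(worker, "appender-sketch");
    thread.start();
    try {
      if (!worker.firstReadDone.await(5, TimeUnit.SECONDS)) {
        return false;
      }
      worker.requestStop();
      thread.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !thread.isAlive() && worker.cleanedUp;
  }

  public static void main(String[] args) {
    System.out.println("stopped cleanly: " + runAndStop());
  }
}
```

The trade-off is latency: a cooperative stop waits for the current read to finish, so a real implementation would likely still need a bounded wait before falling back to an interrupt.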

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
