Xinhao GU created RATIS-2271:
--------------------------------
Summary: Leadership Loss Causes ClosedByInterruptException and
NullPointerException in LogAppender Thread
Key: RATIS-2271
URL: https://issues.apache.org/jira/browse/RATIS-2271
Project: Ratis
Issue Type: Improvement
Components: gRPC, Leader
Reporter: Xinhao GU
Attachments: image-2025-03-25-14-40-32-711.png,
image-2025-03-25-14-49-11-998.png, image-2025-03-25-15-05-41-424.png,
image-2025-03-25-15-06-43-276.png
After a leader loses leadership because its heartbeats to a majority of
followers time out, it forcibly interrupts the {{GrpcLogAppender}} thread.
If the thread is reading a log segment file at that moment, the interrupt
leads to two critical exceptions:
1. {{ClosedByInterruptException}} when initializing
{{SegmentedRaftLogInputStream}}.
{code:java}
2025-01-18 00:29:31,472 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] ERROR o.a.r.s.r.s.SegmentedRaftLogInputStream:107 - caught exception initializing log_455-480
java.nio.channels.ClosedByInterruptException: null
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader$LimitedInputStream.read(SegmentedRaftLogReader.java:96)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.verifyHeader(SegmentedRaftLogReader.java:172)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.init(SegmentedRaftLogInputStream.java:86)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:105)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:132)
    at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:238)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
    at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
    at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
    at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
    at java.lang.Thread.run(Thread.java:748)
{code}
2. A cascading {{NullPointerException}} in {{LogSegment.loadCache()}} due to
incomplete log loading.
{code:java}
2025-01-18 00:29:32,055 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] WARN o.a.r.s.l.LogAppenderDaemon:89 - 13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon failed
org.apache.ratis.server.raftlog.RaftLogIOException: java.lang.NullPointerException
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:350)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
    at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
    at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
    at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
    at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
    at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
    at java.util.Objects.requireNonNull(Objects.java:203)
    at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:247)
    at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
    ... 7 common frames omitted
{code}
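The first exception can be reproduced outside Ratis with plain JDK classes. The sketch below is not Ratis code: the class name {{InterruptedReadDemo}} and the temp file are made up for illustration. It relies only on the documented behavior of interruptible channels: when a thread whose interrupt status is set reads from a {{FileChannel}}, the channel is closed and the read throws {{ClosedByInterruptException}}.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; it does not use any Ratis API.
class InterruptedReadDemo {

  /**
   * Reads from a FileChannel while the current thread's interrupt status is set.
   * Returns true if the read failed with ClosedByInterruptException and the
   * channel was closed as a side effect.
   */
  static boolean readWithPendingInterrupt() throws IOException {
    Path tmp = Files.createTempFile("segment", ".log");
    try {
      Files.write(tmp, new byte[] {1, 2, 3, 4});
      try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.READ)) {
        // Simulate the interrupt delivered to the appender thread.
        Thread.currentThread().interrupt();
        try {
          channel.read(ByteBuffer.allocate(4));
          return false; // the read unexpectedly succeeded
        } catch (ClosedByInterruptException e) {
          // The interrupt closed the channel underneath the reader.
          return !channel.isOpen();
        } finally {
          // Clear the interrupt status so the cleanup below is not affected.
          Thread.interrupted();
        }
      }
    } finally {
      Files.deleteIfExists(tmp);
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("closed by interrupt: " + readWithPendingInterrupt());
  }
}
```

Because the channel is closed rather than merely unblocked, any caller that expected the read to have populated its state sees incomplete data, which is consistent with the follow-up {{NullPointerException}} reported above.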
The relevant code is:
!image-2025-03-25-14-40-32-711.png!
!image-2025-03-25-14-49-28-802.png!
{*}Expected behavior{*}:
* Graceful termination of {{GrpcLogAppender}} thread without interrupting
in-progress I/O operations.
* Proper resource cleanup (e.g., file handles) before thread termination.
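One possible shape for such a shutdown is a cooperative stop flag that the worker checks between reads, so a read in progress always completes and its stream is closed before the thread exits. The sketch below is only an illustration under that assumption, not the Ratis implementation or a proposed patch: {{GracefulAppender}}, its fields, and its methods are all hypothetical names, and a real appender would likely still need a fallback for calls that block indefinitely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker; stands in for an appender thread that reads log files.
class GracefulAppender implements Runnable {
  private final AtomicBoolean running = new AtomicBoolean(true);
  private final AtomicLong completedReads = new AtomicLong();
  private final Path file;
  private final Thread thread;
  private volatile Throwable failure;

  GracefulAppender(Path file) {
    this.file = file;
    this.thread = new Thread(this, "hypothetical-appender");
  }

  void start() {
    thread.start();
  }

  @Override
  public void run() {
    // The flag is checked only between reads, never in the middle of one.
    while (running.get()) {
      // try-with-resources closes the file handle before the next flag check.
      try (InputStream in = Files.newInputStream(file)) {
        byte[] buffer = new byte[8];
        while (in.read(buffer) != -1) {
          // Drain the file; a real appender would decode entries here.
        }
        completedReads.incrementAndGet();
      } catch (IOException | RuntimeException e) {
        failure = e;
        return;
      }
    }
  }

  /** Requests a stop without interrupting; returns true if the thread exited in time. */
  boolean stop(long timeoutMillis) throws InterruptedException {
    running.set(false);
    thread.join(timeoutMillis);
    return !thread.isAlive();
  }

  long completedReads() {
    return completedReads.get();
  }

  Throwable failure() {
    return failure;
  }

  /** Runs the worker briefly, stops it cooperatively, and reports a clean exit. */
  static boolean demo() throws Exception {
    Path tmp = Files.createTempFile("segment", ".log");
    try {
      Files.write(tmp, new byte[64]);
      GracefulAppender appender = new GracefulAppender(tmp);
      appender.start();
      while (appender.completedReads() == 0 && appender.failure() == null) {
        Thread.sleep(1);
      }
      boolean stopped = appender.stop(5000);
      return stopped && appender.failure() == null;
    } finally {
      Files.deleteIfExists(tmp);
    }
  }
}
```

The trade-off is latency: the thread stops only after the current read finishes, so the stop request is bounded by the longest single I/O operation rather than being immediate.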