[
https://issues.apache.org/jira/browse/RATIS-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938791#comment-17938791
]
Xinhao GU commented on RATIS-2271:
----------------------------------
Hi [~szetszwo], I would like to work on a fix for this problem.
> Leadership Loss Causes ClosedByInterruptException and NullPointerException in
> LogAppender Thread
> ------------------------------------------------------------------------------------------------
>
> Key: RATIS-2271
> URL: https://issues.apache.org/jira/browse/RATIS-2271
> Project: Ratis
> Issue Type: Improvement
> Components: gRPC, Leader
> Reporter: Xinhao GU
> Assignee: Sumit Agrawal
> Priority: Major
> Attachments: image-2025-03-25-14-40-32-711.png,
> image-2025-03-25-14-49-11-998.png, image-2025-03-25-15-05-41-424.png,
> image-2025-03-25-15-06-43-276.png, image-2025-03-25-15-15-50-750.png
>
>
> *After a leader loses leadership because its heartbeats to a majority of
> followers time out, it forcibly interrupts the {{GrpcLogAppender}} thread.*
> This abrupt termination leads to two critical exceptions during log file
> reads:
> 1. {{ClosedByInterruptException}} when initializing
> {{SegmentedRaftLogInputStream}}:
> {code:java}
> 2025-01-18 00:29:31,472 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] ERROR o.a.r.s.r.s.SegmentedRaftLogInputStream:107 - caught exception initializing log_455-480
> java.nio.channels.ClosedByInterruptException: null
>     at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>     at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:164)
>     at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
>     at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
>     at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader$LimitedInputStream.read(SegmentedRaftLogReader.java:96)
>     at java.io.DataInputStream.read(DataInputStream.java:149)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogReader.verifyHeader(SegmentedRaftLogReader.java:172)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.init(SegmentedRaftLogInputStream.java:86)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogInputStream.nextEntry(SegmentedRaftLogInputStream.java:105)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.readSegmentFile(LogSegment.java:132)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:238)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
>     at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
>     at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> 2. A cascading {{NullPointerException}} (surfaced as a {{RaftLogIOException}}
> from {{LogSegment.loadCache()}}) caused by the incomplete log load:
> {code:java}
> 2025-01-18 00:29:32,055 [13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon] WARN o.a.r.s.l.LogAppenderDaemon:89 - 13@group-00020000000F->15-GrpcLogAppender-LogAppenderDaemon failed
> org.apache.ratis.server.raftlog.RaftLogIOException: java.lang.NullPointerException
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:350)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.get(SegmentedRaftLog.java:296)
>     at org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.getEntryWithData(SegmentedRaftLog.java:301)
>     at org.apache.ratis.server.leader.LogAppenderBase.newAppendEntriesRequest(LogAppenderBase.java:240)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:387)
>     at org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:262)
>     at org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException: null
>     at java.util.Objects.requireNonNull(Objects.java:203)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:247)
>     at org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:348)
>     ... 7 common frames omitted
> {code}
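The two traces read like a single cascade. With the interrupt flag set, a {{FileChannel}} read closes the channel and throws {{ClosedByInterruptException}} (documented JDK behavior for interruptible channels). The ERROR line from {{SegmentedRaftLogInputStream:107}} suggests that this exception is caught and logged rather than rethrown. The lookup that follows then finds no entry and fails in {{Objects.requireNonNull}}. The sketch below reproduces that pattern with plain JDK classes only. It is a hypothetical, simplified model: {{InterruptCascadeDemo}}, {{readSegmentFile}}, {{loadSwallowing}} and {{loadPropagating}} are illustrative names, not the actual Ratis {{LogSegment}}/{{LogEntryLoader}} code (which is only shown in the attached screenshots).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of the reported cascade; not the actual Ratis LogSegment code.
public class InterruptCascadeDemo {

    // Reads every byte of the file as one "entry" and puts it into the cache.
    static void readSegmentFile(Path file, Map<Integer, Byte> cache) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate((int) Files.size(file));
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (buf.hasRemaining()) {
                // If the thread's interrupt flag is set, read() closes the channel
                // and throws ClosedByInterruptException.
                if (ch.read(buf) < 0) {
                    break; // end of file
                }
            }
        }
        buf.flip();
        for (int i = 0; buf.hasRemaining(); i++) {
            cache.put(i, buf.get());
        }
    }

    // Mirrors the reported pattern: the I/O error is only logged, then the entry is null-checked.
    static byte loadSwallowing(Path file, int index) {
        Map<Integer, Byte> cache = new HashMap<>();
        try {
            readSegmentFile(file, cache);
        } catch (IOException e) {
            System.err.println("caught exception initializing " + file.getFileName() + ": " + e);
        }
        // After an interrupted read the cache is incomplete, so this throws NullPointerException.
        return Objects.requireNonNull(cache.get(index), "entry " + index + " not loaded");
    }

    // Alternative: let the I/O failure (e.g. ClosedByInterruptException) reach the caller.
    static byte loadPropagating(Path file, int index) throws IOException {
        Map<Integer, Byte> cache = new HashMap<>();
        readSegmentFile(file, cache);
        Byte entry = cache.get(index);
        if (entry == null) {
            throw new IOException("entry " + index + " not found in " + file.getFileName());
        }
        return entry;
    }

    public static void main(String[] args) throws IOException {
        Path seg = Files.createTempFile("log_455-480", ".demo");
        Files.write(seg, new byte[] {10, 20, 30, 40});
        try {
            System.out.println("uninterrupted: " + loadSwallowing(seg, 2));
            Thread.currentThread().interrupt(); // simulate the leader interrupting the appender
            try {
                loadSwallowing(seg, 2);
            } catch (NullPointerException e) {
                System.out.println("swallowed read error surfaced as NPE: " + e.getMessage());
            }
            try {
                loadPropagating(seg, 2);
            } catch (IOException e) {
                System.out.println("propagated: " + e.getClass().getSimpleName());
            }
        } finally {
            Thread.interrupted(); // clear the interrupt flag before cleanup
            Files.deleteIfExists(seg);
        }
    }
}
```

In this model, propagating the I/O error lets the caller see the real cause ({{ClosedByInterruptException}}) instead of a secondary NPE. Whether Ratis should rethrow, retry, or simply stop the appender at that point is a separate design question.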
>
> *The relevant code is:*
> !image-2025-03-25-14-40-32-711.png!
> !image-2025-03-25-15-15-50-750.png|width=1001,height=633!
>
> *Expected behavior:*
> * Graceful termination of the {{GrpcLogAppender}} thread without interrupting
> in-progress I/O operations.
> * Proper cleanup of resources (e.g., file handles) before the thread terminates.
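One way to get the behavior listed above is cooperative cancellation. On step-down, the leader clears a flag that the appender loop checks between batches and waits, with a timeout, for the loop to exit. The loop closes its own resources in a {{finally}} block, so no {{Thread.interrupt()}} ever lands in the middle of a {{FileChannel}} read. The sketch below shows only that generic pattern and assumes a loop that processes one batch at a time. {{CooperativeAppender}}, {{appendOneBatch}}, {{closeResources}} and {{stopAndAwait}} are hypothetical names, not the existing Ratis {{LogAppender}} or {{LogAppenderDaemon}} API, and the batch and cleanup bodies are placeholders.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cooperative-shutdown pattern; class and method names are illustrative, not Ratis APIs.
public class CooperativeAppender implements Runnable {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final CountDownLatch terminated = new CountDownLatch(1);
    final AtomicInteger batchesSent = new AtomicInteger();
    volatile boolean resourcesClosed = false;

    @Override
    public void run() {
        try {
            while (running.get()) {
                appendOneBatch(); // a batch, including its log reads, always runs to completion
            }
        } finally {
            closeResources(); // release file handles on the appender thread itself
            terminated.countDown();
        }
    }

    // Placeholder for "read entries from the log and send them to the follower".
    private void appendOneBatch() {
        batchesSent.incrementAndGet();
        Thread.yield();
    }

    // Placeholder for closing segment readers / file channels.
    private void closeResources() {
        resourcesClosed = true;
    }

    // Called on step-down: flips the flag instead of calling Thread.interrupt(),
    // then waits up to the timeout for the loop to finish its current batch.
    public boolean stopAndAwait(long timeout, TimeUnit unit) throws InterruptedException {
        running.set(false);
        return terminated.await(timeout, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        CooperativeAppender appender = new CooperativeAppender();
        Thread t = new Thread(appender, "appender");
        t.start();
        while (appender.batchesSent.get() < 3) {
            Thread.yield();
        }
        boolean stopped = appender.stopAndAwait(5, TimeUnit.SECONDS);
        t.join();
        System.out.println("stopped=" + stopped + ", resourcesClosed=" + appender.resourcesClosed);
    }
}
```

The trade-off is that a stop request takes effect only after the current batch finishes, so shutdown latency is bounded by one batch rather than being immediate. The timeout on {{stopAndAwait}} keeps the caller from waiting indefinitely if a read stalls.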
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)