[ 
https://issues.apache.org/jira/browse/RATIS-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17945837#comment-17945837
 ] 

Xinhao GU commented on RATIS-2282:
----------------------------------

The  *Log entry not found* occurs again.
2025-04-19 15:41:39,574 
[5@group-000100000001->2-GrpcLogAppender-LogAppenderDaemon] WARN  
o.a.r.s.l.LogAppenderDaemon:89 - 
5@group-000100000001->2-GrpcLogAppender-LogAppenderDaemon failed 
org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: index 
= 454    at 
org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.retainEntryWithData(SegmentedRaftLog.java:337)
    at 
org.apache.ratis.server.leader.LogAppenderBase.readLogEntries(LogAppenderBase.java:376)
    at 
org.apache.ratis.server.leader.LogAppenderBase.nextAppendEntriesRequest(LogAppenderBase.java:332)
    at 
org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:403)
    at 
org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:269)    
at 
org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80) 
   at java.base/java.lang.Thread.run(Thread.java:829)
 

Because leader's nextIndex for AppendEntriesRequest is calculated from 
+records.getLast()+ instead of endIndex.

 
{code:java}
TermIndex getLastTermIndex() {
  final LogRecord last = records.getLast();
  return last == null ? null : last.getTermIndex();
}

LogRecord getLast() {
  final Map.Entry<Long, LogRecord> last = map.lastEntry();
  return last != null? last.getValue() : null;
} {code}
 

*We can modify the getLogRecord function:*

!image-2025-04-19-16-37-16-322.png!

!image-2025-04-19-16-37-34-179.png!

 

*After this change, the log not found ERROR disappears.*

 

> LogAppender Restart Due to Premature Log Entry Access During Concurrent Write 
> Processing
> ----------------------------------------------------------------------------------------
>
>                 Key: RATIS-2282
>                 URL: https://issues.apache.org/jira/browse/RATIS-2282
>             Project: Ratis
>          Issue Type: Bug
>          Components: Leader
>            Reporter: Xinhao GU
>            Assignee: Xinhao GU
>            Priority: Major
>         Attachments: a.patch, image-2025-04-18-17-36-36-096.png, 
> image-2025-04-19-16-37-16-322.png, image-2025-04-19-16-37-34-179.png
>
>
> We observed two recurring patterns of {{GrpcLogAppender}} restarts caused by 
> log entry access failures:
> h3. Problem 1: Log entry not found
> {code:java}
> 2025-04-14 11:47:39,566 
> [4@group-000100000002->5-GrpcLogAppender-LogAppenderDaemon] WARN  
> o.a.r.s.l.LogAppenderDaemon:89 - 
> 4@group-000100000002->5-GrpcLogAppender-LogAppenderDaemon failed 
> org.apache.ratis.server.raftlog.RaftLogIOException: Log entry not found: 
> index = 376
>     at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.retainEntryWithData(SegmentedRaftLog.java:337)
>     at 
> org.apache.ratis.server.leader.LogAppenderBase.readLogEntries(LogAppenderBase.java:376)
>     at 
> org.apache.ratis.server.leader.LogAppenderBase.nextAppendEntriesRequest(LogAppenderBase.java:332)
>     at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:403)
>     at 
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:269)
>     at 
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
>     at java.base/java.lang.Thread.run(Thread.java:829)
>     
> 2025-04-14 11:47:39,568 
> [4@group-000100000002->5-GrpcLogAppender-LogAppenderDaemon] INFO  
> o.a.r.s.i.LeaderStateImpl:688 - 4@group-000100000002-LeaderStateImpl: 
> Restarting GrpcLogAppender for 4@group-000100000002->5      {code}
> h3. Problem 2: Failed to load log entry
> {code:java}
> 2025-04-14 22:06:33,971 
> [3@group-000100000002->2-GrpcLogAppender-LogAppenderDaemon] WARN  
> o.a.r.s.l.LogAppenderDaemon:89 - 
> 3@group-000100000002->2-GrpcLogAppender-LogAppenderDaemon failed 
> org.apache.ratis.server.raftlog.RaftLogIOException: Failed to load log entry 
> (t:2, i:118)
>     at 
> org.apache.ratis.server.raftlog.segmented.LogSegment$LogEntryLoader.load(LogSegment.java:306)
>     at 
> org.apache.ratis.server.raftlog.segmented.LogSegment.loadCache(LogSegment.java:520)
>     at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.retainLog(SegmentedRaftLog.java:325)
>     at 
> org.apache.ratis.server.raftlog.segmented.SegmentedRaftLog.retainEntryWithData(SegmentedRaftLog.java:335)
>     at 
> org.apache.ratis.server.leader.LogAppenderBase.readLogEntries(LogAppenderBase.java:376)
>     at 
> org.apache.ratis.server.leader.LogAppenderBase.nextAppendEntriesRequest(LogAppenderBase.java:332)
>     at 
> org.apache.ratis.grpc.server.GrpcLogAppender.appendLog(GrpcLogAppender.java:403)
>     at 
> org.apache.ratis.grpc.server.GrpcLogAppender.run(GrpcLogAppender.java:269)
>     at 
> org.apache.ratis.server.leader.LogAppenderDaemon.run(LogAppenderDaemon.java:80)
>     at java.lang.Thread.run(Thread.java:748)
>     
> 2025-04-14 22:06:34,001 
> [3@group-000100000002->2-GrpcLogAppender-LogAppenderDaemon] INFO  
> o.a.r.s.i.LeaderStateImpl:688 - 3@group-000100000002-LeaderStateImpl: 
> Restarting GrpcLogAppender for 3@group-000100000002->2          {code}
> h3. *Cause Analysis:*
> Both exceptions occur due to race conditions between:
>  # Leader writes log entry into memory cache and disk
>  # Leader sends log entry to followers via {{GrpcLogAppender}}
> The current implementation allows log replication to attempt accessing 
> entries before they're written. This is evidenced by:
>  * The successful prevention of errors through experimental sleep delays in 
> critical code paths
> !image-2025-04-18-17-36-36-096.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to