[
https://issues.apache.org/jira/browse/RATIS-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906167#comment-17906167
]
Tsz-wo Sze commented on RATIS-2208:
-----------------------------------
{quote}I suspect this issue is the root cause for RATIS-2203.
{quote}
How could it cause the problem?
{quote}IllegalStateException: SegmentedRaftLog: Already running a method by
{quote}
Should we avoid the exception by forcing appendEntries to wait for the previous
appendEntries?
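To make the question concrete, here is a minimal sketch of what "waiting for the previous appendEntries" could look like. It is only an illustration of the idea, not Ratis code: the class name {{SequentialAppender}} and its {{submit}} method are hypothetical, and it assumes each appendEntries call can be represented as a {{Supplier}} of a {{CompletableFuture}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Ratis: runs async operations strictly one after another. */
final class SequentialAppender {
  /** Completes (always normally) when the most recently submitted operation has finished. */
  private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

  /**
   * Start the given operation only after the previously submitted one has completed,
   * regardless of whether the previous one succeeded or failed.
   */
  synchronized <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> operation) {
    // tail never completes exceptionally, so the operation is always invoked eventually.
    final CompletableFuture<T> next = tail.thenCompose(ignored -> operation.get());
    // Swallow the outcome so that a failed operation does not block the ones queued after it.
    tail = next.handle((result, error) -> null);
    return next;
  }
}
```

With such a helper, a second request submitted while the first is still running would be queued instead of hitting the runner check. Whether queuing is preferable to failing fast (for example, because the leader may keep retrying and the queue could grow) is a separate design question.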
{quote}the debug log TransactionManager.java#L79 is a little distracting
{quote}
Let's change it to just printing the size:
{code:java}
+++ b/ratis-server/src/main/java/org/apache/ratis/server/impl/TransactionManager.java
@@ -71,12 +71,6 @@ class TransactionManager {
   @Override
   public String toString() {
-    if (contexts.isEmpty()) {
-      return name + " <empty>";
-    }
-
-    final StringBuilder b = new StringBuilder(name);
-    contexts.forEach((k, v) -> b.append("\n ").append(k).append(": initialized? ").append(v.isInitialized()));
-    return b.toString();
+    return name + ":size=" + contexts.size();
   }
 }
{code}
Also, let's print the stack trace of the running thread:
{code:java}
+++ b/ratis-server-api/src/main/java/org/apache/ratis/server/raftlog/RaftLogSequentialOps.java
@@ -79,8 +79,10 @@ interface RaftLogSequentialOps {
         // The current thread is already the runner.
         return operation.get();
       } else {
+        final Throwable cause = new Throwable("The thread already running: " + previous);
+        cause.setStackTrace(previous.getStackTrace());
         throw new IllegalStateException(
-            name + ": Already running a method by " + previous + ", current=" + current);
+            name + ": Already running a method by " + previous + ", current=" + current, cause);
       }
     }
   }
{code}
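For reference, the effect of the second patch can be tried outside Ratis with a small standalone program. This is only a sketch: {{RunnerStackTraceDemo}}, {{alreadyRunning}}, {{slowAppend}} and {{demo}} are hypothetical names made up for the illustration, and the worker thread is a stand-in for a slow log append. Only the two lines building the {{cause}} are taken from the patch.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical standalone demo, not part of Ratis. */
final class RunnerStackTraceDemo {
  /** Builds the exception as in the patch: the cause carries the other thread's stack trace. */
  static IllegalStateException alreadyRunning(String name, Thread previous, Thread current) {
    final Throwable cause = new Throwable("The thread already running: " + previous);
    cause.setStackTrace(previous.getStackTrace());
    return new IllegalStateException(
        name + ": Already running a method by " + previous + ", current=" + current, cause);
  }

  /** Stand-in for a slow append: signals that it has started, then blocks until released. */
  static void slowAppend(CountDownLatch started, CountDownLatch release) {
    started.countDown();
    try {
      release.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Starts a blocked worker and builds the exception while the worker is still running. */
  static IllegalStateException demo() {
    final CountDownLatch started = new CountDownLatch(1);
    final CountDownLatch release = new CountDownLatch(1);
    final Thread worker = new Thread(() -> slowAppend(started, release), "demo-server-thread2");
    worker.setDaemon(true);
    worker.start();
    try {
      started.await(); // the worker is now inside slowAppend
      return alreadyRunning("demo-SegmentedRaftLog", worker, Thread.currentThread());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for the worker", e);
    } finally {
      release.countDown(); // let the worker finish
    }
  }

  public static void main(String[] args) {
    // The "Caused by:" section shows where the worker thread was at that moment.
    demo().printStackTrace();
  }
}
```

The "Caused by" section of the printed trace should contain the {{slowAppend}} frame of the worker thread, which is the kind of information we need in order to see what the previous appendEntries was doing.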
> IllegalStateException: SegmentedRaftLog: Already running a method by
> --------------------------------------------------------------------
>
> Key: RATIS-2208
> URL: https://issues.apache.org/jira/browse/RATIS-2208
> Project: Ratis
> Issue Type: Bug
> Components: gRPC, Leader, server
> Affects Versions: 3.1.2
> Reporter: Song Ziyang
> Assignee: Song Ziyang
> Priority: Major
>
>
> {code:java}
> 2024-12-06 18:19:18,750 [4-server-thread3] ERROR
> o.a.r.s.i.RaftServerImpl:1481 - 4@group-000200000030: Failed appendEntries*
> 9->4#3-t1,previous=(t:0, i:0),leaderCommit=9097,initializing? true,entries:
> size=9098, first=(t:1, i:0),
> CONFIGURATIONENTRY(current:id:"9"address:"172.16.2.9:10750"startupRole:FOLLOWER,
> old:)
> java.lang.IllegalStateException: 4@group-000200000030-SegmentedRaftLog: Already running a method by Thread[4-server-thread2,5,main], current=Thread[4-server-thread3,5,main]
>   at org.apache.ratis.server.raftlog.RaftLogSequentialOps$Runner.runSequentially(RaftLogSequentialOps.java:80)
>   at org.apache.ratis.server.raftlog.RaftLogBase.append(RaftLogBase.java:359)
>   at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1590)
>   at org.apache.ratis.server.impl.RaftServerImpl.appendEntriesAsync(RaftServerImpl.java:1479)
>   at org.apache.ratis.server.impl.RaftServerProxy.lambda$appendEntriesAsync$28(RaftServerProxy.java:645)
>   at org.apache.ratis.util.JavaUtils.callAsUnchecked(JavaUtils.java:118)
>   at org.apache.ratis.server.impl.RaftServerImpl.lambda$executeSubmitServerRequestAsync$10(RaftServerImpl.java:899)
>   at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at java.base/java.lang.Thread.run(Thread.java:833)
> {code}
> How was this issue triggered?
>
> # Client C (IoTDB application) adds a new node A to an existing Raft group
> via a SetConf request.
> # The leader tries to bootstrap A by sending an AppendEntries request with
> 9000+ log entries.
> # The appendEntries operation on the new node A +*takes an exceptionally long
> time*+ (~1-3 ms per entry, 20+ seconds in total by estimation). Therefore, A
> fails to respond to this AppendEntries request within the timeout (12s as
> configured in IoTDB).
> # The leader concludes that the bootstrapping process failed and notifies the
> client of the SetConf failure.
> # Client C retries SetConf immediately.
> # The leader tries to bootstrap A by sending AppendEntries, {+}*again*{+}.
> However, at this moment, +*the previous AppendEntries is still ongoing. That
> triggered the IllegalStateException.*+
>
> This exception suggests that even when a single AppendEntries request is
> small (within 4-16MB), processing it can still take a very long time if it
> consists of a large number of tiny log entries. Possible solutions:
> # Limit the maximum number of entries within an AppendEntries request.
> # Batch write tasks on the follower side.
> # Other solutions.
>