[ https://issues.apache.org/jira/browse/IGNITE-23642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Chudov updated IGNITE-23642:
----------------------------------
    Description: 
There is a cluster with 3 nodes and 1200 partitions in total (400 per node). 
When the cluster is restarted, each node successfully recovers the Metastorage, 
its leader is elected, and then partition recovery starts. This produces a lot 
of exceptions like the following in the logs:

 
{code:java}
2024-11-08 13:23:28:845 +0000 [INFO][%node1%tableManager-io-15][NodeImpl] Node <48_part_3/node1> start vote and grant vote self, term=1.
2024-11-08 13:23:28:846 +0000 [ERROR][%node1%Raft-Group-Client-14][RebalanceUtil] Exception on updating assignments for [tableId=38, name=INVENTORY, partition=23]
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = 5f329100-3de7-4ab8-a796-9969b7b91b22].
        at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
        at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
        at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source){code}
 

Also, there is another stack trace:

 
{code:java}
2024-11-08 13:27:03:523 +0000 [WARNING][%node1%rebalance-scheduler-11][RebalanceRaftGroupEventsListener] Unable to start rebalance [tablePartitionId, term=44_part_45]
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
        at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)
        at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
        at org.apache.ignite.internal.distributionzones.rebalance.RebalanceRaftGroupEventsListener.lambda$onLeaderElected$0(RebalanceRaftGroupEventsListener.java:167)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.TimeoutException: Send with retry timed out [retryCount = 7, groupId = metastorage_group, traceId = d52b447e-3c40-4f4b-9c67-863be811b0cb].
        at org.apache.ignite.internal.raft.RaftGroupServiceImpl.sendWithRetry(RaftGroupServiceImpl.java:559)
        at org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$scheduleRetry$40(RaftGroupServiceImpl.java:750)
        ... 6 more{code}
 

It seems that the avalanche of Metastorage accesses from hundreds of 
simultaneously starting partitions overloads the Metastorage leader, so 
recovery fails with TimeoutExceptions.
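
To put rough numbers on it: ~400 partitions per node, all starting at once and 
each issuing its own assignments update through the Raft client, means on the 
order of 1200 near-simultaneous invocations against the single 
metastorage_group leader, each retried up to 7 times before surfacing the 
TimeoutException above. A hypothetical sketch of this unthrottled fan-out (the 
class and method names are illustrative, not actual Ignite internals):

{code:java}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustration of the recovery fan-out, not real Ignite code: every
// recovering partition fires its Metastorage update immediately and
// independently, so the whole batch lands on the leader at once.
class RecoveryFanOut {
    static CompletableFuture<Void> recoverAllPartitions(int partitionsPerNode) {
        List<CompletableFuture<Void>> updates = IntStream.range(0, partitionsPerNode)
                .mapToObj(RecoveryFanOut::updateAssignments) // ~400 concurrent calls per node
                .collect(Collectors.toList());

        return CompletableFuture.allOf(updates.toArray(new CompletableFuture[0]));
    }

    // Stand-in for the per-partition assignments update that in reality goes
    // through RaftGroupServiceImpl.sendWithRetry() to the Metastorage leader.
    static CompletableFuture<Void> updateAssignments(int partition) {
        return CompletableFuture.completedFuture(null);
    }
}
{code}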

We could probably solve this by introducing some form of rate limiting on 
Metastorage accesses, either only for the recovery procedure or for normal 
operation as well.

High-priority accesses (Metastorage SafeTime propagation, Lease updates) should 
not be subject to rate limiting.
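
A minimal sketch of what such a limiter could look like (a hypothetical class, 
not an existing Ignite API): a semaphore bounds how many ordinary Metastorage 
invocations are in flight at once, while high-priority calls bypass the limit 
entirely.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch, not an existing Ignite API: bounds concurrent
// low-priority Metastorage invocations; SafeTime propagation and lease
// updates would be marked high-priority and never throttled.
class MetastorageRateLimiter {
    private final Semaphore permits;

    MetastorageRateLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    <T> CompletableFuture<T> invoke(Supplier<CompletableFuture<T>> call, boolean highPriority) {
        if (highPriority) {
            return call.get(); // High-priority traffic is never rate limited.
        }

        try {
            permits.acquire(); // Wait for a free slot before hitting the leader.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return CompletableFuture.failedFuture(e);
        }

        // Release the slot only when the invocation actually completes,
        // so the limit caps in-flight requests, not just submissions.
        return call.get().whenComplete((res, err) -> permits.release());
    }
}
{code}

A production version would likely queue waiters asynchronously rather than 
block the calling thread, and the in-flight cap would have to be chosen 
empirically, but this shows the shape of the idea.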


> Unable to start a node due to too many assignments recovered
> ------------------------------------------------------------
>
>                 Key: IGNITE-23642
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23642
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>


