[
https://issues.apache.org/jira/browse/IGNITE-25808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18010635#comment-18010635
]
Vladislav Pyatkov commented on IGNITE-25808:
--------------------------------------------
Merged ac029ad96da4dbe6d62eca64367b553b52e96c4c
> Lease negotiator may flood the log if a candidate left and topology tracker
> not updated
> ---------------------------------------------------------------------------------------
>
> Key: IGNITE-25808
> URL: https://issues.apache.org/jira/browse/IGNITE-25808
> Project: Ignite
> Issue Type: Bug
> Reporter: Denis Chudov
> Assignee: Denis Chudov
> Priority: Major
> Labels: ignite-3
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If there are hundreds of tables and there is a delay in the placement driver
> TopologyTracker for any reason, the placement driver knows nothing about the
> topology change and keeps choosing the offline node as a candidate. It is not
> a critical issue, but may fill the logs with warnings in just a few seconds:
> {code:java}
> 2025-05-06 10:06:44:672 +0000
> [WARNING][%gridgain-2.novalocal%JRaft-FSMCaller-Disruptor-metastorage_group_stripe_0-0][LeaseNegotiator]
> Lease was not negotiated due to exception [lease=Lease
> [leaseholder=gridgain-5.novalocal,
> leaseholderId=3844f6cb-ee97-49bd-8d40-5b96d0a6bc82, accepted=false,
> startTime=HybridTimestamp [physical=2025-05-06 10:06:44:602 +0000,
> logical=114, composite=114460328237596786], expirationTime=HybridTimestamp
> [physical=2025-05-06 10:08:44:602 +0000, logical=0,
> composite=114460336101916672], prolongable=true, proposedCandidate=null,
> replicationGroupId=590_part_13]]
> org.apache.ignite.internal.network.UnresolvableConsistentIdException:
> IGN-NETWORK-1 TraceId:57c4ef42-4786-431b-a6ff-a7d79edcfd6a Recipient
> consistent ID cannot be resolved: gridgain-5.novalocal
> at
> org.apache.ignite.internal.network.DefaultMessagingService.invoke(DefaultMessagingService.java:231)
> at
> org.apache.ignite.internal.network.MessagingService.invoke(MessagingService.java:190)
> at
> org.apache.ignite.internal.placementdriver.negotiation.LeaseNegotiator.negotiate(LeaseNegotiator.java:63)
> at
> org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.lambda$updateLeaseBatchInternal$0(LeaseUpdater.java:590)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> at
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
> at
> org.apache.ignite.internal.raft.RaftGroupServiceImpl.lambda$sendWithRetry$49(RaftGroupServiceImpl.java:603)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
> at
> java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
> at
> java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147)
> at
> org.apache.ignite.internal.network.DefaultMessagingService.onInvokeResponse(DefaultMessagingService.java:575)
> at
> org.apache.ignite.internal.network.DefaultMessagingService.send0(DefaultMessagingService.java:261)
> at
> org.apache.ignite.internal.network.DefaultMessagingService.respond(DefaultMessagingService.java:204)
> at
> org.apache.ignite.internal.network.MessagingService.respond(MessagingService.java:107)
> at
> org.apache.ignite.raft.jraft.rpc.impl.IgniteRpcServer$NetworkRpcContext.sendResponse(IgniteRpcServer.java:240)
> at
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor.sendResponse(ActionRequestProcessor.java:268)
> at
> org.apache.ignite.raft.jraft.rpc.impl.ActionRequestProcessor$1.result(ActionRequestProcessor.java:177)
> at
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$WriteCommandIterator$1.result(JraftServerImpl.java:995)
> at
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler$ResultCachingClosure.result(MetaStorageWriteHandler.java:453)
> at
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteWithTime(MetaStorageWriteHandler.java:217)
> at
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleNonCachedWriteCommand(MetaStorageWriteHandler.java:153)
> at
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageWriteHandler.handleWriteCommand(MetaStorageWriteHandler.java:123)
> at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
> at
> org.apache.ignite.internal.metastorage.server.raft.MetaStorageListener.onWrite(MetaStorageListener.java:193)
> at
> org.apache.ignite.internal.raft.server.impl.JraftServerImpl$DelegatingStateMachine.onApply(JraftServerImpl.java:825)
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doApplyTasks(FSMCallerImpl.java:570)
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.doCommitted(FSMCallerImpl.java:536)
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl.runApplyTask(FSMCallerImpl.java:454)
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:123)
> at
> org.apache.ignite.raft.jraft.core.FSMCallerImpl$ApplyTaskHandler.onEvent(FSMCallerImpl.java:117)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:322)
> at
> org.apache.ignite.raft.jraft.disruptor.StripedDisruptor$StripeEntryHandler.onEvent(StripedDisruptor.java:279)
> at
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:167)
> at
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:122)
> at java.base/java.lang.Thread.run(Thread.java:840){code}
> As a possible solution, we may throttle this log.
> *Scenario:*
> * big number of replication groups
> * lease updater chooses lease candidates
> * one node leaves the cluster
> * lease negotiator fails to start the lease agreements for leases that have
> the left node as a candidate
> * on each lease, it writes an error to log and there are too many of them
> (as the number of replication groups is big).
> We can print the error just once per negotiation round.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)