[
https://issues.apache.org/jira/browse/IGNITE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexandr Shapkin reassigned IGNITE-28013:
-----------------------------------------
Assignee: Anton Laletin
> Lease shouldn't be prolonged for node UUID out of the current logical
> topology snapshot
> ---------------------------------------------------------------------------------------
>
> Key: IGNITE-28013
> URL: https://issues.apache.org/jira/browse/IGNITE-28013
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Efremov
> Assignee: Anton Laletin
> Priority: Major
> Labels: ignite-3
>
> *Description*
> {{ItHighAvailablePartitionsRecoveryByFilterUpdateTest#testSeveralHaResetsAndSomeNodeRestart}}
> with a default zone with 25+ partitions is guaranteed to fail due to the
> following assertion failure:
> {code:java}
> 2026-02-25T10:42:38,771][ERROR][%ihaprbfut_tshrasnr_3344%lease-updater][FailureManager]
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeFailureHandler [nodeName=ihaprbfut_tshrasnr_3344,
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
> failureCtx=CRITICAL_ERROR, failureCtxId=5e7318b6-f5ed-4e93-b526-cfdfd8ed377e]
> org.apache.ignite.internal.failure.StackTraceCapturingException: Error occurred when updating the leases.
>     at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:199)
>     at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:176)
>     at org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.run(LeaseUpdater.java:394)
>     at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.AssertionError: 8
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.packNodesInfo(LeaseBatchSerializer.java:350)
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeLease(LeaseBatchSerializer.java:323)
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeLeasesForObject(LeaseBatchSerializer.java:279)
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writePartitionedGroupLeases(LeaseBatchSerializer.java:245)
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeExternalData(LeaseBatchSerializer.java:169)
>     at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeExternalData(LeaseBatchSerializer.java:109)
>     at org.apache.ignite.internal.versioned.VersionedSerializer.writeExternal(VersionedSerializer.java:71)
>     at org.apache.ignite.internal.versioned.VersionedSerialization.toBytes(VersionedSerialization.java:52)
> {code}
> This means that {{NodesDictionary}} has {{nameIndexToName.size()}} less than
> or equal to 8 (per {{holderIdAndProposedCandidateFitIn1Byte}}), but later
> {{packNodesInfo}} got an index greater than or equal to 8 from the
> dictionary's {{idToNodeIndex}} map.
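
As a rough illustration of the mismatch described above, here is a toy model. It is not the real NodesDictionary: the class ToyNodesDictionary, its methods, and the idea that the compact format is chosen from the name count alone are assumptions made for this sketch. It also assumes that one node name can end up registered under more than one UUID.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model only; hypothetical, not the real NodesDictionary from Ignite. */
final class ToyNodesDictionary {
    // Hypothetical stand-ins for the name table and the idToNodeIndex map.
    private final Map<String, Integer> nameToIndex = new LinkedHashMap<>();
    private final Map<UUID, Integer> idToNodeIndex = new LinkedHashMap<>();

    /** Registers a node: each distinct name and each distinct UUID gets the next free index. */
    void putNode(UUID id, String name) {
        nameToIndex.putIfAbsent(name, nameToIndex.size());
        idToNodeIndex.putIfAbsent(id, idToNodeIndex.size());
    }

    int nameCount() {
        return nameToIndex.size();
    }

    int nodeIndex(UUID id) {
        return idToNodeIndex.get(id);
    }

    /** Assumption: the compact format is selected by looking at the number of names only. */
    boolean compactFormatChosen() {
        return nameCount() <= 8;
    }

    public static void main(String[] args) {
        ToyNodesDictionary dict = new ToyNodesDictionary();
        for (int i = 0; i < 8; i++) {
            dict.putNode(UUID.randomUUID(), "node-" + i);
        }

        // A restarted node: same name, new UUID.
        UUID restarted = UUID.randomUUID();
        dict.putNode(restarted, "node-0");

        System.out.println("names=" + dict.nameCount()
                + ", compact=" + dict.compactFormatChosen()
                + ", restartedNodeIndex=" + dict.nodeIndex(restarted));
    }
}
```

In this toy model the name count stays at 8, so the compact format still looks safe, while the ninth UUID receives node index 8. That is the kind of disagreement the assertion in the stack trace appears to guard against.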
> So, for some reason, we have nodes with the same consistentId but different
> UUIDs (test case: restart almost all 8 nodes, a corner case). But this is
> mostly not a dictionary issue: we shouldn't have such a lease batch at all.
> The root cause is in {{tryToFindCandidateAmongAssignments}}:
> {code:java}
> // Check whether given assignments is actually available in logical topology. It's a best effort check because it's possible
> // for proposed primary candidate to leave the topology at any time. In that case primary candidate will be recalculated.
> InternalClusterNode candidateNode = topologyTracker.nodeByConsistentId(assignment.consistentId());
> if (candidateNode == null) {
>     continue;
> }
> {code}
> We look the node up by its consistent ID (node name) instead of its UUID,
> which leads to leases for partitions whose leaseholders have the same
> consistent ID but different UUIDs. This should be fixed.
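
To make the difference between the two lookups concrete, here is a minimal sketch under stated assumptions. ToyTopology and its Node class are hypothetical and do not reflect Ignite's topology tracker. The toy method nodeByConsistentId only mirrors the name used in the snippet above, and nodeById is an invented counterpart that may not match any real method. The sketch assumes a restarted node keeps its name but joins with a fresh UUID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model only; hypothetical, not Ignite's logical topology tracker. */
final class ToyTopology {
    /** Hypothetical node: the UUID changes on restart, the name (consistent ID) does not. */
    static final class Node {
        final UUID id;
        final String name;

        Node(UUID id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<String, Node> byName = new HashMap<>();
    private final Map<UUID, Node> byId = new HashMap<>();

    /** Adds a node; a node rejoining under the same name replaces its previous incarnation. */
    void join(Node node) {
        Node previous = byName.put(node.name, node);
        if (previous != null) {
            byId.remove(previous.id);
        }
        byId.put(node.id, node);
    }

    Node nodeByConsistentId(String name) {
        return byName.get(name);
    }

    /** Hypothetical UUID-based lookup. */
    Node nodeById(UUID id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        ToyTopology topology = new ToyTopology();

        UUID staleId = UUID.randomUUID();
        topology.join(new Node(staleId, "node-0"));
        // Restart: same name, new UUID.
        topology.join(new Node(UUID.randomUUID(), "node-0"));

        // The name-based check still finds a node, so the stale candidate would be accepted.
        System.out.println("found by name: " + (topology.nodeByConsistentId("node-0") != null));
        // The UUID-based check does not find the stale incarnation.
        System.out.println("found by stale UUID: " + (topology.nodeById(staleId) != null));
    }
}
```

In this sketch, a candidate recorded with the stale UUID passes the name-based check but is rejected by the UUID-based one, which matches the behavior requested in the definition of done.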
> *Motivation*
> We shouldn't have leases in a batch with node UUIDs that aren't in the actual
> logical topology.
> *Definition of done*
> # The lease candidate is looked up by UUID.