[ 
https://issues.apache.org/jira/browse/IGNITE-28013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandr Shapkin reassigned IGNITE-28013:
-----------------------------------------

    Assignee: Anton Laletin

> Lease shouldn't be prolonged for node UUID out of the current logical 
> topology snapshot
> ---------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28013
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28013
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mikhail Efremov
>            Assignee: Anton Laletin
>            Priority: Major
>              Labels: ignite-3
>
> *Description*
> {{ItHighAvailablePartitionsRecoveryByFilterUpdateTest#testSeveralHaResetsAndSomeNodeRestart}}
>  with a default zone of 25+ partitions fails consistently because of the 
> following assertion failure:
> {code:java}
> 2026-02-25T10:42:38,771][ERROR][%ihaprbfut_tshrasnr_3344%lease-updater][FailureManager]
>  Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=StopNodeFailureHandler [nodeName=ihaprbfut_tshrasnr_3344, 
> super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
> [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
> failureCtx=CRITICAL_ERROR, failureCtxId=5e7318b6-f5ed-4e93-b526-cfdfd8ed377e]
> org.apache.ignite.internal.failure.StackTraceCapturingException: Error occurred when updating the leases.
>   at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:199)
>   at org.apache.ignite.internal.failure.FailureManager.process(FailureManager.java:176)
>   at org.apache.ignite.internal.placementdriver.LeaseUpdater$Updater.run(LeaseUpdater.java:394)
>   at java.base/java.lang.Thread.run(Thread.java:833)
> Caused by: java.lang.AssertionError: 8
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.packNodesInfo(LeaseBatchSerializer.java:350)
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeLease(LeaseBatchSerializer.java:323)
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeLeasesForObject(LeaseBatchSerializer.java:279)
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writePartitionedGroupLeases(LeaseBatchSerializer.java:245)
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeExternalData(LeaseBatchSerializer.java:169)
>   at org.apache.ignite.internal.placementdriver.leases.LeaseBatchSerializer.writeExternalData(LeaseBatchSerializer.java:109)
>   at org.apache.ignite.internal.versioned.VersionedSerializer.writeExternal(VersionedSerializer.java:71)
>   at org.apache.ignite.internal.versioned.VersionedSerialization.toBytes(VersionedSerialization.java:52)
> {code}
> This means that {{NodesDictionary}} has {{nameIndexToName.size()}} less than 
> or equal to 8 (as {{holderIdAndProposedCandidateFitIn1Byte}} implies), but 
> {{packNodesInfo}} later got an index greater than or equal to 8 from the 
> dictionary's {{idToNodeIndex}} map.
> So for some reason we have nodes with the same consistent ID but different 
> UUIDs (test case: restart almost all 8 nodes, a corner case). This is mostly 
> not a dictionary issue, though: we shouldn't have such a lease batch at all. 
> The root cause is in {{tryToFindCandidateAmongAssignments}}:
> {code:java}
>             // Check whether given assignments is actually available in logical topology. It's a best effort check because it's possible
>             // for proposed primary candidate to leave the topology at any time. In that case primary candidate will be recalculated.
>             InternalClusterNode candidateNode = topologyTracker.nodeByConsistentId(assignment.consistentId());
>             if (candidateNode == null) {
>                 continue;
>             }
> {code}
> The lookup uses the node's consistent ID (the node name) instead of its UUID. 
> This produces leases for partitions whose leaseholders have the same 
> consistent ID but different UUIDs. This should be fixed.
> *Motivation*
> We shouldn't have leases in a batch with node UUIDs that aren't in the 
> current logical topology.
> *Definition of done*
> # The lease candidate is looked up by UUID.
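
The difference between the two lookups described in the issue can be illustrated with a minimal, self-contained sketch. It is not Ignite code: Node, TopologySnapshot, nodeById, presentByConsistentId and presentById are hypothetical names invented for the illustration, and only the idea of a lookup by consistent ID is taken from the quoted snippet. It assumes a restarted node keeps its consistent ID and gets a new UUID, as the issue describes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class LeaseCandidateLookupSketch {
    /** Hypothetical node descriptor: the name survives restarts, the UUID does not. */
    public static final class Node {
        final UUID id;
        final String consistentId;

        public Node(UUID id, String consistentId) {
            this.id = id;
            this.consistentId = consistentId;
        }
    }

    /** Hypothetical logical topology snapshot, indexed by name and by UUID. */
    public static final class TopologySnapshot {
        private final Map<String, Node> byConsistentId = new HashMap<>();
        private final Map<UUID, Node> byId = new HashMap<>();

        public void add(Node node) {
            byConsistentId.put(node.consistentId, node);
            byId.put(node.id, node);
        }

        public Node nodeByConsistentId(String consistentId) {
            return byConsistentId.get(consistentId);
        }

        public Node nodeById(UUID id) {
            return byId.get(id);
        }
    }

    /** Name-based check: a stale node matches as long as a node with its name is present. */
    public static boolean presentByConsistentId(TopologySnapshot topology, Node candidate) {
        return topology.nodeByConsistentId(candidate.consistentId) != null;
    }

    /** UUID-based check: only the exact node incarnation matches. */
    public static boolean presentById(TopologySnapshot topology, Node candidate) {
        return topology.nodeById(candidate.id) != null;
    }

    public static void main(String[] args) {
        // Same consistent ID, different UUIDs: the node before and after a restart.
        Node beforeRestart = new Node(new UUID(0L, 1L), "node-1");
        Node afterRestart = new Node(new UUID(0L, 2L), "node-1");

        // The snapshot only knows the restarted incarnation.
        TopologySnapshot topology = new TopologySnapshot();
        topology.add(afterRestart);

        System.out.println("stale node found by consistent ID: " + presentByConsistentId(topology, beforeRestart));
        System.out.println("stale node found by UUID: " + presentById(topology, beforeRestart));
        System.out.println("current node found by UUID: " + presentById(topology, afterRestart));
    }
}
```

In this sketch the name-based check accepts the pre-restart incarnation even though its UUID is no longer in the snapshot, while the UUID-based check rejects it, which is the behaviour the definition of done asks for.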



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
