[
https://issues.apache.org/jira/browse/IGNITE-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032848#comment-17032848
]
Cameron Braid commented on IGNITE-9803:
---------------------------------------
I've hit this in production a few times over the last 2 weeks. Is this an easy
backport to 2.7.x, or will 2.8 be out soonish?
> GridDhtInvalidPartitionException in GridDhtPartitionDemander
> ------------------------------------------------------------
>
> Key: IGNITE-9803
> URL: https://issues.apache.org/jira/browse/IGNITE-9803
> Project: Ignite
> Issue Type: Bug
> Components: cache
> Reporter: Semen Boikov
> Priority: Major
> Fix For: 2.8
>
>
> Debugged a failure of
> DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance
> with GridDhtInvalidPartitionException. Here is the scenario in which this error
> occurs:
> * test starts node1, node2, loads data
> * node3 is started, one partition is assigned to [node2, node3] and node3
> starts rebalancing
> * node4 is started, partition is re-assigned to [node2, node4]
> * at this time rebalancing on node3 is still in progress and it is about to
> handle a supply message; at this moment the exchange thread moves the partition
> to RENTING state. The partition cannot be moved to EVICTED yet, since async
> partition cleanup is needed
> * the thread doing rebalancing on node3 sees the RENTING partition and gets
> GridDhtInvalidPartitionException
> The probability of such a failure is very high if sleep(5000) is inserted in the
> code doing async partition cleanup (PartitionEvictionTask.run).
>
> I think the fix for this issue is just to handle GridDhtInvalidPartitionException
> in GridDhtPartitionDemander.
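
For illustration only, the race and the proposed handling could be sketched
roughly as below. This is a toy model and not Ignite code: DemanderSketch,
Partition, PartState, InvalidPartitionException, preload and
handleSupplyMessage are all hypothetical names, and the exception class is just
a stand-in for GridDhtInvalidPartitionException. Skipping the rest of the
supply message is an assumption about what "handle" would mean here; the actual
fix may well do something different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DemanderSketch {
    /** Simplified partition states (hypothetical model of the states named in the issue). */
    enum PartState { MOVING, OWNING, RENTING, EVICTED }

    /** Hypothetical stand-in for GridDhtInvalidPartitionException. */
    static class InvalidPartitionException extends RuntimeException {
        InvalidPartitionException(int partId, PartState state) {
            super("Partition " + partId + " cannot be preloaded, state=" + state);
        }
    }

    /** Hypothetical local partition: only a MOVING partition accepts supplied entries. */
    static class Partition {
        final int id;
        volatile PartState state = PartState.MOVING;
        final List<String> entries = new ArrayList<>();

        Partition(int id) {
            this.id = id;
        }

        void preload(String entry) {
            // The exchange thread may have switched the state to RENTING in the meantime.
            if (state != PartState.MOVING)
                throw new InvalidPartitionException(id, state);

            entries.add(entry);
        }
    }

    /**
     * Applies one supply message to the partition.
     *
     * @return true if all entries were applied, false if the partition was no
     *         longer valid and the remaining entries were skipped.
     */
    static boolean handleSupplyMessage(Partition part, List<String> supplied) {
        try {
            for (String entry : supplied)
                part.preload(entry);

            return true;
        }
        catch (InvalidPartitionException e) {
            // Assumed handling: the partition was re-assigned, so stop preloading
            // it instead of letting the exception fail the rebalancing thread.
            return false;
        }
    }

    public static void main(String[] args) {
        Partition part = new Partition(7);

        // Rebalancing in progress: the supply message is applied.
        System.out.println(handleSupplyMessage(part, Arrays.asList("a", "b")));

        // Simulates the exchange thread re-assigning the partition.
        part.state = PartState.RENTING;

        // The next supply message is skipped rather than failing.
        System.out.println(handleSupplyMessage(part, Arrays.asList("c")));
    }
}
```

In this sketch the state check and the write are not atomic, so it only shows
where the exception would be caught, not how the real code synchronizes with the
exchange thread.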
--
This message was sent by Atlassian Jira
(v8.3.4#803005)