[ 
https://issues.apache.org/jira/browse/IGNITE-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Semen Boikov updated IGNITE-9803:
---------------------------------
    Description: 
Debugged a failure of 
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance 
with GridDhtInvalidPartitionException. Here is the scenario in which this 
error occurs:
 * the test starts node1 and node2 and loads data
 * node3 is started, one partition is assigned to [node2, node3], and node3 
starts rebalancing
 * node4 is started, and the partition is re-assigned to [node2, node4]
 * rebalancing on node3 is still in progress and is about to handle a supply 
message; at this moment the exchange thread moves the partition to the RENTING 
state. The partition cannot be moved to EVICTED right away, since async 
partition cleanup is needed first
 * the thread doing rebalancing on node3 sees the RENTING partition and gets 
GridDhtInvalidPartitionException

The probability of this failure becomes very high if sleep(5000) is inserted 
into the code doing async partition cleanup (PartitionEvictionTask.run).

 

I think the fix for this issue is simply to handle 
GridDhtInvalidPartitionException in GridDhtPartitionDemander.

  was:
Debugged failure of 
DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance 
with GridDhtInvalidPartitionException, here is scenario where this error occurs:
 * test starts node1, node2, loads data
 * node3 is started, one partition is assigned to [node2, node3] and node3 
starts rebalancing
 * node4 is started, partition is re-assigned to [node2, node4]
 * at this time rebalancing on node3 is in progress, is is going to handles 
supply message and at this moment exchange thread moves partition to RENTING 
state, at this moment it can not be moved to EVICTED since async partition 
cleanup is needed
 * at node3 thread doing rebalancing sees RENTING partition and gets 
GridDhtInvalidPartitionException

Probability of such failure is very high if insert sleep(5000) in the code 
doing async partition cleanup (PartitionEvictionTask.run).

 

I think fix for this issue is just handle GridDhtInvalidPartitionException in 
GridDhtPartitionDemander.


> GridDhtInvalidPartitionException in GridDhtPartitionDemander
> ------------------------------------------------------------
>
>                 Key: IGNITE-9803
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9803
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>            Reporter: Semen Boikov
>            Assignee: Semen Boikov
>            Priority: Major
>             Fix For: 2.8
>
>
> Debugged a failure of 
> DynamicIndexPartitionedTransactionalConcurrentSelfTest.testConcurrentRebalance
>  with GridDhtInvalidPartitionException. Here is the scenario in which this 
> error occurs:
>  * the test starts node1 and node2 and loads data
>  * node3 is started, one partition is assigned to [node2, node3], and node3 
> starts rebalancing
>  * node4 is started, and the partition is re-assigned to [node2, node4]
>  * rebalancing on node3 is still in progress and is about to handle a supply 
> message; at this moment the exchange thread moves the partition to the 
> RENTING state. The partition cannot be moved to EVICTED right away, since 
> async partition cleanup is needed first
>  * the thread doing rebalancing on node3 sees the RENTING partition and gets 
> GridDhtInvalidPartitionException
> The probability of this failure becomes very high if sleep(5000) is inserted 
> into the code doing async partition cleanup (PartitionEvictionTask.run).
>  
> I think the fix for this issue is simply to handle 
> GridDhtInvalidPartitionException in GridDhtPartitionDemander.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
