[ 
https://issues.apache.org/jira/browse/IGNITE-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069394#comment-15069394
 ] 

Semen Boikov edited comment on IGNITE-647 at 12/23/15 10:26 AM:
----------------------------------------------------------------

There is one more race here:
- node 1 starts, exchange is finished
- node 2 starts, exchange is finished
- node 2 starts a cache with the fair affinity function, starts an exchange and 
sends GridDhtAffinityAssignmentRequest to node 1. At this point node 1 has 
started the exchange process, but has not started the cache yet and has not 
registered the required message handler, so the message is ignored.

So any dynamic cache start with fair affinity that is initiated from a 
non-oldest node can easily hang.
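
The race above can be reproduced in miniature. The sketch below is a toy model, not Ignite code: the Node class, its handler map and the 200 ms timeout are assumptions made for illustration only. It shows that a request delivered before the handler is registered is dropped without any error, so the requester waits forever (here, until the timeout).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

public class DroppedRequestSketch {
    /** Toy node (hypothetical): a message with no registered handler is silently dropped. */
    static class Node {
        private final Map<String, Consumer<CompletableFuture<String>>> handlers =
            new ConcurrentHashMap<>();

        void registerHandler(String cacheName, Consumer<CompletableFuture<String>> handler) {
            handlers.put(cacheName, handler);
        }

        void receive(String cacheName, CompletableFuture<String> reply) {
            Consumer<CompletableFuture<String>> handler = handlers.get(cacheName);

            if (handler != null)
                handler.accept(reply);
            // No handler yet: the request is ignored and 'reply' is never completed.
        }
    }

    /** Returns true if the requester never gets an answer within the timeout. */
    static boolean requestTimesOut(boolean handlerRegisteredFirst) {
        Node node1 = new Node();
        CompletableFuture<String> reply = new CompletableFuture<>();
        Consumer<CompletableFuture<String>> handler = r -> r.complete("assignment");

        if (handlerRegisteredFirst)
            node1.registerHandler("cache", handler);

        // node 2 sends its request to node 1.
        node1.receive("cache", reply);

        if (!handlerRegisteredFirst)
            node1.registerHandler("cache", handler); // Too late, the request is already lost.

        try {
            reply.get(200, TimeUnit.MILLISECONDS);
            return false;
        }
        catch (TimeoutException e) {
            return true;
        }
        catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("handler registered late, requester hangs: " + requestTimesOut(false));
        System.out.println("handler registered first, requester hangs: " + requestTimesOut(true));
    }
}
```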

This behaviour is caused by wrong logic in 
GridDhtPartitionsExchangeFuture#canCalculateAffinity: when a cache is being 
started, all nodes can calculate affinity themselves, so there is no need to 
send GridDhtAffinityAssignmentRequest.
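
A minimal sketch of the intended decision follows. It is not the real GridDhtPartitionsExchangeFuture#canCalculateAffinity: the method name canCalculateLocally and both boolean parameters are hypothetical, and the "oldest node can always calculate" branch is an assumption inferred from the remark about non-oldest nodes.

```java
public class AffinityRequestDecisionSketch {
    /**
     * Hypothetical stand-in for the check described above.
     *
     * @param cacheStartedInThisExchange true if the exchange was triggered by starting this cache.
     * @param localNodeIsOldest true if the local node is the oldest one (assumption).
     * @return true if the node may calculate affinity itself and must not send a request.
     */
    static boolean canCalculateLocally(boolean cacheStartedInThisExchange, boolean localNodeIsOldest) {
        // On cache start every node can calculate affinity, so nobody needs to ask another node.
        if (cacheStartedInThisExchange)
            return true;

        // Assumption: outside of cache start only the oldest node calculates locally.
        return localNodeIsOldest;
    }

    public static void main(String[] args) {
        // Scenario from the comment: a non-oldest node starts the cache.
        boolean sendRequest = !canCalculateLocally(true, false);

        System.out.println("send affinity request on cache start: " + sendRequest);
    }
}
```

With a check of this shape, node 2 in the scenario above would never send the request that node 1 is not yet able to handle.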


> org.apache.ignite.IgniteCacheAffinitySelfTest.testAffinity() hangs
> ------------------------------------------------------------------
>
>                 Key: IGNITE-647
>                 URL: https://issues.apache.org/jira/browse/IGNITE-647
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Yakov Zhdanov
>            Assignee: Semen Boikov
>            Priority: Blocker
>              Labels: Muted_test
>         Attachments: FairAffinityDynamicCacheSelfTest.testStartStopCache.txt, 
> threaddump.txt
>
>
> 1-2 out of ~10 local runs hung for me



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
