[ 
https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-10898:
--------------------------------------
    Fix Version/s: 2.8

> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>
>                 Key: IGNITE-10898
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10898
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexey Goncharuk
>            Priority: Critical
>             Fix For: 2.8
>
>
> Currently, if a node does not pass a cache's node filter, we do not store that 
> cache's affinity on the node unless the node is the coordinator. This, however, 
> may fail in the following scenario:
> 1) A node that passes the node filter joins the cluster.
> 2) During the join, the coordinator fails, and a new coordinator is selected 
> for which the previous exchange has already completed.
> 3) The new coordinator attempts to fetch the affinity, and the joining node 
> resends its partitions single message, but there are two problems here. First, 
> the exchange fast-reply does not wait for the new affinity initialization, 
> which results in an {{IllegalStateException}}. Second, such an attempt to fetch 
> affinity may lead either to a deadlock or to incorrectly fetched affinity 
> (basically, the coordinator must be in consensus with the other nodes passing 
> the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even those not 
> passing the filter. In this case, there will be no need to fetch and 
> recalculate affinity ({{initCoordinatorCaches}} will go away).
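
The storage rule described above can be illustrated with a small toy model. This is only a sketch and not Ignite code: the class AffinityFailoverSketch, the Node type, and all method names are hypothetical, and the model covers only which nodes hold affinity locally, not the exchange protocol, the fast-reply path, or the fetch itself. It assumes the coordinator is the oldest alive node and that the cluster has at least two nodes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these names are Ignite APIs. */
public class AffinityFailoverSketch {
    /** A cluster node: its join order and whether it passes the cache node filter. */
    public static final class Node {
        final int order;
        final boolean passesFilter;

        public Node(int order, boolean passesFilter) {
            this.order = order;
            this.passesFilter = passesFilter;
        }
    }

    /** Assumption: the coordinator is the oldest alive node (smallest join order). */
    public static Node coordinator(List<Node> alive) {
        Node crd = alive.get(0);
        for (Node n : alive)
            if (n.order < crd.order)
                crd = n;
        return crd;
    }

    /**
     * Nodes that hold the cache affinity locally.
     * keepOnAllNodes = false models the current rule (filter-passing nodes plus coordinator);
     * keepOnAllNodes = true models the proposed rule (every node).
     */
    public static Set<Node> affinityHolders(List<Node> alive, boolean keepOnAllNodes) {
        Set<Node> holders = new HashSet<>();
        Node crd = coordinator(alive);
        for (Node n : alive)
            if (keepOnAllNodes || n.passesFilter || n == crd)
                holders.add(n);
        return holders;
    }

    /** True if the node that takes over after the coordinator fails already holds the affinity. */
    public static boolean newCoordinatorHasAffinity(List<Node> alive, boolean keepOnAllNodes) {
        Set<Node> holders = affinityHolders(alive, keepOnAllNodes);
        List<Node> survivors = new ArrayList<>(alive);
        survivors.remove(coordinator(alive));
        return holders.contains(coordinator(survivors));
    }

    public static void main(String[] args) {
        // Nodes 1 and 2 do not pass the filter; node 3 does. Node 1 is the coordinator.
        List<Node> cluster = List.of(new Node(1, false), new Node(2, false), new Node(3, true));
        System.out.println(newCoordinatorHasAffinity(cluster, false)); // prints false
        System.out.println(newCoordinatorHasAffinity(cluster, true));  // prints true
    }
}
```

Under the current rule, node 2 becomes coordinator without local affinity and would have to fetch it, which is where the two problems in step 3 arise. Under the proposed rule, every node already holds the affinity, so the toy model never needs a fetch step.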



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
