[
https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ilya Lantukh reassigned IGNITE-10898:
-------------------------------------
Assignee: Ilya Lantukh
> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>
> Key: IGNITE-10898
> URL: https://issues.apache.org/jira/browse/IGNITE-10898
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Goncharuk
> Assignee: Ilya Lantukh
> Priority: Critical
> Fix For: 2.8
>
> Attachments: NodeWithFilterRestartTest.java
>
>
> Currently, if a node does not pass a cache's node filter, we do not store that
> cache's affinity on the node unless the node is the coordinator. This, however,
> may fail in the following scenario:
> 1) A node passing the node filter joins the cluster.
> 2) During the join the coordinator fails, and a new coordinator is selected
> for which the previous exchange is already completed.
> 3) The new coordinator attempts to fetch the affinity, and the joining node
> resends its partitions single message, but there are two problems here. First,
> the exchange fast-reply does not wait for the new affinity initialization,
> which results in {{IllegalStateException}}. Second, such an attempt to fetch
> affinity may lead either to a deadlock or to incorrectly fetched affinity
> (basically, the coordinator must be in consensus with the other nodes passing
> the node filter).
> The attached test reproduces the issue.
> I suggest always calculating and keeping affinity on all nodes, even those not
> passing the filter. In this case, there will be no need to fetch and
> recalculate affinity ({{initCoordinatorCaches}} will go away).
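
The storage rule and the proposed change described above can be illustrated
with a toy model. This is only a sketch under stated assumptions and is not
Ignite code: the class AffinitySketch, its Node type, and the methods
calculate, onExchange, coordinator and newCoordinatorMustFetch are all
hypothetical names invented for illustration, and the round-robin assignment
stands in for a real affinity function. It only shows why a coordinator
change can force an affinity fetch when affinity is kept solely on
filter-passing nodes and on the coordinator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model, NOT Ignite code: every name in this class is hypothetical. */
final class AffinitySketch {
    /** A cluster node; a lower order means an older node. */
    static final class Node {
        final int order;
        final boolean dataNode;
        /** Locally stored affinity (partition -> owner order), or null if not stored. */
        Map<Integer, Integer> affinity;

        Node(int order, boolean dataNode) {
            this.order = order;
            this.dataNode = dataNode;
        }
    }

    /** The oldest node acts as coordinator; returns null for an empty topology. */
    static Node coordinator(List<Node> topology) {
        Node crd = null;
        for (Node n : topology)
            if (crd == null || n.order < crd.order)
                crd = n;
        return crd;
    }

    /** Deterministic round-robin assignment over the nodes that pass the filter. */
    static Map<Integer, Integer> calculate(List<Node> topology, Predicate<Node> filter, int parts) {
        List<Node> owners = new ArrayList<>();
        for (Node n : topology)
            if (filter.test(n))
                owners.add(n);
        Map<Integer, Integer> res = new HashMap<>();
        for (int p = 0; p < parts && !owners.isEmpty(); p++)
            res.put(p, owners.get(p % owners.size()).order);
        return res;
    }

    /**
     * Applies an exchange. With keepOnAll == false, affinity is stored only on
     * filter-passing nodes and on the coordinator (the behaviour described in
     * the issue); with keepOnAll == true every node stores it (the proposal).
     */
    static void onExchange(List<Node> topology, Predicate<Node> filter, int parts, boolean keepOnAll) {
        Node crd = coordinator(topology);
        for (Node n : topology) {
            boolean store = keepOnAll || filter.test(n) || n == crd;
            n.affinity = store ? calculate(topology, filter, parts) : null;
        }
    }

    /** True if the node that becomes coordinator after the current one fails has no affinity. */
    static boolean newCoordinatorMustFetch(List<Node> topology) {
        List<Node> rest = new ArrayList<>(topology);
        rest.remove(coordinator(topology));
        Node next = coordinator(rest);
        return next != null && next.affinity == null;
    }
}
```

In this model, a topology of two nodes that fail the filter followed by one
that passes it leaves the second-oldest node without affinity, so it would
have to fetch it when it takes over as coordinator. With keepOnAll set to
true the fetch step is never needed.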