[
https://issues.apache.org/jira/browse/IGNITE-10898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexey Goncharuk updated IGNITE-10898:
--------------------------------------
Description:
Currently, if a node does not pass a cache's node filter, we do not store that
cache's affinity on the node unless the node is the coordinator. This, however,
may fail in the following scenario:
1) A node passing the node filter joins the cluster.
2) During the join the coordinator fails, and a new coordinator is selected for
which the previous exchange is completed.
3) The new coordinator attempts to fetch the affinity, and the joining node
resends its partitions single message, but there are two problems here. First,
the exchange fast-reply does not wait for the new affinity initialization,
which results in an {{IllegalStateException}}. Second, such an attempt to fetch
affinity may lead either to a deadlock or to incorrectly fetched affinity
(basically, the coordinator must be in consensus with the other nodes passing
the node filter).
The attached test reproduces the issue.
I suggest always calculating and keeping affinity on all nodes, even those not
passing the filter. In this case, there will be no need to fetch and
recalculate affinity ({{initCoordinatorCaches}} will go away).
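
To illustrate why the suggestion removes the need to fetch affinity, here is a
minimal sketch in plain Java. It is not Ignite code: the class
{{AffinitySketch}}, the method {{calculate}}, and the simplified
highest-hash assignment are hypothetical and stand in for the real affinity
function. The only point is that the assignment is a deterministic function of
the topology and the node filter, so a node that does not pass the filter can
compute the same result locally as every other node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical model of filter-aware affinity calculation; not Ignite code. */
final class AffinitySketch {
    /**
     * Assigns each partition to one node that passes the filter.
     * The result depends only on the set of nodes, the filter and the
     * partition count, never on which node runs the calculation.
     */
    static Map<Integer, String> calculate(Collection<String> topology,
                                          Predicate<String> nodeFilter,
                                          int partitions) {
        List<String> eligible = new ArrayList<>();
        for (String node : topology) {
            if (nodeFilter.test(node)) {
                eligible.add(node);
            }
        }
        // Sort so that the iteration order of the local topology view is irrelevant.
        Collections.sort(eligible);

        Map<Integer, String> assignment = new TreeMap<>();
        for (int p = 0; p < partitions && !eligible.isEmpty(); p++) {
            String best = null;
            int bestHash = 0;
            for (String node : eligible) {
                int h = Objects.hash(node, p); // deterministic for String and Integer
                if (best == null || h > bestHash) {
                    best = node;
                    bestHash = h;
                }
            }
            assignment.put(p, best);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Node "C" does not pass the filter but still sees the full topology.
        Predicate<String> filter = node -> !node.equals("C");
        Map<Integer, String> onA = calculate(Arrays.asList("A", "B", "C"), filter, 8);
        Map<Integer, String> onC = calculate(Arrays.asList("C", "A", "B"), filter, 8);
        System.out.println("Same assignment on A and C: " + onA.equals(onC));
    }
}
```

In this model a node such as "C" never owns a partition, yet if it becomes
coordinator it already holds the assignment the other nodes agreed on, so
nothing has to be fetched during failover.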
> Exchange coordinator failover breaks in some cases when node filter is used
> ---------------------------------------------------------------------------
>
> Key: IGNITE-10898
> URL: https://issues.apache.org/jira/browse/IGNITE-10898
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexey Goncharuk
> Priority: Major