[
https://issues.apache.org/jira/browse/IGNITE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-20828:
---------------------------------------
Summary: Do not retry attempts to unsubscribe in
TopologyAwareRaftGroupService (was: Do not retry attempts to (un)subscribe in
TopologyAwareRaftGroupService)
> Do not retry attempts to unsubscribe in TopologyAwareRaftGroupService
> ---------------------------------------------------------------------
>
> Key: IGNITE-20828
> URL: https://issues.apache.org/jira/browse/IGNITE-20828
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Assignee: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
> Fix For: 3.0.0-beta2
>
>
> When TopologyAwareRaftGroupService is shut down, it tries to unsubscribe
> itself from all peers. If the unsubscription fails, it tries to get the
> logical topology (by calling the CMG leader via RAFT), checks that the target
> node is still in the topology, and, if so, retries the unsubscription request.
> So, if the CMG leader has already left the topology, an attempt to check the
> logical topology will take 10 seconds. This makes the partition stop in
> TableManager time out (as it has a limit of 10 seconds), which in turn results
> in a partition group staying registered with Loza even after
> TableManager#stop() returns, which causes Loza#stop() to fail the Ignite node
> stop procedure (leaving HTTP(S) ports bound).
> It seems that it makes no sense to retry unsubscription requests at all.
> Moreover, subscription requests should not be retried either (instead, the
> exception should be propagated right away). The difference between the
> scenarios should be that for unsubscription an exception should never be
> propagated (unless it is an Error).
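
The proposed behavior could be sketched roughly as follows. This is only an
illustration and not the actual TopologyAwareRaftGroupService code: the class
SubscriptionSketch, its methods, and the Supplier-based request parameter are
hypothetical names introduced here. Each request is sent exactly once;
subscribe() hands any failure to the caller, while unsubscribe() swallows
exceptions and lets only an Error through.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Hypothetical sketch; names do not come from the Ignite code base. */
class SubscriptionSketch {
    /** Sends a subscription request once; a failure is propagated as is, with no retry. */
    static CompletableFuture<Void> subscribe(Supplier<CompletableFuture<Void>> request) {
        return request.get();
    }

    /** Sends an unsubscription request once; exceptions are swallowed, Errors are propagated. */
    static CompletableFuture<Void> unsubscribe(Supplier<CompletableFuture<Void>> request) {
        return request.get().handle((res, ex) -> {
            Throwable cause = unwrap(ex);

            if (cause instanceof Error) {
                // An Error is never swallowed.
                throw (Error) cause;
            }

            // Success or an ordinary exception: complete normally, no retry.
            return null;
        });
    }

    /** Strips CompletionException wrappers added by CompletableFuture stages. */
    private static Throwable unwrap(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }

        return t;
    }

    public static void main(String[] args) {
        // Simulates a peer that has already left the topology.
        Supplier<CompletableFuture<Void>> failing =
                () -> CompletableFuture.failedFuture(new RuntimeException("peer left"));

        System.out.println("subscribe failed: " + subscribe(failing).isCompletedExceptionally());
        System.out.println("unsubscribe failed: " + unsubscribe(failing).isCompletedExceptionally());
    }
}
```

With this shape, shutdown never waits on a logical topology lookup, so a
missing CMG leader cannot hold up the partition stop.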