Roman Puchkovskiy created IGNITE-20828:
------------------------------------------
Summary: Do not retry attempts to (un)subscribe in
TopologyAwareRaftGroupService
Key: IGNITE-20828
URL: https://issues.apache.org/jira/browse/IGNITE-20828
Project: Ignite
Issue Type: Bug
Reporter: Roman Puchkovskiy
Assignee: Roman Puchkovskiy
Fix For: 3.0.0-beta2
When TopologyAwareRaftGroupService is shut down, it tries to unsubscribe itself
from all peers. If an unsubscription fails, it fetches the logical topology
(by calling the CMG leader via Raft), checks that the target node is still in
the topology, and, if so, retries the unsubscription request. So, if the CMG
leader has already left the topology, an attempt to check the logical topology
will take 10 seconds. This makes the partition stop in TableManager time out
(as it has a limit of 10 seconds), which in turn results in a partition group
staying registered with Loza even after TableManager#stop() returns, which
causes Loza#stop() to fail the Ignite node stop procedure (leaving HTTP(S)
ports bound).
It seems that it makes no sense to retry unsubscription requests at all.
Moreover, subscription requests should not be retried either (the exception
should be propagated right away instead). The difference between the two
scenarios should be that for unsubscription an exception should never be
propagated (unless it is an Error).
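
A minimal sketch of the proposed behavior, in plain Java rather than the actual
Ignite code. The class NoRetrySubscriptions, its method names, and the
Supplier-based request parameter are hypothetical and used only for
illustration; they are not the TopologyAwareRaftGroupService API. Each request
is sent exactly once. A failed subscription surfaces its exception to the
caller, while a failed unsubscription is swallowed unless the cause is an
Error.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the real TopologyAwareRaftGroupService. */
final class NoRetrySubscriptions {
    private NoRetrySubscriptions() {
    }

    /** Sends the subscription request once; any failure reaches the caller as is. */
    static CompletableFuture<Void> subscribe(Supplier<CompletableFuture<Void>> request) {
        // No topology check and no retry: the returned future carries the failure.
        return request.get();
    }

    /** Sends the unsubscription request once; failures are ignored unless they are Errors. */
    static CompletableFuture<Void> unsubscribe(Supplier<CompletableFuture<Void>> request) {
        return request.get().<Void>handle((res, ex) -> {
            Throwable cause = unwrap(ex);

            if (cause instanceof Error) {
                // Errors (e.g. OutOfMemoryError) are still propagated.
                throw (Error) cause;
            }

            // Success or an ordinary exception: complete normally either way.
            return null;
        });
    }

    /** Strips the CompletionException wrapper that dependent stages may add. */
    private static Throwable unwrap(Throwable ex) {
        if (ex instanceof CompletionException && ex.getCause() != null) {
            return ex.getCause();
        }

        return ex;
    }
}
```

Since neither path consults the logical topology, a departed CMG leader can no
longer add a 10-second wait to the stop sequence in this sketch.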
--
This message was sent by Atlassian Jira
(v8.20.10#820010)