Roman Puchkovskiy created IGNITE-20828:
------------------------------------------

             Summary: Do not retry attempts to (un)subscribe in 
TopologyAwareRaftGroupService
                 Key: IGNITE-20828
                 URL: https://issues.apache.org/jira/browse/IGNITE-20828
             Project: Ignite
          Issue Type: Bug
            Reporter: Roman Puchkovskiy
            Assignee: Roman Puchkovskiy
             Fix For: 3.0.0-beta2


When TopologyAwareRaftGroupService is shut down, it tries to unsubscribe itself 
from all peers. If an unsubscription fails, it fetches the logical topology (by 
calling the CMG leader via RAFT), checks that the target node is still in the 
topology, and, if so, retries the unsubscription request. So, if the CMG leader 
has already left the topology, the attempt to check the logical topology takes 
10 seconds. This makes the partition stop in TableManager time out (as it has a 
limit of 10 seconds), which in turn leaves a partition group registered with 
Loza even after TableManager#stop() returns. As a result, Loza#stop() fails the 
Ignite node stop procedure, leaving the HTTP(S) ports bound.

It seems that retrying unsubscription requests makes no sense at all. Moreover, 
subscription requests should not be retried either; instead, the exception 
should be propagated right away. The difference between the two scenarios is 
that for unsubscription an exception should never be propagated (unless it is 
an Error).
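
The proposed behavior could be sketched roughly as follows. This is only an 
illustration and not the actual TopologyAwareRaftGroupService code: the class 
NoRetrySubscriptions, the methods subscribe/unsubscribe/unwrap and the 
sendRequest supplier are hypothetical names, and the request is assumed to be 
represented as a CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

/** Hypothetical sketch: neither subscription nor unsubscription is retried. */
final class NoRetrySubscriptions {
    /** Sends the request exactly once; a failure reaches the caller right away. */
    static CompletableFuture<Void> subscribe(Supplier<CompletableFuture<Void>> sendRequest) {
        return sendRequest.get();
    }

    /** Sends the request exactly once; Exceptions are swallowed, Errors are rethrown. */
    static CompletableFuture<Void> unsubscribe(Supplier<CompletableFuture<Void>> sendRequest) {
        return sendRequest.get().handle((res, ex) -> {
            Throwable cause = unwrap(ex);

            if (cause instanceof Error) {
                // Errors (e.g. OutOfMemoryError) must not be hidden.
                throw (Error) cause;
            }

            // Any other failure is ignored: no topology check, no retry.
            return null;
        });
    }

    /** Strips a CompletionException wrapper, if present. */
    private static Throwable unwrap(Throwable ex) {
        if (ex instanceof CompletionException && ex.getCause() != null) {
            return ex.getCause();
        }

        return ex;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("peer has left"));

        System.out.println("subscribe failed: "
                + subscribe(() -> failed).isCompletedExceptionally());
        System.out.println("unsubscribe failed: "
                + unsubscribe(() -> failed).isCompletedExceptionally());
    }
}
```

With this shape, shutdown never waits on a logical topology lookup, so a 
missing CMG leader cannot delay the partition stop.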



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
