[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417719#comment-17417719 ]
Aleksei Zotov commented on CASSANDRA-14930:
-------------------------------------------

Sorry guys, I checked the wrong commit! I have deleted my previous comment to prevent any confusion.

However, I have some comments on the actual PR:
# It looks like a new property ({{cassandra.messaging_destroy_delay_in_ms}}) is going to be introduced. I think it needs to be described in {{cassandra.yaml}} and other documentation. Moreover, as far as I understand, properties should be accessed through {{DatabaseDescriptor}}, not read directly as system properties.
# {code:java}
if (delay <= 0) // opt out{code}
Is this really possible? If yes, could you please describe a scenario? We either take the max of positive values or read the delay from the property (which should have corresponding validation that it is non-negative or positive).
# I do not know the flow well here, so just to confirm: should the node *not* be among the unreachable endpoints for the connection to be closed?
{code:java}
if (!liveEndpoints.contains(endpoint) && !unreachableEndpoints.containsKey(endpoint)){code}
# Should we introduce corresponding tests?

Taking into account that this is just a bug fix for old versions, #1 and #4 are probably not critical (but I'd still remove the property unless there is a justification for having it). #2 also does not seem to affect anything, even though it might not be required.
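To make the validation point in #2 concrete, here is a minimal sketch of parsing such a property with a non-negative check. It is not the code from the PR: the class name {{DestroyDelay}}, the method names, and the default value are hypothetical, and it reads a plain system property only for illustration, whereas the comment above suggests going through {{DatabaseDescriptor}}.

```java
public final class DestroyDelay
{
    // Property name taken from the review comment; everything else here is illustrative.
    public static final String PROPERTY = "cassandra.messaging_destroy_delay_in_ms";

    // Placeholder default, NOT the value used by the actual patch.
    public static final long DEFAULT_DELAY_MS = 10_000L;

    private DestroyDelay() {}

    /** Parses a raw property value; null means "not set" and yields the default. */
    public static long parseDelayMillis(String raw)
    {
        if (raw == null)
            return DEFAULT_DELAY_MS;

        long value;
        try
        {
            value = Long.parseLong(raw.trim());
        }
        catch (NumberFormatException e)
        {
            throw new IllegalArgumentException(PROPERTY + " must be a whole number of milliseconds, got: " + raw, e);
        }

        // Reject negative values up front so callers never see them.
        if (value < 0)
            throw new IllegalArgumentException(PROPERTY + " must be non-negative, got: " + value);

        return value;
    }

    /** Reads the delay from the JVM system properties (assumption: no DatabaseDescriptor in this sketch). */
    public static long destroyDelayMillis()
    {
        return parseDelayMillis(System.getProperty(PROPERTY));
    }
}
```

With validation of this kind in place, a {{delay <= 0}} branch could only be reached with a value of exactly zero, which is the scenario the question in #2 asks the author to spell out.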
> decommission may cause timeout because messaging backlog is cleared
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14930
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14930
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Coordination, Legacy/Core
>            Reporter: Zhao Yang
>            Assignee: Zhao Yang
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x
>
>
> On a 3-node cluster with RF=2, decommissioning a node may cause quorum write
> timeout because messaging backlog to decommissioned node is cleared via
> {{Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket()}}.
> (Timeout is less likely to happen with RF=3, because we can afford one less
> response)
> {code:java}
> What happened:
> 1. [WriteStage] before the leaving node is removed from tokenmetadata, the
> write endpoints are generated ( leaving endpoint is included )
> 2. [GossipStage] the leaving node is removed from tokenmetadata, no more
> future write handler will include leaving endpoints
> 3. [WriteStage] write handlers sends messages to messaging-service backlog
> 4. [GossipStage] messaging-service backlog is cleared, messages are not sent
> and connection closed
> 5. [WriteStage] write time out
> {code}
> |patch|
> |[3.0|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.0]|
> |[3.11|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.11]|
> We can avoid it by delaying to destroy messaging connection so that messages
> are sent and responded. This patch also avoids reopening already closed
> connection on {{MessagingService#convict()}}.
> New messaging framework rewrite in {{Trunk}} avoids the issues by not
> clearing messaging backlog.
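The idea in the description of delaying the destruction of the messaging connection can be sketched roughly as below. This is only an illustration of deferring a close action with a standard {{ScheduledExecutorService}}; the class {{DelayedCloseSketch}} and its methods are hypothetical and are not taken from the linked patch or from Cassandra's messaging code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public final class DelayedCloseSketch
{
    // Single background thread that runs deferred close actions.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    /**
     * Runs closeAction after delayMillis, so that messages already queued for the
     * endpoint have a chance to be sent and answered before the connection goes away.
     * A non-positive delay closes immediately on the calling thread.
     *
     * @return the scheduled task, or null when the close ran synchronously
     */
    public ScheduledFuture<?> closeLater(Runnable closeAction, long delayMillis)
    {
        if (delayMillis <= 0)
        {
            closeAction.run();
            return null;
        }
        return scheduler.schedule(closeAction, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the background thread; pending close actions are dropped. */
    public void shutdown()
    {
        scheduler.shutdownNow();
    }
}
```

In terms of the numbered steps in the description, step 4 would then schedule the close instead of clearing the backlog right away, leaving the in-flight write from step 3 time to complete.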