[
https://issues.apache.org/jira/browse/CASSANDRA-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803436#comment-17803436
]
Stefan Miklosovic commented on CASSANDRA-19187:
-----------------------------------------------
Do I get it right this is not ready for the commit then? I think until we
answer remaining questions about the flakiness etc we can't ship that ...
> nodetool assassinate may cause thread serialization for that node
> -----------------------------------------------------------------
>
> Key: CASSANDRA-19187
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19187
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Runtian Liu
> Assignee: Runtian Liu
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When assassinate an ip address that is not in the gossip map, a "corrupted"
> entry will be inserted into the gossip map.
> [(1)|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L810]
> For example, if we do "nodetool assassinate 10.1.1.1"
> we will get an entry like below by running "nodetool gossipinfo":
>
> {code:java}
> /10.1.1.1
> generation:1702006511
> heartbeat:9999
> STATUS:209516:LEFT,-8393921141401589197,1702265651923
> STATUS_WITH_PORT:209515:LEFT,-8393921141401589197,1702265651923
> TOKENS: not present {code}
>
> This entry in endpointStateMap will cause issue for
> [isUpgradingFromVersionLowerThan|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L2284]
> function. Because the
> [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L191]
> supplier will always set the
> [allHostsHaveKnownVersion|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L216]
> flag to false so no memoized value will be returned. The "get" function will
> always require a lock from this
> [line|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java#L66].
> If application is using "fetchAll", the native-transport-requests thread will
> hit this
> [line|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/db/filter/ColumnFilter.java#L574].
> This means all the native-transport-requests thread is serialized, also, the
> lock is shared by GossipStage threads. It means if a node in a cluster with
> the corrupted gossip map is restart, the node will run into this problem.
> To fix the issue,
> # Why we want to add a dummy entry for nodetool assassinate if the endpoint
> is not in the map anymore. Should we do nothing or throw exception if the
> node is not in the gossip map anymore?
> # Before checking if a version is null, we should make sure the node is not
> a dead node. A decommissioned node, a left node should not be considered part
> of the cluster anymore when calculating "upgradeInProgressPossible"
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]