[
https://issues.apache.org/jira/browse/CASSANDRA-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802087#comment-17802087
]
Stefan Miklosovic commented on CASSANDRA-19187:
-----------------------------------------------
Just a heads-up: the "hasMajorVersion3Nodes" method in Gossiper, which is used
in the tests in the patches for this ticket, will be renamed here:
https://github.com/pauloricardomg/cassandra/commit/ba01b094bcb50766b3dae2da6e4e5dde4579b829
That is the patch for 5.0 for CASSANDRA-18999 (a similar renaming will be done
in 4.0 and 4.1). [~paulo] is driving the merge of that. Everything is done; we
just seem to struggle to run the upgrade dtests in Circle because of our free
plans for the 4.0, 4.1 and 5.0 branches. Please find the details here:
https://issues.apache.org/jira/browse/CASSANDRA-18999?focusedCommentId=17801825&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17801825
So, if we happen to merge CASSANDRA-18999 first, the renaming of that method
will need to be reflected in this patch too. The logic of that method changes
slightly as well.
> nodetool assassinate may cause thread serialization for that node
> -----------------------------------------------------------------
>
> Key: CASSANDRA-19187
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19187
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Runtian Liu
> Assignee: Runtian Liu
> Priority: Normal
> Fix For: 4.0.x, 4.1.x, 5.0.x
>
>
> When assassinating an IP address that is not in the gossip map, a "corrupted"
> entry is inserted into the gossip map.
> [(1)|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L810]
> For example, if we run "nodetool assassinate 10.1.1.1",
> "nodetool gossipinfo" will show an entry like the one below:
>
> {code:java}
> /10.1.1.1
> generation:1702006511
> heartbeat:9999
> STATUS:209516:LEFT,-8393921141401589197,1702265651923
> STATUS_WITH_PORT:209515:LEFT,-8393921141401589197,1702265651923
> TOKENS: not present
> {code}
>
> This entry in endpointStateMap causes a problem for the
> [isUpgradingFromVersionLowerThan|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L2284]
> function: because of this entry, the
> [upgradeFromVersionSupplier|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L191]
> supplier always sets the
> [allHostsHaveKnownVersion|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/gms/Gossiper.java#L216]
> flag to false, so no memoized value is ever returned. The "get" function then
> always has to acquire the lock at this
> [line|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/ExpiringMemoizingSupplier.java#L66].
> If the application is using "fetchAll", the native-transport-requests threads
> will hit this
> [line|https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/db/filter/ColumnFilter.java#L574].
> This means all the native-transport-requests threads are serialized. The
> lock is also shared by the GossipStage threads, so if a node in a cluster
> with the corrupted gossip map is restarted, that node will run into this
> problem.
> To fix the issue:
> # Why do we want to add a dummy entry for nodetool assassinate if the
> endpoint is not in the map anymore? Should we do nothing, or throw an
> exception, if the node is not in the gossip map anymore?
> # Before checking whether a version is null, we should make sure the node is
> not a dead node. A decommissioned node or a left node should not be considered
> part of the cluster anymore when calculating "upgradeInProgressPossible".
>
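The serialization described in the issue above can be illustrated with a small, self-contained sketch. It is not Cassandra code: MemoizeSketch, CachingSupplier, Endpoint, Result and the skipLeftNodes flag are hypothetical names invented for this example, the supplier has no expiry, and the "upgrade possible" rule is a toy. It only shows the general pattern: a result that is never marked cacheable forces every caller through the lock, while skipping left nodes (proposal 2) lets the result be cached after one computation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class MemoizeSketch {
    /** One computed value plus a flag saying whether it may be cached. */
    static final class Result {
        final boolean upgradePossible;
        final boolean cacheable;
        Result(boolean upgradePossible, boolean cacheable) {
            this.upgradePossible = upgradePossible;
            this.cacheable = cacheable;
        }
    }

    /** Hypothetical gossip entry; version is null when the release version is unknown. */
    static final class Endpoint {
        final String address;
        final String version;
        final boolean left;
        Endpoint(String address, String version, boolean left) {
            this.address = address;
            this.version = version;
            this.left = left;
        }
    }

    /** Simplified stand-in for a memoizing supplier (assumption: no expiry). */
    static final class CachingSupplier implements Supplier<Boolean> {
        private final Supplier<Result> delegate;
        private volatile boolean cached;
        private boolean value;
        final AtomicInteger lockedComputations = new AtomicInteger();

        CachingSupplier(Supplier<Result> delegate) {
            this.delegate = delegate;
        }

        @Override
        public Boolean get() {
            if (cached) return value;      // fast path: no lock once a value is cached
            synchronized (this) {          // slow path: callers queue up on this lock
                if (cached) return value;
                lockedComputations.incrementAndGet();
                Result r = delegate.get();
                if (r.cacheable) {         // only cacheable results are stored
                    value = r.upgradePossible;
                    cached = true;
                }
                return r.upgradePossible;
            }
        }
    }

    /** Toy rule: the result is cacheable only if every considered host has a known version. */
    static Result compute(List<Endpoint> endpoints, boolean skipLeftNodes) {
        boolean allHostsHaveKnownVersion = true;
        for (Endpoint e : endpoints) {
            if (skipLeftNodes && e.left) continue;   // ignore nodes that left the ring
            if (e.version == null) allHostsHaveKnownVersion = false;
        }
        return new Result(!allHostsHaveKnownVersion, allHostsHaveKnownVersion);
    }

    /** Calls get() the given number of times and reports how many calls took the lock. */
    static int lockedComputations(boolean skipLeftNodes, int calls) {
        List<Endpoint> endpoints = List.of(
            new Endpoint("10.0.0.1", "4.0.11", false),
            new Endpoint("10.1.1.1", null, true));   // like the assassinated entry
        CachingSupplier supplier =
            new CachingSupplier(() -> compute(endpoints, skipLeftNodes));
        for (int i = 0; i < calls; i++) supplier.get();
        return supplier.lockedComputations.get();
    }

    public static void main(String[] args) {
        System.out.println(lockedComputations(false, 5)); // prints 5
        System.out.println(lockedComputations(true, 5));  // prints 1
    }
}
```

In this sketch the entry with an unknown version keeps the result uncacheable, so all five calls recompute under the lock; once left nodes are skipped, only the first call does.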