Cameron Zemek created CASSANDRA-18543:
-----------------------------------------
Summary: Waiting for gossip to settle does not wait for live endpoints
Key: CASSANDRA-18543
URL: https://issues.apache.org/jira/browse/CASSANDRA-18543
Project: Cassandra
Issue Type: Bug
Reporter: Cameron Zemek
Attachments: gossip.patch
When a node starts, it obtains endpoint states (via the shadow round) but still
has all nodes marked as down. The problem is that the wait-to-settle check only
verifies that the number of endpoint states is stable before starting the native
transport. Once the native transport starts, the node receives queries and fails
consistency levels such as LOCAL_QUORUM, because it still considers the other
nodes down.
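To make the failure concrete: LOCAL_QUORUM needs floor(RF / 2) + 1 live replicas in the local datacenter. The sketch below does that arithmetic for RF = 3, assuming the freshly started coordinator is itself one of the replicas and is the only one it currently considers UP. The class and method names are hypothetical and not taken from Cassandra.

```java
// Hypothetical illustration: how many live replicas LOCAL_QUORUM needs, and why a
// freshly started coordinator that still marks its peers DOWN cannot satisfy it.
final class LocalQuorumCheck {

    // Quorum for a datacenter with the given replication factor: floor(rf / 2) + 1.
    static int localQuorum(int localRf) {
        return localRf / 2 + 1;
    }

    // True when the coordinator believes enough local replicas are alive.
    static boolean canSatisfy(int localRf, int liveLocalReplicas) {
        return liveLocalReplicas >= localQuorum(localRf);
    }

    public static void main(String[] args) {
        int rf = 3;
        // Assumption: right after the shadow round only the node itself counts as UP.
        int liveBeforeMarkedUp = 1;
        // Assumption: later, Echo responses have marked the other two replicas UP.
        int liveAfterMarkedUp = 3;

        System.out.println("LOCAL_QUORUM needs " + localQuorum(rf) + " of " + rf);
        System.out.println("before peers marked UP: " + canSatisfy(rf, liveBeforeMarkedUp));
        System.out.println("after peers marked UP:  " + canSatisfy(rf, liveAfterMarkedUp));
    }
}
```

With one of three replicas considered live, 1 < 2, so the request is rejected even though the other replicas are actually healthy.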
My initial solution was to also check the number of live endpoints in addition
to the number of endpoint states. This worked, but while testing the fix I
noticed that there is also a lot of duplicated liveness checking of the same
node (via Echo messages). So the patch also removes this duplicate checking of
whether a node is UP.
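A minimal sketch of the revised settle condition, assuming a poll-based loop: gossip only counts as settled once both the endpoint-state count and the live-endpoint count have stayed unchanged for a required number of consecutive polls. This is not the code in gossip.patch; `GossipSettleCheck`, `poll`, and `waitToSettle` are hypothetical names, and the poll data in `main` is made up.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch, not Cassandra's Gossiper code: gossip counts as settled
// only when BOTH the endpoint-state count and the live-endpoint count have
// stayed unchanged for a required number of consecutive polls.
final class GossipSettleCheck {
    private final int successesRequired;
    private int lastStates = -1;
    private int lastLive = -1;
    private int successes = 0;

    GossipSettleCheck(int successesRequired) {
        this.successesRequired = successesRequired;
    }

    // Records one poll and returns true once the view is considered settled.
    boolean poll(int endpointStates, int liveEndpoints) {
        if (endpointStates == lastStates && liveEndpoints == lastLive) {
            successes++;
        } else {
            // A change in either count restarts the stability window.
            successes = 0;
            lastStates = endpointStates;
            lastLive = liveEndpoints;
        }
        return successes >= successesRequired;
    }

    // Polls the suppliers at a fixed interval until settled or the timeout expires.
    static boolean waitToSettle(IntSupplier states, IntSupplier live,
                                int successesRequired, long pollIntervalMs, long timeoutMs) {
        GossipSettleCheck check = new GossipSettleCheck(successesRequired);
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (check.poll(states.getAsInt(), live.getAsInt())) {
                return true;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated polls: 6 endpoint states are known right after the shadow
        // round, but peers are only gradually marked UP.
        int[] live = {1, 1, 1, 4, 6, 6, 6, 6};
        GossipSettleCheck check = new GossipSettleCheck(3);
        for (int i = 0; i < live.length; i++) {
            if (check.poll(6, live[i])) {
                System.out.println("settled at poll " + i + " with live=" + live[i]);
                return;
            }
        }
        System.out.println("not settled");
    }
}
```

With these made-up numbers, a check on the endpoint-state count alone would report "settled" at poll 3, when only 4 of the 6 endpoints are live. Including the live count delays settling until poll 7, after every peer has been marked UP.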
The final problem I found while testing is that the check could sometimes still
miss a change in live endpoints, because it only polls once per second, so the
patch allows the settle parameters to be overridden.
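One way to make the settle parameters overridable is to read them from JVM system properties with built-in fallbacks, as sketched below. Only the 1-second poll interval comes from this ticket; the property names, the 5-second minimum wait, and the requirement of 3 stable polls are illustrative assumptions and not the names or values used in gossip.patch.

```java
// Hypothetical sketch: gossip-settle tuning read from JVM system properties so an
// operator can override it, e.g. -Dexample.gossip_settle.poll_interval_ms=2000.
// Property names and defaults are illustrative only.
final class SettleParams {
    final long minWaitMs;
    final long pollIntervalMs;
    final int pollSuccessesRequired;

    SettleParams(long minWaitMs, long pollIntervalMs, int pollSuccessesRequired) {
        if (minWaitMs < 0 || pollIntervalMs <= 0 || pollSuccessesRequired <= 0) {
            throw new IllegalArgumentException("invalid gossip settle parameters");
        }
        this.minWaitMs = minWaitMs;
        this.pollIntervalMs = pollIntervalMs;
        this.pollSuccessesRequired = pollSuccessesRequired;
    }

    // Long.getLong / Integer.getInteger return the default when a property is
    // unset or cannot be parsed as a number.
    static SettleParams fromSystemProperties() {
        return new SettleParams(
                Long.getLong("example.gossip_settle.min_wait_ms", 5_000L),
                Long.getLong("example.gossip_settle.poll_interval_ms", 1_000L),
                Integer.getInteger("example.gossip_settle.poll_successes_required", 3));
    }

    // Roughly how long both counts must stay unchanged before gossip is declared settled.
    long stableWindowMs() {
        return pollIntervalMs * pollSuccessesRequired;
    }

    public static void main(String[] args) {
        System.out.println("default window: " + fromSystemProperties().stableWindowMs() + " ms");
        System.setProperty("example.gossip_settle.poll_interval_ms", "2000");
        System.out.println("overridden window: " + fromSystemProperties().stableWindowMs() + " ms");
    }
}
```

Lengthening the poll interval or raising the number of required stable polls widens the window in which a slow Echo round trip can still change the live-endpoint count before the native transport starts, at the cost of a slower startup.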
--
This message was sent by Atlassian Jira
(v8.20.10#820010)