Jason Brown created CASSANDRA-13993:
---------------------------------------
Summary: Add optional startup delay to wait until peers are ready
Key: CASSANDRA-13993
URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
Project: Cassandra
Issue Type: Improvement
Components: Lifecycle
Reporter: Jason Brown
Assignee: Jason Brown
Priority: Minor
Fix For: 4.x
When bouncing a node in a large cluster, it can take a while for the node to
recognize the rest of the cluster as available. This is especially true if
using TLS on internode messaging connections. The bouncing node (and any
clients connected to it) may see a series of Unavailable or Timeout exceptions
until the node is 'warmed up', because connecting to the rest of the cluster is
asynchronous from the rest of the startup process.
There are two aspects that drive a node's ability to successfully communicate
with a peer after a bounce:
- marking the peer as 'alive' (state that is held in gossip). This affects the
unavailable exceptions.
- having both outbound and inbound connections open and ready to each
peer. This affects timeouts.
Details of each of these mechanisms are described in the comments below.
This ticket proposes adding a mechanism, optional and configurable, to delay
opening the client native protocol port until some percentage of the peers in
the cluster is marked alive and connected to/from. Thus, while we potentially
slow down startup (by delaying the opening of the client port), we reduce the
chance that queries made by clients hit transient unavailable/timeout
exceptions.
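
A minimal sketch of what such a gate could look like, in Java. This is not the
patch for this ticket: the class name, method names, and parameters
(StartupPeerGate, readyFraction, awaitPeers, requiredFraction, timeoutMillis,
pollMillis) are all hypothetical, and the two predicates stand in for whatever
gossip and internode messaging would actually report. It only illustrates the
idea of blocking startup until a configured fraction of peers is both alive
and connected, with a timeout so startup cannot hang forever.

```java
import java.util.Collection;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names come from the Cassandra code base. */
final class StartupPeerGate {
    private StartupPeerGate() {}

    /** Fraction of peers that are both marked alive and fully connected. */
    public static <P> double readyFraction(Collection<P> peers,
                                           Predicate<P> isAlive,
                                           Predicate<P> isConnected) {
        if (peers.isEmpty())
            return 1.0; // no peers (e.g. single-node cluster): nothing to wait for
        long ready = peers.stream()
                          .filter(p -> isAlive.test(p) && isConnected.test(p))
                          .count();
        return (double) ready / peers.size();
    }

    /**
     * Polls until at least requiredFraction of the peers are ready, or until
     * timeoutMillis has elapsed. Returns true if the threshold was reached,
     * false on timeout or interruption. The caller would open the client
     * port afterwards in either case.
     */
    public static <P> boolean awaitPeers(Collection<P> peers,
                                         Predicate<P> isAlive,
                                         Predicate<P> isConnected,
                                         double requiredFraction,
                                         long timeoutMillis,
                                         long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (readyFraction(peers, isAlive, isConnected) >= requiredFraction)
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false; // give up waiting; do not block startup forever
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

In this sketch a timeout is treated as "stop waiting" rather than as a startup
failure, which matches the ticket's framing of the delay as optional: the node
trades a bounded amount of startup time for fewer transient client errors.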