Jason Brown created CASSANDRA-13993:
---------------------------------------

             Summary: Add optional startup delay to wait until peers are ready
                 Key: CASSANDRA-13993
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
             Project: Cassandra
          Issue Type: Improvement
          Components: Lifecycle
            Reporter: Jason Brown
            Assignee: Jason Brown
            Priority: Minor
             Fix For: 4.x


When bouncing a node in a large cluster, it can take a while for the node to 
recognize the rest of the cluster as available. This is especially true if 
using TLS on internode messaging connections. The bouncing node (and any 
clients connected to it) may see a series of Unavailable or Timeout exceptions 
until the node is 'warmed up', because connecting to the rest of the cluster is 
asynchronous from the rest of the startup process.

There are two aspects that drive a node's ability to successfully communicate 
with a peer after a bounce:
- marking the peer as 'alive' (state that is held in gossip). This affects the 
unavailable exceptions.
- having both outbound and inbound connections open and ready to each 
peer. This affects timeouts.

Details of each of these mechanisms are described in the comments below.

This ticket proposes adding a mechanism, optional and configurable, to delay 
opening the client native protocol port until some percentage of the peers in 
the cluster is marked alive and connected to/from. Thus, while we potentially 
slow down startup (by delaying the opening of the client port), we reduce the 
chance that queries made by clients hit transient unavailable/timeout exceptions.
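
As a rough illustration of the proposed gate (not the actual patch), the 
sketch below polls until a configured percentage of peers is ready or a 
timeout expires. Every name in it (StartupPeerGate, enoughPeersReady, 
awaitPeers, the percentage and timeout parameters) is hypothetical and does 
not come from Cassandra. The isReady predicate stands in for the combined 
check described above: the peer is marked alive in gossip and has both 
inbound and outbound connections established.

```java
import java.util.Collection;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch of a startup gate; names are illustrative, not Cassandra APIs. */
public final class StartupPeerGate {

    private StartupPeerGate() {}

    /** Returns true when at least requiredPercent of the peers satisfy isReady. */
    public static <P> boolean enoughPeersReady(Collection<P> peers,
                                               Predicate<? super P> isReady,
                                               int requiredPercent) {
        if (peers.isEmpty()) {
            return true; // no peers (e.g. single-node cluster): nothing to wait for
        }
        long ready = peers.stream().filter(isReady).count();
        // Integer comparison avoids floating-point rounding at the threshold.
        return ready * 100 >= (long) requiredPercent * peers.size();
    }

    /**
     * Polls until enough peers are ready or the timeout elapses.
     * Returns true if the threshold was met, false on timeout or interruption.
     * The caller would open the client port afterwards in either case, so the
     * delay stays bounded.
     */
    public static <P> boolean awaitPeers(Collection<P> peers,
                                         Predicate<? super P> isReady,
                                         int requiredPercent,
                                         long timeoutMillis,
                                         long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (enoughPeersReady(peers, isReady, requiredPercent)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A startup sequence built on this sketch would call awaitPeers just before 
opening the native protocol port and open the port regardless of the result. 
The timeout is an assumption made here so that unreachable peers cannot 
block startup indefinitely.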



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
