[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239555#comment-16239555
 ] 

Jason Brown commented on CASSANDRA-13993:
-----------------------------------------

The details of the timeouts after startup are:

- a client request comes in on the native protocol, either a read or write
- the newly bounced node figures out which peers are responsible for the data 
(by partition key)
- the node sends the request to the peers, for which we have to build up both 
the outbound and inbound connections (note: internode messaging connections 
are unidirectional)
- if building those connections is not fast enough, the request will time out 
(either at the coordinator or the client driver)

On each connection we have to build the TCP connection, possibly perform the 
TLS handshake, and then perform the c* internode messaging handshake. The time 
for this is exacerbated with nodes that are in remote datacenters, where the 
round trip time is significantly higher. In pre-4.0 (before CASSANDRA-8457), 
this is even worse as all those actions were performed sequentially, for each 
connection attempt, [on the (single) accept 
thread|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessagingService.java#L1284].
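
To make the sequential-versus-concurrent point concrete, here is a rough 
back-of-the-envelope model in Java. It is only a sketch: the class and method 
names are hypothetical, the number of round trips per connection is a 
parameter rather than a measured value, and it ignores CPU cost and retries. 
It assumes that setup cost is dominated by round trips, that sequential setup 
costs the sum of the per-peer times, and that fully concurrent setup costs 
roughly the slowest peer.

```java
import java.util.List;

/** Hypothetical model of connection setup cost; not Cassandra code. */
public final class ConnectionSetupModel {

    private ConnectionSetupModel() {}

    /** Setup time for one connection, assuming it is dominated by round trips. */
    public static long setupMillis(long rttMillis, int roundTrips) {
        return rttMillis * roundTrips;
    }

    /** Connections established one after another: the costs add up. */
    public static long sequentialMillis(List<Long> perPeerSetupMillis) {
        return perPeerSetupMillis.stream().mapToLong(Long::longValue).sum();
    }

    /** Connections established concurrently: bounded by the slowest peer. */
    public static long parallelMillis(List<Long> perPeerSetupMillis) {
        return perPeerSetupMillis.stream().mapToLong(Long::longValue).max().orElse(0L);
    }
}
```

With illustrative (made-up) numbers of four round trips per connection, three 
local peers at 1 ms RTT and two remote peers at 80 ms RTT, the per-peer costs 
are 4, 4, 4, 320 and 320 ms. The sequential total is 652 ms, while the 
concurrent bound is 320 ms, and the gap grows with the number of remote peers.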


> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Lifecycle
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the 
> rest of the cluster as available. This is especially true if using TLS on 
> internode messaging connections. The bouncing node (and any clients connected 
> to it) may see a series of Unavailable or Timeout exceptions until the node 
> is 'warmed up' as connecting to the rest of the cluster is asynchronous from 
> the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate 
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects 
> the unavailable exceptions
> - having both outbound and inbound connections open and ready to each 
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay 
> opening the client native protocol port until some percentage of the peers in 
> the cluster is marked alive and connected to/from. Thus while we potentially 
> slow down startup (delay opening the client port), we reduce the chance 
> that queries made by clients hit transient unavailable/timeout 
> exceptions.
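
The mechanism proposed in the description could look roughly like the 
following Java sketch. This is an illustration under assumptions, not the 
patch for this ticket: StartupPeerGate, PeerState, and the parameter names 
are all hypothetical, and the peer view is supplied by the caller rather than 
read from gossip or the messaging service. The idea is to poll until a 
configurable fraction of peers is both marked alive and connected in both 
directions, or until a timeout expires, and only then open the client port.

```java
import java.util.Collection;
import java.util.function.Supplier;

/** Hypothetical startup gate; not the actual Cassandra implementation. */
public final class StartupPeerGate {

    private StartupPeerGate() {}

    /** Hypothetical snapshot of what the local node knows about one peer. */
    public static final class PeerState {
        final boolean alive;        // marked alive in gossip
        final boolean outboundOpen; // connection to the peer is established
        final boolean inboundOpen;  // connection from the peer is established

        public PeerState(boolean alive, boolean outboundOpen, boolean inboundOpen) {
            this.alive = alive;
            this.outboundOpen = outboundOpen;
            this.inboundOpen = inboundOpen;
        }

        boolean ready() {
            return alive && outboundOpen && inboundOpen;
        }
    }

    /** Fraction of peers that are alive and connected in both directions. */
    public static double readyFraction(Collection<PeerState> peers) {
        if (peers.isEmpty()) {
            return 1.0; // no peers, so there is nothing to wait for
        }
        long ready = peers.stream().filter(PeerState::ready).count();
        return (double) ready / peers.size();
    }

    /**
     * Polls the peer view until enough peers are ready or the timeout expires.
     * Returns true if the threshold was reached, false on timeout or interrupt.
     */
    public static boolean awaitPeers(Supplier<Collection<PeerState>> peerView,
                                     double requiredFraction,
                                     long timeoutMillis,
                                     long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (readyFraction(peerView.get()) >= requiredFraction) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }
}
```

A caller would invoke awaitPeers just before opening the native protocol 
port and open the port in either case, so that a slow or partitioned peer 
delays startup by at most the timeout rather than blocking it indefinitely.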



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

