[ https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239554#comment-16239554 ]

Jason Brown commented on CASSANDRA-13993:
-----------------------------------------

The causes of Unavailable errors after startup are as follows:

- a client request comes in on the native protocol, either a read or write
- the newly bounced node figures out which peers are responsible for the data 
(by partition key)
- the node checks to see if it thinks the peers are available (see below)
- if too few replicas are alive to fulfill the request, an Unavailable error 
is returned to the client
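
A minimal sketch of the availability check in the steps above, in Java since 
that is Cassandra's language. The class, method, and exception names 
(AvailabilityCheck, checkLiveReplicas, UnavailableError) are hypothetical and 
are not Cassandra's actual API; the node's liveness view is assumed to be a 
simple map from peer address to an alive flag, and blockFor stands for however 
many replicas the request needs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of the coordinator-side check described above.
 * Names are illustrative; this is not Cassandra's actual API.
 */
public class AvailabilityCheck {

    /** Stand-in for the Unavailable error returned to the client. */
    public static class UnavailableError extends RuntimeException {
        public UnavailableError(int required, int alive) {
            super("Unavailable: need " + required + " live replicas, only " + alive + " alive");
        }
    }

    /**
     * @param replicas peers responsible for the partition key
     * @param liveness this node's local view: true if the peer is marked alive
     * @param blockFor number of replicas the request needs
     * @return the replicas believed to be alive
     * @throws UnavailableError if fewer than blockFor replicas are believed alive
     */
    public static List<String> checkLiveReplicas(List<String> replicas,
                                                 Map<String, Boolean> liveness,
                                                 int blockFor) {
        List<String> live = replicas.stream()
                .filter(r -> liveness.getOrDefault(r, false)) // unknown peers count as dead
                .collect(Collectors.toList());
        if (live.size() < blockFor) {
            // Fail fast before contacting any replica.
            throw new UnavailableError(blockFor, live.size());
        }
        return live;
    }
}
```

Right after a bounce, every saved peer starts out dead in this view, so a 
request needing 2 of 3 replicas fails until at least two of them have been 
marked alive.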

A bouncing node determines whether a peer is alive as follows:

- In StorageService#initServer(), add the IP addresses of previously known 
peers to gossip via {{Gossiper#addSavedEndpoint}}
- {{Gossiper#addSavedEndpoint}} sets up the local state about the peer, and 
marks the peer as dead ({{EndpointState#markDead}})
... time passes in the process startup sequence ...
- when we get gossip data from any peer in the cluster, we will start updating 
the known state in gossip about each peer
- for each updated peer that we expect to be live (not decommissioned, 
shut down, etc.), {{Gossiper#markAlive()}} sends an {{EchoMessage}} to the 
peer. This is sent on the {{OutboundMessagingPool#gossipChannel}} socket, 
which opens a TCP socket and completes the TCP handshake; when we go to write 
the message to the socket (which will be the Cassandra internode handshake), 
the TLS handshake is initiated and completed before the message bytes are sent.
- The peer responds with a simple request-response message. This (should 
be) sent on the peer's {{OutboundMessagingPool#gossipChannel}} [1], which 
requires its own socket, TCP handshake, TLS handshake, and so on before the 
request-response bytes are sent to the socket.
- The bounced node receives the request-response, and invokes the callback 
{{Gossiper#markRealAlive()}}. In that method we finally mark the peer as alive 
by invoking {{EndpointState#markAlive()}}.
- All client-initiated DML operations look up the peer's EndpointState in 
Gossiper to check whether the peer is alive.
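
The liveness flow above can be modeled roughly as follows. This is a 
hypothetical single-process sketch, not Cassandra's code: method names echo 
the Gossiper and EndpointState steps in the text, the EchoTransport interface 
is an assumed stand-in for the gossip socket (including its TCP and TLS 
handshakes), and the echo response is delivered by invoking a callback.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical model of the startup liveness flow. Networking is replaced by
 * an assumed EchoTransport; this is not Cassandra's implementation.
 */
public class LivenessModel {

    /** Minimal stand-in for EndpointState: only the alive flag matters here. */
    static final class PeerState {
        volatile boolean alive;
        void markDead() { alive = false; }
        void markAlive() { alive = true; }
    }

    /** Sends an echo to a peer and runs onResponse when the reply arrives (assumed transport). */
    public interface EchoTransport {
        void sendEcho(String peer, Runnable onResponse);
    }

    private final Map<String, PeerState> peers = new ConcurrentHashMap<>();
    private final EchoTransport transport;

    public LivenessModel(EchoTransport transport) { this.transport = transport; }

    /** Step 1: at startup, previously known peers are loaded and start out dead. */
    public void addSavedEndpoint(String peer) {
        PeerState s = new PeerState();
        s.markDead();
        peers.put(peer, s);
    }

    /** Step 2: gossip says the peer should be live, so send an echo rather than trust gossip. */
    public void markAlive(String peer) {
        transport.sendEcho(peer, () -> markRealAlive(peer));
    }

    /** Step 3: only the echo response flips the local view to alive. */
    public void markRealAlive(String peer) {
        peers.computeIfAbsent(peer, p -> new PeerState()).markAlive();
    }

    /** What the request path consults before routing reads and writes. */
    public boolean isAlive(String peer) {
        PeerState s = peers.get(peer);
        return s != null && s.alive;
    }
}
```

The point the model captures is that gossip data alone never marks a peer 
alive; only the echo response does, so each peer costs a full round trip over 
newly opened connections before it becomes usable for client requests.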

Thus, we must have a successful {{EchoMessage}} and response between any two 
nodes for the initiator to consider a peer as available for user-initiated 
queries.

[1] Actually, there is a bug wherein the response is sent on the 
{{OutboundMessagingPool#smallMessageChannel}}. CASSANDRA-13714 exists to 
address it.

> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Lifecycle
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the 
> rest of the cluster as available. This is especially true if using TLS on 
> internode messaging connections. The bouncing node (and any clients connected 
> to it) may see a series of Unavailable or Timeout exceptions until the node 
> is 'warmed up', because connecting to the rest of the cluster is asynchronous 
> from the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate 
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects 
> the unavailable exceptions
> - having both outbound and inbound connections open and ready to each 
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay 
> opening the client native protocol port until some percentage of the peers in 
> the cluster is marked alive and connected to/from. Thus while we potentially 
> slow down startup (delay opening the client port), we alleviate the chance 
> that queries made by clients don't hit transient unavailable/timeout 
> exceptions.
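
One way the proposed optional delay could look, as a sketch under stated 
assumptions: the caller supplies a hypothetical isReady predicate meaning 
"marked alive in gossip and connected in both directions", the required 
fraction of peers, and a maximum wait. None of these names or parameters come 
from the ticket; they only illustrate gating the native protocol port on peer 
readiness with a bounded timeout.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of an optional startup gate: wait until a fraction of
 * peers is ready, or until maxWait expires. Names are illustrative only.
 */
public class StartupGate {

    /**
     * @param peers         all known peers (excluding this node)
     * @param isReady       true if the peer is marked alive and connected both ways (assumed)
     * @param requiredRatio fraction of peers that must be ready, e.g. 0.9
     * @param maxWait       upper bound on the delay
     * @param pollInterval  how often to re-check readiness
     * @return true if the threshold was met, false on timeout or interruption
     */
    public static boolean awaitPeers(Collection<String> peers,
                                     Predicate<String> isReady,
                                     double requiredRatio,
                                     Duration maxWait,
                                     Duration pollInterval) {
        if (peers.isEmpty()) {
            return true; // single-node cluster: nothing to wait for
        }
        long deadline = System.nanoTime() + maxWait.toNanos();
        while (true) {
            long ready = peers.stream().filter(isReady).count();
            if ((double) ready / peers.size() >= requiredRatio) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up waiting; the caller opens the port anyway
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In this sketch, a caller would run the gate before starting the native 
transport and open the client port whether or not the threshold was met; the 
timeout keeps a partitioned or mostly-down cluster from blocking startup 
indefinitely.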



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
