[ 
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239556#comment-16239556
 ] 

Jason Brown commented on CASSANDRA-13993:
-----------------------------------------

A mostly complete branch here:

||13993||
|[branch|https://github.com/jasobrown/cassandra/tree/13993]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13993]|

The patch lets the operator configure some extra time to wait at startup 
until a configurable percentage of the peers in the cluster is marked alive 
(in {{Gossiper.endpointStateMap}}) and connected to. 

For the alive check, we simply look at each known peer's state in 
{{Gossiper.endpointStateMap}} to see if it is marked alive, using all the existing 
infrastructure in Gossiper (see {{Gossiper#markAlive()}}). 
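
A minimal sketch of that alive-percentage check, for illustration only. The class and method names are hypothetical, and a plain {{Map<String, Boolean>}} stands in for the gossip state map; the real check would read each peer's state from Gossiper rather than from a map like this.

```java
import java.util.Map;

// Hypothetical illustration; not code from the 13993 branch.
final class AlivePeersSketch
{
    // Percentage (0-100) of known peers whose state is marked alive.
    // With no known peers there is nothing to wait for, so report 100.
    static double alivePercent(Map<String, Boolean> peerAlive)
    {
        if (peerAlive.isEmpty())
            return 100.0;
        long alive = peerAlive.values().stream().filter(Boolean::booleanValue).count();
        return 100.0 * alive / peerAlive.size();
    }

    // True once the alive percentage reaches the operator-configured threshold.
    static boolean enoughAlive(Map<String, Boolean> peerAlive, double requiredPercent)
    {
        return alivePercent(peerAlive) >= requiredPercent;
    }
}
```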

For the connections, the bouncing node sends a new {{PingMessage}} to the peer, 
which will be sent on the small message channel. The peer responds with a 
{{PongMessage}}, sent on its own small message channel. Thus, we eagerly 
create the outbound and inbound connections (small message channel) with each 
peer in the cluster before the client native protocol port is opened.
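
One way the bouncing node could track the pong replies, as a rough sketch. The tracker class, its methods, and the use of a {{CountDownLatch}} are assumptions made for illustration and are not taken from the branch; peers are identified by plain strings here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not code from the 13993 branch.
final class PingPongTrackerSketch
{
    private final Set<String> acked = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    // required = number of distinct peers that must answer before we stop waiting
    PingPongTrackerSketch(int required)
    {
        latch = new CountDownLatch(required);
    }

    // Called when a pong arrives from a peer; duplicate pongs are counted once.
    void onPong(String peer)
    {
        if (acked.add(peer))
            latch.countDown();
    }

    // Blocks until enough peers have answered or the timeout elapses.
    // Returns false on timeout or if the waiting thread is interrupted.
    boolean awaitConnected(long timeout, TimeUnit unit)
    {
        try
        {
            return latch.await(timeout, unit);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int ackedCount()
    {
        return acked.size();
    }
}
```

In this sketch the startup thread would call {{awaitConnected}} after sending the pings, while the message handler for the reply calls {{onPong}}.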

Note: the gossip outbound and inbound connections will be created by the 
{{EchoMessage}} and its response, which are sent by {{Gossiper#markAlive()}}.

There are a couple of open questions I'm still thinking through:
- should the configurable parameters be yaml properties? The current 
implementation naively uses System props, and hard coded default values at that 
(which will need to change before commit).
- I need to test how upgrades work, to make sure that nodes which do not know 
about the new messages (and their verbs) do not fail spectacularly. I think 
the new/unknown messages should just [be ignored at 
{{MessageDeliveryTask#run()}}|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L58].
 If there is a problem, I'll need to add a version check before sending the new 
message.
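
To make the two open questions concrete, here is a small sketch of reading the settings from System properties and of a version gate before sending the new message. The property names, the defaults (0, meaning "do not wait"), and the helper names are all made up for this example; they are not the names used in the branch.

```java
// Hypothetical illustration; property names and defaults are invented for this sketch.
final class StartupWaitConfigSketch
{
    static final String PERCENT_PROP = "example.startup.wait_for_peers_percent";
    static final String TIMEOUT_PROP = "example.startup.wait_for_peers_timeout_secs";

    // Integer.getInteger returns the supplied default when the property
    // is unset or is not a valid integer. The result is clamped to 0-100.
    static int requiredPercent()
    {
        int percent = Integer.getInteger(PERCENT_PROP, 0);
        return Math.max(0, Math.min(100, percent));
    }

    // Negative timeouts are treated as 0 (no waiting).
    static int timeoutSeconds()
    {
        return Math.max(0, Integer.getInteger(TIMEOUT_PROP, 0));
    }

    // Version gate: only send the new message to peers whose messaging
    // version is at least the one that introduced it.
    static boolean shouldSendPing(int peerMessagingVersion, int minVersionWithPing)
    {
        return peerMessagingVersion >= minVersionWithPing;
    }
}
```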


> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>
>                 Key: CASSANDRA-13993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Lifecycle
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>             Fix For: 4.x
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the 
> rest of the cluster as available. This is especially true if using TLS on 
> internode messaging connections. The bouncing node (and any clients connected 
> to it) may see a series of Unavailable or Timeout exceptions until the node 
> is 'warmed up' as connecting to the rest of the cluster is asynchronous from 
> the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate 
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects 
> the unavailable exceptions
> - having both outbound and inbound connections open and ready to each 
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay 
> opening the client native protocol port until some percentage of the peers in 
> the cluster is marked alive and connected to/from. Thus while we potentially 
> slow down startup (delay opening the client port), we reduce the chance 
> that queries made by clients hit transient unavailable/timeout 
> exceptions.
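
The delay proposed in the description amounts to a bounded wait before the client port is opened. A minimal sketch of such a wait, assuming a simple polling loop; the class name, the {{BooleanSupplier}} readiness check, and the poll interval are hypothetical, and the actual patch may be structured differently.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; not code from the 13993 branch.
final class StartupGateSketch
{
    // Polls `ready` until it returns true or the timeout elapses.
    // Returns true if the condition was met, false on timeout or interrupt.
    // Either way the caller would then go on to open the client port.
    static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis)
    {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true)
        {
            if (ready.getAsBoolean())
                return true;
            if (System.nanoTime() - deadline >= 0)
                return false;
            try
            {
                Thread.sleep(pollMillis);
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The readiness check passed in would combine the two conditions above: enough peers marked alive, and enough peers connected.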



