[
https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239556#comment-16239556
]
Jason Brown commented on CASSANDRA-13993:
-----------------------------------------
A mostly complete branch here:
||13993||
|[branch|https://github.com/jasobrown/cassandra/tree/13993]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/13993]|
The patch proposes to allow the operator to configure some extra time to wait
until a configurable percentage of the peers in the cluster are marked alive
(in {{Gossiper.endpointStateMap}}) and connected to.
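As a rough illustration of "wait some extra time until a condition holds", here is a minimal sketch of a bounded polling loop. It is not taken from the branch: {{StartupGate}}, {{waitUntil}}, and the poll interval are hypothetical names chosen for the example, and the readiness condition is passed in as a plain {{BooleanSupplier}}.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not part of the patch. */
final class StartupGate {
    /**
     * Polls the readiness condition until it holds or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!ready.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting; the caller decides what happens next
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a timeout is reported to the caller rather than treated as an error, since the wait is meant to be optional.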
For the alives, we simply check each known peer's state in
{{Gossiper.endpointStateMap}} to see if it is marked alive, using all the existing
infrastructure in Gossiper (see {{Gossiper#markAlive()}}).
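The alive check could look roughly like the sketch below. It assumes a plain map from peer address to an alive flag as a stand-in for the gossip endpoint state; {{AlivePeerCheck}} and {{enoughAlive}} are hypothetical names, and the real code would read the state held by Gossiper instead.

```java
import java.util.Map;

/** Hypothetical helper for illustration; the map stands in for gossip state. */
final class AlivePeerCheck {
    /**
     * Returns true if at least requiredPercent of the known peers are marked alive.
     * Integer arithmetic avoids floating-point rounding at the threshold.
     * An empty peer map is treated as ready.
     */
    static boolean enoughAlive(Map<String, Boolean> aliveByPeer, int requiredPercent) {
        long alive = aliveByPeer.values().stream().filter(Boolean::booleanValue).count();
        return alive * 100 >= (long) requiredPercent * aliveByPeer.size();
    }
}
```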
For the connections, the bouncing node sends a new {{PingMessage}} to the peer,
which will be sent on the small message channel. The peer responds with a
{{PongMessage}}, sent on its own small message channel. Thus, we eagerly
create the outbound and inbound connections (small message channel) with each
peer in the cluster before the client native protocol port is opened.
Note: the gossip outbound and inbound connections will be created by the
{{EchoMessage}} and response that is sent by {{Gossiper#markAlive()}}.
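One way to track the ping/pong round trips is a latch that counts down as pongs arrive. The sketch below is an assumption about how this could be structured, not the code in the branch: {{PongTracker}}, {{onPong}}, and the use of string peer ids are all hypothetical, and sending the actual {{PingMessage}} is left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical tracker for illustration; not part of the patch. */
final class PongTracker {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch latch;

    PongTracker(Set<String> peers, int requiredPercent) {
        pending.addAll(peers);
        // Integer ceiling of peers * percent / 100.
        int required = (peers.size() * requiredPercent + 99) / 100;
        latch = new CountDownLatch(required);
    }

    /** Records a pong from a peer; duplicate or unknown pongs are ignored. */
    void onPong(String peer) {
        if (pending.remove(peer)) {
            latch.countDown();
        }
    }

    /** Waits until enough peers have answered; false on timeout or interrupt. */
    boolean await(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

Removing the peer from the pending set before counting down keeps a repeated pong from the same peer from being counted twice.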
There are a couple of open questions I'm still thinking through:
- Should the configurable parameters be yaml properties? The current
implementation naively uses System props, with hard-coded default values at that
(which will need to change before commit).
- I need to test how upgrades work, to make sure that nodes which do not know
about the new messages (and their verbs) do not fail spectacularly. I think
the new/unknown messages should just [be ignored at
{{MessageDeliveryTask#run()}}|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/net/MessageDeliveryTask.java#L58].
If there is a problem, I'll need to add a version check before sending the new
message.
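To make the two open questions concrete, here is a minimal sketch of system-property-based settings with defaults, plus a version gate before sending the new message. The property names, default values, and the {{minVersionWithPing}} parameter are hypothetical and chosen only for the example; they are not the names or values used in the branch.

```java
/** Hypothetical settings holder for illustration; not part of the patch. */
final class PeerWaitConfig {
    // Assumed property names and defaults, for illustration only.
    static final String PERCENT_PROP = "example.block_for_peers_percentage";
    static final String TIMEOUT_PROP = "example.block_for_peers_timeout_in_secs";
    static final int DEFAULT_PERCENT = 70;
    static final int DEFAULT_TIMEOUT_SECS = 10;

    /** Required percentage of peers, clamped to 0..100; falls back to the default if unset or malformed. */
    static int requiredPercent() {
        int value = Integer.getInteger(PERCENT_PROP, DEFAULT_PERCENT);
        return Math.max(0, Math.min(100, value));
    }

    /** Maximum extra wait in seconds; never negative. */
    static int timeoutSecs() {
        return Math.max(0, Integer.getInteger(TIMEOUT_PROP, DEFAULT_TIMEOUT_SECS));
    }

    /** Only ping peers whose messaging version is known to understand the new verb. */
    static boolean shouldSendPing(int peerMessagingVersion, int minVersionWithPing) {
        return peerMessagingVersion >= minVersionWithPing;
    }
}
```

If the settings move to yaml, only the two accessor methods would need to change in a layout like this.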
> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>
> Key: CASSANDRA-13993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
> Project: Cassandra
> Issue Type: Improvement
> Components: Lifecycle
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.x
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the
> rest of the cluster as available. This is especially true if using TLS on
> internode messaging connections. The bouncing node (and any clients connected
> to it) may see a series of Unavailable or Timeout exceptions until the node
> is 'warmed up' as connecting to the rest of the cluster is asynchronous from
> the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects
> the unavailable exceptions
> - having both outbound and inbound connections open and ready to each
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay
> opening the client native protocol port until some percentage of the peers in
> the cluster is marked alive and connected to/from. Thus while we potentially
> slow down startup (delay opening the client port), we reduce the chance
> that queries made by clients hit transient unavailable/timeout
> exceptions.