[ https://issues.apache.org/jira/browse/CASSANDRA-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424314#comment-16424314 ]
Aleksey Yeschenko commented on CASSANDRA-13993:
-----------------------------------------------
So, while the comment sits before the {{UNUSED_}} verbs, we should still be doing
what the comment says, and add new verbs at the end. In our case - after
{{UNUSED_5}}.
Now, it doesn't often happen that things go wrong in a way that forces us to
retroactively add new verbs to already released majors, but it does sometimes.
Imagine, for example, there is a bug that causes us to add a new verb to 2.2 and
3.0, to address some issue with reads. Normally we would go and see which unused
ranges overlap. In this case, {{UNUSED_1}} to {{UNUSED_3}} could be
appropriated. This is why we keep the buffer there. If 4.0 appropriates the
slot just before {{UNUSED_1}}, it's essentially taking over {{UNUSED_1}}'s spot,
reducing that available buffer by 1.
Now, it is unlikely that we are going to need 3 new verbs in 3.11/3.0/2.2, but
it's not like extra ordinals are a precious resource. So we might as well stick
to the ways of the old, and either a) move the {{PING}} verb to the end of the
list, after {{UNUSED_5}}, or b) reuse one of the ancient deprecated verbs (we
did that at least for hints and batchlog recently).
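
To make the ordinal argument concrete, here is a minimal sketch, not the real verb list. The three enums are hypothetical and heavily shortened ({{MUTATION}} and {{READ}} are just placeholders, and the class and method names are made up for illustration). It compares which ordinals stay free in both an already released layout and a new one, depending on where {{PING}} goes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical, shortened verb lists, not the actual Cassandra enum.
final class VerbOrdinalDemo {
    // Layout as it might look in an already released major.
    enum Released { MUTATION, READ, UNUSED_1, UNUSED_2, UNUSED_3, UNUSED_4, UNUSED_5 }

    // New major that puts PING just before the unused buffer.
    enum InsertedBeforeBuffer { MUTATION, READ, PING, UNUSED_1, UNUSED_2, UNUSED_3, UNUSED_4, UNUSED_5 }

    // New major that appends PING after UNUSED_5.
    enum AppendedAfterBuffer { MUTATION, READ, UNUSED_1, UNUSED_2, UNUSED_3, UNUSED_4, UNUSED_5, PING }

    static boolean isFree(Enum<?> verb) {
        return verb.name().startsWith("UNUSED_");
    }

    // Ordinals that are still unused in BOTH layouts, i.e. slots a backported
    // verb could take without colliding with anything in the newer version.
    static List<Integer> sharedFreeOrdinals(Enum<?>[] older, Enum<?>[] newer) {
        List<Integer> free = new ArrayList<>();
        int n = Math.min(older.length, newer.length);
        for (int i = 0; i < n; i++) {
            if (isFree(older[i]) && isFree(newer[i])) {
                free.add(i);
            }
        }
        return free;
    }

    public static void main(String[] args) {
        // PING placed before the buffer: one shared slot is lost.
        System.out.println(sharedFreeOrdinals(Released.values(), InsertedBeforeBuffer.values()));
        // PING appended at the end: the whole buffer stays shared.
        System.out.println(sharedFreeOrdinals(Released.values(), AppendedAfterBuffer.values()));
    }
}
```

With these made-up lists, the first call reports four shared free ordinals and the second reports five, which is the "reducing that available buffer by 1" effect described above.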
> Add optional startup delay to wait until peers are ready
> --------------------------------------------------------
>
> Key: CASSANDRA-13993
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13993
> Project: Cassandra
> Issue Type: Improvement
> Components: Lifecycle
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Fix For: 4.0
>
>
> When bouncing a node in a large cluster, it can take a while to recognize the
> rest of the cluster as available. This is especially true if using TLS on
> internode messaging connections. The bouncing node (and any clients connected
> to it) may see a series of Unavailable or Timeout exceptions until the node
> is 'warmed up' as connecting to the rest of the cluster is asynchronous from
> the rest of the startup process.
> There are two aspects that drive a node's ability to successfully communicate
> with a peer after a bounce:
> - marking the peer as 'alive' (state that is held in gossip). This affects
> the unavailable exceptions
> - having both outbound and inbound connections open and ready to each
> peer. This affects timeouts.
> Details of each of these mechanisms are described in the comments below.
> This ticket proposes adding a mechanism, optional and configurable, to delay
> opening the client native protocol port until some percentage of the peers in
> the cluster is marked alive and connected to/from. Thus, while we potentially
> slow down startup (delay opening the client port), we reduce the chance
> that queries made by clients hit transient unavailable/timeout
> exceptions.
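
The proposed gate could be sketched roughly as follows. This is an illustration under stated assumptions, not the ticket's implementation: {{StartupGate}}, {{readyFraction}}, {{awaitPeers}} and all parameters are hypothetical names, and the liveness and connection checks are passed in as plain predicates rather than tied to gossip or the messaging service.

```java
import java.util.Collection;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class StartupGate {
    // Fraction of peers that are both marked alive and fully connected.
    static <P> double readyFraction(Collection<P> peers, Predicate<P> isAlive, Predicate<P> isConnected) {
        if (peers.isEmpty()) {
            return 1.0; // no peers to wait for
        }
        long ready = peers.stream()
                          .filter(p -> isAlive.test(p) && isConnected.test(p))
                          .count();
        return (double) ready / peers.size();
    }

    // Polls until the required fraction of peers is ready or the timeout expires.
    // Returns true if the threshold was reached, false on timeout or interruption.
    static <P> boolean awaitPeers(Collection<P> peers, Predicate<P> isAlive, Predicate<P> isConnected,
                                  double requiredFraction, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (readyFraction(peers, isAlive, isConnected) >= requiredFraction) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A caller would run {{awaitPeers}} just before opening the client port. Whether to open the port anyway after a timeout is a design choice this sketch leaves to the caller.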