Re: [DISCUSS] CEP-59: Graceful Disconnect – In-Band Connection Draining for Node Shutdown

Patrick McFadin Wed, 14 Jan 2026 13:33:35 -0800

Hi Jane,

Thank you for the well-thought-out CEP. I can certainly see the value of a
feature like this for adding resilience during cluster state changes. I have
a few questions after reading the CEP.

Driver compatibility: As I read it, the proposal assumes an ideal scenario in
which both client and server are on versions that support this feature. In
my experience, client rollouts are never complete and often lag far behind
the cluster upgrade. What happens when a driver completely
ignores GRACEFUL_DISCONNECT? It may call for a server-side component as
well.
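
For reference, the driver-side behavior the proposal expects (per Jane's
summary quoted below: stop sending new queries on the connection, let
in-flight requests finish, then reconnect with exponential backoff) might
look like the minimal Python sketch below. The class and function names are
hypothetical, not any driver's actual API, and the backoff values are
placeholders. A driver that never implements anything like
on_graceful_disconnect keeps sending until the socket closes, which is
exactly the gap this question is about.

```python
from dataclasses import dataclass


@dataclass
class DrainingConnection:
    """Hypothetical driver-side view of one connection (not a real driver API)."""
    in_flight: int = 0
    draining: bool = False

    def on_graceful_disconnect(self) -> None:
        # Stop routing new queries here; requests already sent keep running.
        self.draining = True

    def can_accept_new_query(self) -> bool:
        return not self.draining

    def send(self) -> None:
        if not self.can_accept_new_query():
            raise RuntimeError("connection is draining; route the query to another host")
        self.in_flight += 1

    def on_response(self) -> None:
        self.in_flight -= 1

    def ready_to_close(self) -> bool:
        # Close only after every in-flight request has completed.
        return self.draining and self.in_flight == 0


def reconnect_delays_ms(base_ms: int, cap_ms: int, attempts: int) -> list[int]:
    """Exponential backoff schedule: base_ms * 2**n, capped at cap_ms."""
    return [min(cap_ms, base_ms * 2 ** n) for n in range(attempts)]
```

Real drivers commonly add random jitter to the backoff so that many clients
do not reconnect in lockstep; it is left out here to keep the schedule
deterministic.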

Discovery: Speaking of the client, you want to advertise support through
SUPPORTED, as described in the v4 spec [1], but why not add this to STARTUP?
You mention something under "Rejected alternatives," but could you expand on
your thinking here?
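
For context on the discovery question: in the v4 spec [1], a driver sends
OPTIONS and the server answers with SUPPORTED, whose body is a [string
multimap] (a [short] count followed by pairs of a [string] key and a [string
list] value). A minimal Python sketch of how a driver might check that map
before registering, assuming the server advertises the feature under a key
named GRACEFUL_DISCONNECT (a hypothetical name; the CEP may choose
differently):

```python
import struct


def _read_short(buf: bytes, pos: int) -> tuple[int, int]:
    # [short]: 2-byte unsigned big-endian integer.
    (value,) = struct.unpack_from(">H", buf, pos)
    return value, pos + 2


def _read_string(buf: bytes, pos: int) -> tuple[str, int]:
    # [string]: [short] byte length followed by that many UTF-8 bytes.
    n, pos = _read_short(buf, pos)
    return buf[pos:pos + n].decode("utf-8"), pos + n


def _read_string_list(buf: bytes, pos: int) -> tuple[list[str], int]:
    # [string list]: [short] count followed by that many [string]s.
    n, pos = _read_short(buf, pos)
    items = []
    for _ in range(n):
        s, pos = _read_string(buf, pos)
        items.append(s)
    return items, pos


def parse_supported(body: bytes) -> dict[str, list[str]]:
    """Decode a SUPPORTED body ([string multimap]) into a dict."""
    n, pos = _read_short(body, 0)
    result = {}
    for _ in range(n):
        key, pos = _read_string(body, pos)
        values, pos = _read_string_list(body, pos)
        result[key] = values
    return result


# Hypothetical option key; not defined by the v4 spec.
GRACEFUL_DISCONNECT_OPTION = "GRACEFUL_DISCONNECT"


def server_supports_graceful_disconnect(body: bytes) -> bool:
    return GRACEFUL_DISCONNECT_OPTION in parse_supported(body)


def _write_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def encode_string_multimap(m: dict[str, list[str]]) -> bytes:
    """Encoder used only to build test bodies for the parser above."""
    out = struct.pack(">H", len(m))
    for key, values in m.items():
        out += _write_string(key) + struct.pack(">H", len(values))
        out += b"".join(_write_string(v) for v in values)
    return out
```

Advertising through STARTUP instead would move the decision into the
client's request options rather than the server's reply, which is the
trade-off the "Rejected alternatives" section would need to explain.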

Signal multiplication: The CEP states, "Other protocols (HTTP/2,
PostgreSQL, Redis Cluster) use connection-local in-band signals to enable
safe draining." However, our protocol guidance [1] explicitly notes that
drivers often keep multiple connections and should not register for events
on all of them, as this duplicates traffic. I don't see how you could ensure
that every connection would be aware of a GRACEFUL_DISCONNECT without
changing that aspect of the spec.
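
To make the concern concrete: a REGISTER body in v4 is a [string list] of
event types (TOPOLOGY_CHANGE, STATUS_CHANGE, SCHEMA_CHANGE). One possible
reading of "both control and query connections can opt into via REGISTER"
is that the control connection registers for the cluster-wide events plus
the new one, while pooled query connections register only for the new one.
The Python sketch below assumes that split and a hypothetical event type
string GRACEFUL_DISCONNECT; whether it is compatible with the guidance in
[1] is the open question, not something this sketch settles.

```python
import struct

# Event types defined by native protocol v4 for REGISTER.
CLUSTER_WIDE_EVENTS = ["TOPOLOGY_CHANGE", "STATUS_CHANGE", "SCHEMA_CHANGE"]
# Hypothetical event type for the CEP's connection-local signal.
GRACEFUL_DISCONNECT = "GRACEFUL_DISCONNECT"


def encode_string_list(items: list[str]) -> bytes:
    # [string list]: [short] count, then each [string] as [short] length + UTF-8 bytes.
    out = struct.pack(">H", len(items))
    for s in items:
        data = s.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return out


def register_events(is_control_connection: bool) -> list[str]:
    # Assumed split: only the control connection listens for cluster-wide events.
    if is_control_connection:
        return CLUSTER_WIDE_EVENTS + [GRACEFUL_DISCONNECT]
    return [GRACEFUL_DISCONNECT]


def register_body(is_control_connection: bool) -> bytes:
    return encode_string_list(register_events(is_control_connection))


def copies_received(event_type: str, connections: list[list[str]]) -> int:
    """Connections to one node that would receive a single emission of event_type."""
    return sum(event_type in events for events in connections)
```

With one control connection and eight query connections to a node, a
STATUS_CHANGE arrives once, while GRACEFUL_DISCONNECT arrives on all nine,
which is intended only if the server emits it strictly per connection.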

Event timing for operators: It's not clear to me when GRACEFUL_DISCONNECT
is emitted during nodetool drain, nodetool disablebinary, or a plain JVM
shutdown hook. Operators need this to understand how the feature will
behave, so it should be spelled out in the CEP. I think it will matter to a
lot of people.

Operator control: I've been pushing on this for a while, so I have to
mention it: opt-in vs. default. We need more controls in the config YAML:
graceful_disconnect_enabled

If there is a server-side component:
graceful_disconnect_grace_period_ms
graceful_disconnect_max_drain_ms

And finally, it needs more observability, such as logging and metrics
counters: connections_draining, forced_disconnects
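
One way these settings and counters could fit together, as a minimal Python
sketch. The semantics are assumptions, since the thread only names the
keys: the feature is off by default; the grace period is how long the
server keeps accepting new requests after emitting the event; max drain is
the deadline, measured from emission, after which still-open connections
are force-closed. The default numbers are placeholders, not proposed
values, and treating "disabled" as closing everything at once is also an
assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GracefulDisconnectConfig:
    # Hypothetical cassandra.yaml keys from this thread; opt-in by default.
    graceful_disconnect_enabled: bool = False
    graceful_disconnect_grace_period_ms: int = 5_000   # placeholder
    graceful_disconnect_max_drain_ms: int = 30_000     # placeholder

    def __post_init__(self) -> None:
        if self.graceful_disconnect_grace_period_ms < 0:
            raise ValueError("grace period must be non-negative")
        if self.graceful_disconnect_max_drain_ms < self.graceful_disconnect_grace_period_ms:
            raise ValueError("max drain must be >= grace period")


@dataclass
class DrainMetrics:
    graceful_closes: int = 0
    forced_disconnects: int = 0


def accepts_new_requests(at_ms: int, cfg: GracefulDisconnectConfig) -> bool:
    """Assumed: new requests are served only during the grace period."""
    return at_ms < cfg.graceful_disconnect_grace_period_ms


def simulate_drain(close_times_ms: Sequence[Optional[int]],
                   cfg: GracefulDisconnectConfig) -> DrainMetrics:
    """close_times_ms[i]: when client i closes after the event (None = driver ignores it)."""
    metrics = DrainMetrics()
    for t in close_times_ms:
        if (cfg.graceful_disconnect_enabled and t is not None
                and t <= cfg.graceful_disconnect_max_drain_ms):
            metrics.graceful_closes += 1
        else:
            # Disabled, ignored, or too slow: the server closes the connection.
            metrics.forced_disconnects += 1
    return metrics


def connections_draining(at_ms: int, close_times_ms: Sequence[Optional[int]],
                         cfg: GracefulDisconnectConfig) -> int:
    """Gauge: connections still open at_ms after the event was emitted."""
    if not cfg.graceful_disconnect_enabled or at_ms >= cfg.graceful_disconnect_max_drain_ms:
        return 0
    return sum(1 for t in close_times_ms if t is None or t > at_ms)
```

In this model, forced_disconnects directly measures how many clients did
not cooperate (for example, drivers that ignore GRACEFUL_DISCONNECT), which
is the number operators would watch during a rolling restart.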

Thanks for proposing this!

Patrick

[1] https://cassandra.apache.org/doc/latest/cassandra/_attachments/native_protocol_v4.html

On Tue, Jan 13, 2026 at 4:30 PM Jane H wrote:

> Hi all,
>
> I’d like to start a discussion on a CEP proposal: *CEP-59: Graceful
> Disconnect*, to make intentional node shutdown/drain less disruptive for
> clients (link:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619103
> ).
>
> Today, intentional node shutdown (e.g., rolling restarts) can still be
> disruptive from a client perspective. Drivers often ignore DOWN events
> because they are not reliable, and outstanding requests can end up as
> client-facing timeout exceptions.
>
> The proposed solution is to add an in-band GRACEFUL_DISCONNECT event that
> both control and query connections can opt into via REGISTER. When a node
> is shutting down, it will emit the event to all subscribed connections.
> Drivers will stop sending new queries on that connection/host, allow
> in-flight requests to finish, then reconnect with exponential backoff.
>
> If you have thoughts on the proposed protocol, server shutdown behavior,
> driver expectations, edge cases, or general feedback, I’d really appreciate
> it.
>
> Regards,
> Jane
>
