[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Garvit Rajput updated ZOOKEEPER-4945:
-------------------------------------
    Description: 
*Motivation*
Currently, ZooKeeper clients must maintain an active session to retain 
ephemeral nodes. If a session timeout occurs due to a network glitch or GC 
pause, these nodes are deleted even if the client recovers shortly after.

*Proposed Feature*
Introduce a configurable *auto-renewal heartbeat mechanism* where the 
ZooKeeper client can *extend the lifetime of ephemeral nodes* for a grace 
period after temporary session disconnections, essentially a soft-reconnect 
buffer.

*This feature would:*
 - Reduce unintended ephemeral node deletion due to transient network failures.
 - Improve stability for clients with flaky connections.
 - Help cloud-native workloads where short-lived network interruptions are 
common.

*Implementation Ideas*
 - Introduce a `znode.ephemeral.gracePeriod` config on the server/client.
 - Allow clients to reattach to their ephemeral nodes within this window.
 - Maintain consistency and fencing semantics using a version hash or ephemeral 
token.
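
To make the ideas above concrete, here is a minimal sketch of how a grace window and a fencing token could interact. It is a toy in-memory model, not ZooKeeper code: `GraceTracker`, `Session`, `reattach`, and the token format are hypothetical names made up for illustration, and `grace_period` only stands in for the proposed `znode.ephemeral.gracePeriod` setting.

```python
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class Session:
    token: str             # hypothetical fencing token issued at session creation
    last_heartbeat: float  # time of the last heartbeat, in seconds
    ephemerals: set = field(default_factory=set)


class GraceTracker:
    """Toy in-memory model, not ZooKeeper's real session tracker."""

    def __init__(self, session_timeout, grace_period):
        self.session_timeout = session_timeout
        self.grace_period = grace_period  # stands in for znode.ephemeral.gracePeriod
        self.sessions = {}

    def open(self, session_id, now):
        token = secrets.token_hex(8)
        self.sessions[session_id] = Session(token, now)
        return token

    def create_ephemeral(self, session_id, path):
        self.sessions[session_id].ephemerals.add(path)

    def state(self, session_id, now):
        session = self.sessions.get(session_id)
        if session is None:
            return "expired"
        silence = now - session.last_heartbeat
        if silence <= self.session_timeout:
            return "live"
        if silence <= self.session_timeout + self.grace_period:
            return "grace"
        return "expired"

    def expire(self, now):
        """Drop sessions past timeout + grace; return the deleted paths."""
        deleted = []
        for sid in list(self.sessions):
            if self.state(sid, now) == "expired":
                deleted.extend(self.sessions.pop(sid).ephemerals)
        return sorted(deleted)

    def reattach(self, session_id, token, now):
        """Resume a session only if it is unexpired and the token matches."""
        self.expire(now)
        session = self.sessions.get(session_id)
        if session is None or not hmac.compare_digest(session.token, token):
            return False
        session.last_heartbeat = now
        return True
```

In this model a client that returns inside the window with the right token keeps its ephemeral nodes, a client with a stale or wrong token is fenced off, and setting `grace_period` to 0 reduces to the current behavior, which is what would keep the feature opt-in.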

*Benefits*
This change would improve ZooKeeper's resilience in distributed environments 
without breaking the ephemeral node contract, as the node would still expire if 
the client doesn't reconnect within the grace period.

*Impact*
 - Fully backward-compatible
 - Opt-in via configuration
 - May require slight changes to session expiration logic

Let me know if this is a direction you'd consider. Happy to discuss design or 
help contribute a patch.

  was:
### Motivation
Currently, ZooKeeper clients must maintain an active session to retain 
ephemeral nodes. If a session timeout occurs due to a network glitch or GC 
pause, these nodes are deleted even if the client recovers shortly after.

### Proposed Feature
Introduce a configurable **auto-renewal heartbeat mechanism** where the 
ZooKeeper client can **extend the lifetime of ephemeral nodes** for a grace 
period after temporary session disconnections — essentially a soft-reconnect 
buffer.

This feature would:
- Reduce unintended ephemeral node deletion due to transient network failures.
- Improve stability for clients with flaky connections.
- Help cloud-native workloads where short-lived network interruptions are 
common.

### Implementation Ideas
- Introduce a `znode.ephemeral.gracePeriod` config on the server/client.
- Allow clients to reattach to their ephemeral nodes within this window.
- Maintain consistency and fencing semantics using a version hash or ephemeral 
token.

### Benefits
This change would improve ZooKeeper's resilience in distributed environments 
without breaking the ephemeral node contract, as the node would still expire if 
the client doesn't reconnect within the grace period.

### Impact
- Fully backward-compatible
- Opt-in via configuration
- May require slight changes to session expiration logic

Let me know if this is a direction you'd consider. Happy to discuss design or 
help contribute a patch.


> Support automatic renewal of ephemeral nodes via client heartbeats
> ------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4945
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4945
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: c client, server
>            Reporter: Garvit Rajput
>            Priority: Major
>
> *Motivation*
> Currently, ZooKeeper clients must maintain an active session to retain 
> ephemeral nodes. If a session timeout occurs due to a network glitch or GC 
> pause, these nodes are deleted even if the client recovers shortly after.
> *Proposed Feature*
> Introduce a configurable *auto-renewal heartbeat mechanism* where the 
> ZooKeeper client can *extend the lifetime of ephemeral nodes* for a 
> grace period after temporary session disconnections, essentially a 
> soft-reconnect buffer.
> *This feature would:*
>  - Reduce unintended ephemeral node deletion due to transient network 
> failures.
>  - Improve stability for clients with flaky connections.
>  - Help cloud-native workloads where short-lived network interruptions are 
> common.
> *Implementation Ideas*
>  - Introduce a `znode.ephemeral.gracePeriod` config on the server/client.
>  - Allow clients to reattach to their ephemeral nodes within this window.
>  - Maintain consistency and fencing semantics using a version hash or 
> ephemeral token.
> *Benefits*
> This change would improve ZooKeeper's resilience in distributed environments 
> without breaking the ephemeral node contract, as the node would still expire 
> if the client doesn't reconnect within the grace period.
> *Impact*
>  - Fully backward-compatible
>  - Opt-in via configuration
>  - May require slight changes to session expiration logic
> Let me know if this is a direction you'd consider. Happy to discuss design or 
> help contribute a patch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
