Shilun Fan created RATIS-2408:
---------------------------------

             Summary: Add configurable exponential backoff reconnection for Netty DataStream client
                 Key: RATIS-2408
                 URL: https://issues.apache.org/jira/browse/RATIS-2408
             Project: Ratis
          Issue Type: Improvement
          Components: Netty
            Reporter: Shilun Fan
            Assignee: Shilun Fan


## Problem
 
Currently, the Netty DataStream client uses a fixed 100ms delay for 
reconnection attempts when the connection fails. This approach has several 
limitations:

1. **Resource waste**: During network issues or server unavailability, constant 
100ms retry intervals create unnecessary load
2. **Thundering herd**: Multiple clients that lose their connections at the same 
time retry in lockstep at the same fixed interval, which can overwhelm the server
3. **Lack of configurability**: Users cannot tune reconnection behavior for 
their specific use cases
 
 
## Solution

Implement configurable exponential backoff with jitter for DataStream client 
reconnections:

1. **Configuration Support**:
- `raft.client.datastream.reconnect.delay` - Initial reconnection delay 
(default: 100ms)
- `raft.client.datastream.reconnect.max-delay` - Maximum backoff delay 
(default: 5s)

2. **Exponential Backoff**:
- Delay doubles on each failed attempt, capped at the maximum: 100ms → 200ms → 
400ms → 800ms → 1600ms → 3200ms → 5000ms
- Resets to initial delay upon successful connection

3. **Jitter (0.5x-1.5x)**:
- Randomizes actual delay to avoid synchronized reconnection storms
- Example: 1000ms base → actual delay between 500ms and 1500ms

4. **Concurrent Safety**:
- Prevents duplicate reconnection scheduling using atomic flags
- Ensures cleanup even if reconnection is short-circuited

5. **Adaptive Logging**:
- INFO level for short delays (≤500ms) - normal reconnection
- WARN level for long delays (>500ms) - persistent failures
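
A minimal Java sketch of the delay calculation in items 1 to 3, assuming a standalone helper. The class and method names (`ReconnectBackoff`, `nextBaseDelayMs`, `withJitter`, `reset`) are hypothetical and not the actual Ratis code; the two constructor arguments stand in for the two configuration keys, and reading them from Ratis properties is left out. The issue does not say whether the 5s cap applies before or after jitter; this sketch caps the base delay and then applies jitter, so an actual wait can reach just under 7.5s.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not the actual Ratis class: exponential backoff with
// 0.5x-1.5x jitter for DataStream reconnects.
final class ReconnectBackoff {
  private final long initialDelayMs; // stands in for raft.client.datastream.reconnect.delay
  private final long maxDelayMs;     // stands in for raft.client.datastream.reconnect.max-delay
  private long nextBaseMs;           // un-jittered delay for the next attempt

  ReconnectBackoff(long initialDelayMs, long maxDelayMs) {
    if (initialDelayMs <= 0 || maxDelayMs < initialDelayMs) {
      throw new IllegalArgumentException("require 0 < initialDelayMs <= maxDelayMs");
    }
    this.initialDelayMs = initialDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.nextBaseMs = initialDelayMs;
  }

  // Returns the base delay for this failed attempt, then doubles it for the
  // next one, capped at maxDelayMs (comparing against maxDelayMs / 2 avoids
  // computing base * 2 when it could overflow).
  synchronized long nextBaseDelayMs() {
    final long base = nextBaseMs;
    nextBaseMs = base > maxDelayMs / 2 ? maxDelayMs : base * 2;
    return base;
  }

  // Scales a base delay by a uniform random factor in [0.5, 1.5).
  static long withJitter(long baseMs) {
    final double factor = 0.5 + ThreadLocalRandom.current().nextDouble();
    return (long) (baseMs * factor);
  }

  // Jittered delay to wait before the next reconnect attempt.
  long nextDelayMs() {
    return withJitter(nextBaseDelayMs());
  }

  // Called after a successful connection: start over from the initial delay.
  synchronized void reset() {
    nextBaseMs = initialDelayMs;
  }
}
```

For example, `new ReconnectBackoff(100, 5000)` yields base delays of 100, 200, 400, 800, 1600 and 3200ms, then 5000ms for every later failure; a caller would invoke `reset()` once a connection succeeds.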

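Items 4 and 5 could look roughly like the following, again as a hypothetical sketch rather than the Ratis implementation. An `AtomicBoolean` updated with `compareAndSet` lets only one caller schedule a reconnect, a `finally` block clears the flag however the attempt ends, and the log level is chosen from the 500ms threshold. The class name `ReconnectScheduler` and its methods are assumptions, and `java.util.logging` is used only to keep the example self-contained (hence `Level.WARNING` for WARN).

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: allows at most one pending reconnect at a time.
final class ReconnectScheduler {
  private static final Logger LOG = Logger.getLogger(ReconnectScheduler.class.getName());
  static final long WARN_THRESHOLD_MS = 500; // <=500ms logs at INFO, >500ms at WARN

  private final AtomicBoolean reconnectPending = new AtomicBoolean(false);
  private final ScheduledExecutorService executor;

  ReconnectScheduler(ScheduledExecutorService executor) {
    this.executor = executor;
  }

  // Short delays are routine reconnects; long delays indicate persistent failures.
  static Level logLevelFor(long delayMs) {
    return delayMs <= WARN_THRESHOLD_MS ? Level.INFO : Level.WARNING;
  }

  // Schedules the reconnect after delayMs unless one is already pending.
  // Returns true only if this call did the scheduling.
  boolean scheduleReconnect(long delayMs, Runnable reconnect) {
    if (!reconnectPending.compareAndSet(false, true)) {
      return false; // another caller already scheduled a reconnect
    }
    LOG.log(logLevelFor(delayMs), "Reconnecting in {0} ms", delayMs);
    try {
      executor.schedule(() -> {
        try {
          reconnect.run();
        } finally {
          // Clear the flag even if the attempt throws or returns early.
          reconnectPending.set(false);
        }
      }, delayMs, TimeUnit.MILLISECONDS);
    } catch (RuntimeException e) {
      reconnectPending.set(false); // scheduling itself failed, e.g. executor shut down
      throw e;
    }
    return true;
  }
}
```

A second `scheduleReconnect` call made while the first is still pending returns `false` instead of queuing a duplicate attempt.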


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
