[PR] KAFKA-16799 Add exponential backoff to NetworkClientDelegate when node is not ready [kafka]

via GitHub Sat, 20 Dec 2025 17:14:48 -0800


devtrace404 opened a new pull request, #21198:
URL: https://github.com/apache/kafka/pull/21198


   ### Problem
   During stress testing, AsyncKafkaConsumer's `NetworkClientDelegate` did not 
back off when a node was not ready. The network client retried immediately, 
flooding the log with "Node is not ready" messages and wasting CPU.
   
   ### Solution
   Implements exponential backoff in NetworkClientDelegate to prevent immediate 
retries when a node is not ready:
   
   1. Tracks backoff state per node: Uses a `ConcurrentHashMap<Node, 
BackoffState>` to maintain separate backoff state for each node, allowing 
independent backoff tracking.
   2. Applies exponential backoff: When `client.ready(node)` returns false, the 
client now:
   
   - Records the failure using `recordNodeNotReady()`
   - Calculates the next retry time using `ExponentialBackoff` with the 
configured `retry.backoff.ms` (default 100ms) and `retry.backoff.max.ms` 
(default 1000ms)
   - Skips retry attempts until the backoff period has elapsed
   
   3. Resets backoff on success: When a request is successfully sent to a node, 
the backoff state for that node is reset, ensuring the next failure starts with 
the initial backoff interval.
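
To make the timing concrete, here is a minimal sketch of the delay calculation. It assumes the delay is `initial * base^attempts`, scaled by a random jitter factor and capped at the maximum. `BackoffSketch` and `delayMs` are hypothetical names for illustration only; this is not Kafka's `ExponentialBackoff` class and not the code in this PR.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for an exponential backoff calculator (illustration only). */
final class BackoffSketch {
    private final long initialMs;
    private final long maxMs;
    private final int base;
    private final double jitter;

    BackoffSketch(long initialMs, long maxMs, int base, double jitter) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.base = base;
        this.jitter = jitter;
    }

    /** Delay to wait after `attempts` previous consecutive failures (0 for the first failure). */
    long delayMs(int attempts) {
        double term = initialMs * Math.pow(base, attempts);
        // With jitter j, the delay is scaled by a random factor in [1 - j, 1 + j).
        double factor = jitter == 0.0
                ? 1.0
                : ThreadLocalRandom.current().nextDouble(1 - jitter, 1 + jitter);
        return Math.min(maxMs, (long) (term * factor));
    }

    public static void main(String[] args) {
        // Jitter disabled so the sequence is deterministic.
        BackoffSketch noJitter = new BackoffSketch(100, 1000, 2, 0.0);
        for (int attempts = 0; attempts < 5; attempts++) {
            System.out.println(noJitter.delayMs(attempts));
        }
    }
}
```

With jitter disabled, `main` prints 100, 200, 400, 800, 1000. With jitter set to 0.2, each value before the cap varies by roughly 20% in either direction.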
   
   ### Implementation Details
   #### NetworkClientDelegate.java:
   1. Added fields: `retryBackoffMs`, `retryBackoffMaxMs`, 
`exponentialBackoff`, `nodeBackoffStates`
   2. Modified `doSend()` to check backoff delay before sending and record 
failures
   3. Added helper methods: `getBackoffDelay()`, `recordNodeNotReady()`, 
`resetBackoff()`
   4. Added `BackoffState` inner class to track attempt count and next retry 
time per node
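
The per-node bookkeeping described above could look roughly like the sketch below. The method and field names come from the PR description, but their signatures, the `NodeBackoffTracker` wrapper class, and the use of a `String` node id in place of Kafka's `Node` are assumptions made to keep the example self-contained. Jitter is omitted so the behavior is deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of per-node backoff tracking; not the PR's actual code. */
final class NodeBackoffTracker {
    /** Immutable snapshot: consecutive failures so far and earliest next retry time. */
    static final class BackoffState {
        final int attempts;
        final long nextRetryMs;

        BackoffState(int attempts, long nextRetryMs) {
            this.attempts = attempts;
            this.nextRetryMs = nextRetryMs;
        }
    }

    private final Map<String, BackoffState> nodeBackoffStates = new ConcurrentHashMap<>();
    private final long retryBackoffMs;
    private final long retryBackoffMaxMs;

    NodeBackoffTracker(long retryBackoffMs, long retryBackoffMaxMs) {
        this.retryBackoffMs = retryBackoffMs;
        this.retryBackoffMaxMs = retryBackoffMaxMs;
    }

    /** Remaining wait in ms before the node may be retried; 0 means "send now". */
    long getBackoffDelay(String nodeId, long nowMs) {
        BackoffState state = nodeBackoffStates.get(nodeId);
        return state == null ? 0L : Math.max(0L, state.nextRetryMs - nowMs);
    }

    /** Called when the node is not ready: bump the attempt count and schedule the next retry. */
    void recordNodeNotReady(String nodeId, long nowMs) {
        nodeBackoffStates.compute(nodeId, (id, prev) -> {
            int attempts = prev == null ? 0 : prev.attempts;
            // Base-2 growth, capped at the configured maximum (jitter left out here).
            long delay = Math.min(retryBackoffMaxMs,
                    (long) (retryBackoffMs * Math.pow(2, attempts)));
            return new BackoffState(attempts + 1, nowMs + delay);
        });
    }

    /** Called after a successful send: the next failure starts from the initial interval. */
    void resetBackoff(String nodeId) {
        nodeBackoffStates.remove(nodeId);
    }
}
```

In this sketch a send path would first call `getBackoffDelay()` and skip the node while the result is positive, call `recordNodeNotReady()` when the readiness check fails, and call `resetBackoff()` once a request goes out.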
   
   #### NetworkClientDelegateTest.java:
   
   1. Added `testExponentialBackoffReducesRetryAttemptsWhenNodeNotReady()`: 
Verifies backoff reduces retry attempts and validates exponential timing 
sequence (100ms, 200ms, 400ms, 800ms, capped at 1000ms)
   2. Added `testBackoffResetsAfterSuccessfulSend()`: Ensures backoff state 
resets after successful send
   3. Updated `newNetworkClientDelegate()` helper to set default backoff 
properties for testing
   
   ### Configuration
   The backoff behavior uses existing consumer configuration:
   
   - `retry.backoff.ms` (default: 100ms): Initial backoff interval
   - `retry.backoff.max.ms` (default: 1000ms): Maximum backoff interval
   
   The exponential backoff uses a base of 2 with 20% jitter, as defined in 
CommonClientConfigs.
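
For reference, a minimal sketch of setting these two properties explicitly. The `BackoffConfigExample` class and its helper method are hypothetical; the keys and values are the ones listed above, and a real consumer would also need its usual connection and deserializer settings.

```java
import java.util.Properties;

/** Hypothetical helper showing the two backoff-related consumer properties. */
final class BackoffConfigExample {
    static Properties consumerBackoffProps() {
        Properties props = new Properties();
        // Initial backoff interval (the documented default).
        props.setProperty("retry.backoff.ms", "100");
        // Upper bound on the backoff interval (the documented default).
        props.setProperty("retry.backoff.max.ms", "1000");
        return props;
    }
}
```

Since both values match the defaults, setting them is only needed when a deployment wants a different initial interval or cap.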


