devtrace404 opened a new pull request, #21198: URL: https://github.com/apache/kafka/pull/21198
### Problem

During stress testing, AsyncKafkaConsumer's NetworkClientDelegate did not back off when a node was not ready, causing excessive "Node is not ready" log messages. The network client retried immediately, leading to log spam and unnecessary CPU usage.

### Solution

Implements exponential backoff in NetworkClientDelegate to prevent immediate retries when a node is not ready:

1. Tracks backoff state per node: Uses a `ConcurrentHashMap<Node, BackoffState>` to maintain separate backoff state for each node, allowing independent backoff tracking.
2. Applies exponential backoff: When `client.ready(node)` returns false, the client now:
   - Records the failure using `recordNodeNotReady()`
   - Calculates the next retry time using ExponentialBackoff with the configured `retry.backoff.ms` (default 100ms) and `retry.backoff.max.ms` (default 1000ms)
   - Skips retry attempts until the backoff period has elapsed
3. Resets backoff on success: When a request is successfully sent to a node, the backoff state for that node is reset, ensuring the next failure starts with the initial backoff interval.

### Implementation Details

#### NetworkClientDelegate.java

1. Added fields: `retryBackoffMs`, `retryBackoffMaxMs`, `exponentialBackoff`, `nodeBackoffStates`
2. Modified `doSend()` to check backoff delay before sending and record failures
3. Added helper methods: `getBackoffDelay()`, `recordNodeNotReady()`, `resetBackoff()`
4. Added BackoffState inner class to track attempt count and next retry time per node

#### NetworkClientDelegateTest.java

1. Added `testExponentialBackoffReducesRetryAttemptsWhenNodeNotReady()`: Verifies backoff reduces retry attempts and validates exponential timing sequence (100ms, 200ms, 400ms, 800ms, capped at 1000ms)
2. Added `testBackoffResetsAfterSuccessfulSend()`: Ensures backoff state resets after successful send
3. Updated `newNetworkClientDelegate()` helper to set default backoff properties for testing

### Configuration

The backoff behavior uses existing consumer configuration:

- `retry.backoff.ms` (default: 100ms): Initial backoff interval
- `retry.backoff.max.ms` (default: 1000ms): Maximum backoff interval

The exponential backoff uses a base of 2 with 20% jitter, as defined in CommonClientConfigs.
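
For readers who want to see the per-node bookkeeping in isolation, here is a minimal standalone sketch of the idea described in this PR. It is not the PR's code: the class name `NodeBackoffSketch`, the use of a `String` node id in place of Kafka's `Node`, and the method bodies are assumptions made for illustration. Only the method names `getBackoffDelay`, `recordNodeNotReady`, and `resetBackoff` are borrowed from the description. The 20% jitter is deliberately omitted so the delays are deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only; hypothetical class, no jitter, String ids instead of Node. */
final class NodeBackoffSketch {

    /** Immutable per-node state: failures recorded so far and earliest next attempt. */
    static final class BackoffState {
        final int attempts;
        final long nextRetryMs;

        BackoffState(int attempts, long nextRetryMs) {
            this.attempts = attempts;
            this.nextRetryMs = nextRetryMs;
        }
    }

    private final long retryBackoffMs;
    private final long retryBackoffMaxMs;
    private final Map<String, BackoffState> nodeBackoffStates = new ConcurrentHashMap<>();

    NodeBackoffSketch(long retryBackoffMs, long retryBackoffMaxMs) {
        this.retryBackoffMs = retryBackoffMs;
        this.retryBackoffMaxMs = retryBackoffMaxMs;
    }

    /** Delay after the given number of prior failures: initial * 2^attempts, capped at max. */
    long backoffFor(int attempts) {
        double raw = retryBackoffMs * Math.pow(2, attempts);
        return (long) Math.min(raw, retryBackoffMaxMs);
    }

    /** Remaining wait before the node may be retried; 0 means it can be tried now. */
    public long getBackoffDelay(String nodeId, long nowMs) {
        BackoffState state = nodeBackoffStates.get(nodeId);
        return state == null ? 0L : Math.max(0L, state.nextRetryMs - nowMs);
    }

    /** Records a "node not ready" failure and returns the delay that now applies. */
    public long recordNodeNotReady(String nodeId, long nowMs) {
        BackoffState updated = nodeBackoffStates.compute(nodeId, (id, prev) -> {
            int attempts = prev == null ? 0 : prev.attempts;
            return new BackoffState(attempts + 1, nowMs + backoffFor(attempts));
        });
        return updated.nextRetryMs - nowMs;
    }

    /** Clears the node's state so the next failure starts from the initial interval. */
    public void resetBackoff(String nodeId) {
        nodeBackoffStates.remove(nodeId);
    }

    public static void main(String[] args) {
        NodeBackoffSketch backoff = new NodeBackoffSketch(100, 1000);
        long now = 0;
        for (int i = 1; i <= 5; i++) {
            long delay = backoff.recordNodeNotReady("node-1", now);
            System.out.println("failure " + i + " -> wait " + delay + " ms");
            now += delay; // pretend we retried exactly when the backoff expired
        }
        backoff.resetBackoff("node-1"); // simulate a successful send
        System.out.println("after reset -> wait "
                + backoff.recordNodeNotReady("node-1", now) + " ms");
    }
}
```

With the defaults of 100 ms and 1000 ms, the sketch yields waits of 100, 200, 400, 800, and 1000 ms, then 100 ms again after the reset. With jitter enabled, as in the actual change, each value would vary around these figures.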
