[ https://issues.apache.org/jira/browse/KAFKA-9959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Tan updated KAFKA-9959:
-----------------------------
    Issue Type: Improvement  (was: Bug)

> leastLoadedNode() does not provide a node fairly
> ------------------------------------------------
>
>                 Key: KAFKA-9959
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9959
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Cheng Tan
>            Assignee: Cheng Tan
>            Priority: Major
>
> Currently, leastLoadedNode() selects a node according to the following criteria:
> # Provide the connected node with the fewest in-flight requests.
> # If no connected node exists, provide the connecting node with the largest
> index in the cached list of nodes.
> # If no connected or connecting node exists, provide the disconnected node
> that respects the reconnect backoff and has the largest index in the cached
> list of nodes.
> However, criteria 2 and 3 may cause issues.
>
> Criterion 2: Since timeoutCallsToSend() does not change the connection
> status, the node remains in the connecting state after the request times out.
> If no connected node exists, leastLoadedNode() will keep providing this same
> node until the socket timeout is reached. It would be better to skip a
> connecting node if any request has timed out on it.
>
> Criterion 3: If the interval between two invocations of leastLoadedNode() is
> greater than reconnect.backoff.ms, the same disconnected node may be
> provided again. We also want to pick the node with the fewest failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
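
The selection order described in the issue, together with the two proposed changes (skip connecting nodes that have seen a request timeout, and prefer the disconnected node with the fewest failures), could be sketched roughly as follows. This is only an illustration under stated assumptions: `FairNodeSelector`, `NodeInfo`, and all of its fields are hypothetical names invented for this sketch. They are not Kafka's actual classes, and this is not the patch attached to KAFKA-9959.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the selection order proposed in the issue.
// None of these types exist in Kafka; they only model the idea.
public class FairNodeSelector {

    public enum State { CONNECTED, CONNECTING, DISCONNECTED }

    // Assumed per-node bookkeeping, simplified for illustration.
    public static final class NodeInfo {
        public final String id;
        public final State state;
        public final int inFlightRequests;
        public final boolean requestTimedOut; // a request timed out on this node
        public final int failedAttempts;      // failures recorded for this node
        public final boolean backoffElapsed;  // reconnect backoff has passed

        public NodeInfo(String id, State state, int inFlightRequests,
                        boolean requestTimedOut, int failedAttempts,
                        boolean backoffElapsed) {
            this.id = id;
            this.state = state;
            this.inFlightRequests = inFlightRequests;
            this.requestTimedOut = requestTimedOut;
            this.failedAttempts = failedAttempts;
            this.backoffElapsed = backoffElapsed;
        }
    }

    public static Optional<NodeInfo> leastLoadedNode(List<NodeInfo> nodes) {
        // Criterion 1: connected node with the fewest in-flight requests.
        Optional<NodeInfo> connected = nodes.stream()
                .filter(n -> n.state == State.CONNECTED)
                .min(Comparator.comparingInt((NodeInfo n) -> n.inFlightRequests));
        if (connected.isPresent()) {
            return connected;
        }

        // Criterion 2 (as proposed): a connecting node, but skip any node
        // on which a request has already timed out.
        Optional<NodeInfo> connecting = nodes.stream()
                .filter(n -> n.state == State.CONNECTING && !n.requestTimedOut)
                .findFirst();
        if (connecting.isPresent()) {
            return connecting;
        }

        // Criterion 3 (as proposed): among disconnected nodes whose backoff
        // has elapsed, prefer the one with the fewest recorded failures.
        return nodes.stream()
                .filter(n -> n.state == State.DISCONNECTED && n.backoffElapsed)
                .min(Comparator.comparingInt((NodeInfo n) -> n.failedAttempts));
    }
}
```

In this sketch a connecting node with a timed-out request is never returned, so the caller falls through to an eligible disconnected node instead of receiving the same stalled node repeatedly. Whether a real implementation should fall back to such a node when nothing else is available is a design choice the issue text does not settle.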