[ https://issues.apache.org/jira/browse/KAFKA-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213531#comment-16213531 ]
Sean Rohead commented on KAFKA-6101:
------------------------------------

Looking at your patch strictly from a code review perspective (not having yet tested it), I see that the current implementation of disconnected() calls updateReconnectBackoff(nodeState), which modifies the value of nodeState.reconnectBackoffMs. This value is not preserved when you create the new nodeState in connecting(), so it will be reset to reconnectBackoffInitMs. I am not 100% certain, but I think the new nodeState should preserve that value across connection attempts.

My other question is whether it is possible to leave the existing nodeState instance in the map (if there is one) instead of creating a new one, and just update the state and lastConnectAttemptMs. This is what the disconnected() method does.

> Reconnecting to broker does not exponentially backoff
> -----------------------------------------------------
>
>                 Key: KAFKA-6101
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6101
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 0.11.0.0
>            Reporter: Sean Rohead
>         Attachments: 6101.v2.txt, text.html
>
>
> I am using com.typesafe.akka:akka-stream-kafka:0.17 which relies on kafka-clients:0.11.0.0.
> I have set the reconnect.backoff.max.ms property to 60000.
> When I start the application without kafka running, I see a flood of the following log message:
> [warn] o.a.k.c.NetworkClient - Connection to node -1 could not be established. Broker may not be available.
> The log messages occur several times a second and the frequency of these messages does not decrease over time as would be expected if exponential backoff was working properly.
> I set a breakpoint in the debugger in ClusterConnectionStates:188 and noticed that every time this breakpoint is hit, nodeState.failedAttempts is always 0.
> This is why the delay does not increase exponentially.
> It also appears that every time the breakpoint is hit, it is on a different instance, so even though the number of failedAttempts is incremented, we never get the breakpoint for the same instance more than one time.
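
To illustrate the suggestion in the comment above (reuse the existing nodeState in connecting() rather than replacing it), here is a minimal standalone sketch. It is not the Kafka source: the class ConnectionStatesSketch, its constructor, and the accessor methods are hypothetical, and the backoff is simplified to plain doubling with a cap and no jitter. Only the names connecting(), disconnected(), reconnectBackoffMs, reconnectBackoffInitMs, failedAttempts and lastConnectAttemptMs are taken from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of per-node connection state.
// NOT the actual ClusterConnectionStates implementation.
class ConnectionStatesSketch {
    enum State { CONNECTING, DISCONNECTED }

    static final class NodeState {
        State state;
        long lastConnectAttemptMs;
        long failedAttempts;
        long reconnectBackoffMs;

        NodeState(State state, long lastConnectAttemptMs, long reconnectBackoffMs) {
            this.state = state;
            this.lastConnectAttemptMs = lastConnectAttemptMs;
            this.failedAttempts = 0;
            this.reconnectBackoffMs = reconnectBackoffMs;
        }
    }

    private final long reconnectBackoffInitMs;
    private final long reconnectBackoffMaxMs;
    private final Map<String, NodeState> nodeStates = new HashMap<>();

    ConnectionStatesSketch(long reconnectBackoffInitMs, long reconnectBackoffMaxMs) {
        this.reconnectBackoffInitMs = reconnectBackoffInitMs;
        this.reconnectBackoffMaxMs = reconnectBackoffMaxMs;
    }

    // Reuse the existing entry if there is one, so failedAttempts and
    // reconnectBackoffMs survive across connection attempts.
    void connecting(String id, long now) {
        NodeState existing = nodeStates.get(id);
        if (existing != null) {
            existing.state = State.CONNECTING;
            existing.lastConnectAttemptMs = now;
        } else {
            nodeStates.put(id, new NodeState(State.CONNECTING, now, reconnectBackoffInitMs));
        }
    }

    // Record a failed attempt and grow the backoff.
    // Assumption: simple doubling capped at the maximum, without jitter.
    void disconnected(String id, long now) {
        NodeState nodeState = nodeStates.get(id);
        if (nodeState == null) {
            return;
        }
        nodeState.state = State.DISCONNECTED;
        nodeState.lastConnectAttemptMs = now;
        nodeState.failedAttempts++;
        nodeState.reconnectBackoffMs =
            Math.min(reconnectBackoffMaxMs, nodeState.reconnectBackoffMs * 2);
    }

    long reconnectBackoffMs(String id) {
        return nodeStates.get(id).reconnectBackoffMs;
    }

    long failedAttempts(String id) {
        return nodeStates.get(id).failedAttempts;
    }
}
```

In this sketch, with an initial backoff of 50 ms and a maximum of 60000 ms, three connecting()/disconnected() cycles for the same node leave failedAttempts at 3 and the backoff at 400 ms. If connecting() instead put a fresh NodeState into the map on every call, both values would start over each time, which matches the behavior observed at the breakpoint.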