dajac commented on a change in pull request #9406:
URL: https://github.com/apache/kafka/pull/9406#discussion_r509257293
##########
File path: clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
##########
@@ -444,10 +444,25 @@ private boolean maybeSendAndPollTransactionalRequest() {
             AbstractRequest.Builder<?> requestBuilder = nextRequestHandler.requestBuilder();
             Node targetNode = null;
             try {
-                targetNode = awaitNodeReady(nextRequestHandler.coordinatorType());
-                if (targetNode == null) {
+                FindCoordinatorRequest.CoordinatorType coordinatorType = nextRequestHandler.coordinatorType();
+                targetNode = coordinatorType != null ?
+                        transactionManager.coordinator(coordinatorType) :
+                        client.leastLoadedNode(time.milliseconds());
+                if (targetNode != null) {
+                    if (!awaitNodeReady(targetNode, coordinatorType)) {
+                        log.trace("Target node {} not ready within request timeout, will retry when node is ready.", targetNode);
+                        maybeFindCoordinatorAndRetry(nextRequestHandler);
+                        return true;
+                    }
+                } else if (coordinatorType != null) {
+                    log.trace("Coordinator not known for {}, will retry {} after finding coordinator.", coordinatorType, requestBuilder.apiKey());
                     maybeFindCoordinatorAndRetry(nextRequestHandler);
                     return true;
+                } else {
+                    log.trace("No nodes available to send requests, will poll and retry when until a node is ready.");
+                    transactionManager.retry(nextRequestHandler);
+                    client.poll(retryBackoffMs, time.milliseconds());
+                    return true;

Review comment:
Looking at this branch again, sorry :). I was comparing it with the previous behavior and I have noticed that we would request a refresh of the metadata when the same conditions were met.
That happened here:

```
private void maybeFindCoordinatorAndRetry(TransactionManager.TxnRequestHandler nextRequestHandler) {
    if (nextRequestHandler.needsCoordinator()) {
        transactionManager.lookupCoordinator(nextRequestHandler);
    } else {
        // For non-coordinator requests, sleep here to prevent a tight loop when no node is available
        time.sleep(retryBackoffMs);
        metadata.requestUpdate();
    }
    transactionManager.retry(nextRequestHandler);
}
```

When no node is available and `coordinatorType == null`, we ended up in the else branch here. I wonder whether not calling `metadata.requestUpdate()` in our new implementation could be problematic, and whether we could simply replace that `time.sleep(retryBackoffMs)` with `client.poll(retryBackoffMs, time.milliseconds())` to achieve the same goal. The difference is that another metadata request would be sent in our particular case. Have you noticed this small difference?
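
To make the comparison concrete, here is a minimal sketch of the two back-off variants for non-coordinator requests. It does not use the real Kafka classes: `Metadata`, `Client`, `Clock`, `backoffWithSleep`, `backoffWithPoll` and `trace` are hypothetical stand-ins that model only the calls mentioned in this comment, and the call order in the second variant is an assumption (a literal swap of `time.sleep` for `client.poll`).

```java
import java.util.ArrayList;
import java.util.List;

final class RetryBackoffSketch {
    // Hypothetical stand-ins for the real collaborators; only the calls
    // discussed in this review comment are modelled.
    interface Metadata { void requestUpdate(); }
    interface Client { void poll(long timeoutMs, long nowMs); }
    interface Clock { long milliseconds(); void sleep(long ms); }

    // Previous behavior: block for the back-off, then ask for a metadata refresh.
    static void backoffWithSleep(Clock time, Metadata metadata, long retryBackoffMs) {
        time.sleep(retryBackoffMs);
        metadata.requestUpdate();
    }

    // Variant raised in the comment: wait inside poll() instead of sleeping,
    // and still ask for a metadata refresh (ordering is an assumption).
    static void backoffWithPoll(Client client, Clock time, Metadata metadata, long retryBackoffMs) {
        client.poll(retryBackoffMs, time.milliseconds());
        metadata.requestUpdate();
    }

    // Records which calls each variant makes, so the two can be compared.
    static List<String> trace(boolean usePoll, long retryBackoffMs) {
        List<String> calls = new ArrayList<>();
        Clock time = new Clock() {
            @Override public long milliseconds() { return 0L; }
            @Override public void sleep(long ms) { calls.add("sleep(" + ms + ")"); }
        };
        Metadata metadata = () -> calls.add("requestUpdate");
        Client client = (timeoutMs, nowMs) -> calls.add("poll(" + timeoutMs + ")");
        if (usePoll) {
            backoffWithPoll(client, time, metadata, retryBackoffMs);
        } else {
            backoffWithSleep(time, metadata, retryBackoffMs);
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(trace(false, 100L));
        System.out.println(trace(true, 100L));
    }
}
```

Both variants end with a metadata refresh request. The only change is whether the thread sleeps or spends the back-off inside `poll`, where the client can make network progress.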