dajac commented on a change in pull request #9406:
URL: https://github.com/apache/kafka/pull/9406#discussion_r509257293



##########
File path: clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
##########
@@ -444,10 +444,25 @@ private boolean maybeSendAndPollTransactionalRequest() {
         AbstractRequest.Builder<?> requestBuilder = nextRequestHandler.requestBuilder();
         Node targetNode = null;
         try {
-            targetNode = awaitNodeReady(nextRequestHandler.coordinatorType());
-            if (targetNode == null) {
+            FindCoordinatorRequest.CoordinatorType coordinatorType = nextRequestHandler.coordinatorType();
+            targetNode = coordinatorType != null ?
+                    transactionManager.coordinator(coordinatorType) :
+                    client.leastLoadedNode(time.milliseconds());
+            if (targetNode != null) {
+                if (!awaitNodeReady(targetNode, coordinatorType)) {
+                    log.trace("Target node {} not ready within request timeout, will retry when node is ready.", targetNode);
+                    maybeFindCoordinatorAndRetry(nextRequestHandler);
+                    return true;
+                }
+            } else if (coordinatorType != null) {
+                log.trace("Coordinator not known for {}, will retry {} after finding coordinator.", coordinatorType, requestBuilder.apiKey());
                 maybeFindCoordinatorAndRetry(nextRequestHandler);
                 return true;
+            } else {
+                log.trace("No nodes available to send requests, will poll and retry when until a node is ready.");
+                transactionManager.retry(nextRequestHandler);
+                client.poll(retryBackoffMs, time.milliseconds());
+                return true;

Review comment:
       Looking at this branch again, sorry :). Comparing it with the previous
behavior, I noticed that we used to request a metadata refresh when the same
conditions were met. That happened here:
   ```
       private void maybeFindCoordinatorAndRetry(TransactionManager.TxnRequestHandler nextRequestHandler) {
           if (nextRequestHandler.needsCoordinator()) {
               transactionManager.lookupCoordinator(nextRequestHandler);
           } else {
               // For non-coordinator requests, sleep here to prevent a tight loop when no node is available
               time.sleep(retryBackoffMs);
               metadata.requestUpdate();
           }
   
           transactionManager.retry(nextRequestHandler);
       }
   ```
   When no node was available and `coordinatorType == null`, we ended up in the
else branch here. I wonder whether not calling `metadata.requestUpdate()` in
our new implementation could be problematic. I also wonder whether we could
simply replace that `time.sleep(retryBackoffMs)` with
`client.poll(retryBackoffMs, time.milliseconds())` to achieve the same goal.
The difference is that another metadata request would be sent in our
particular case.
   
   Have you noticed this small difference?  
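
   To make the suggestion concrete, here is a rough sketch of what the helper
could look like with the sleep replaced by a poll and the metadata refresh
kept. It is not taken from the PR: the `Metadata`, `Client`, `TxnManager` and
`Handler` interfaces are hypothetical stand-ins that only mirror the calls
quoted above, `trace` exists purely to show the resulting call order, and
calling `requestUpdate()` before `poll(...)` (so that the poll has a chance to
send the refresh) is my assumption.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class RetrySketch {
       // Hypothetical stand-ins; only the methods quoted in this comment are modelled.
       interface Metadata { void requestUpdate(); }
       interface Client { void poll(long timeoutMs, long nowMs); }
       interface Handler { boolean needsCoordinator(); }
       interface TxnManager {
           void lookupCoordinator(Handler handler);
           void retry(Handler handler);
       }

       static void maybeFindCoordinatorAndRetry(Handler handler, TxnManager txn, Metadata metadata,
                                                Client client, long retryBackoffMs, long nowMs) {
           if (handler.needsCoordinator()) {
               txn.lookupCoordinator(handler);
           } else {
               // Keep the metadata refresh that the previous implementation requested.
               metadata.requestUpdate();
               // Back off by polling instead of sleeping (assumed ordering: after requestUpdate).
               client.poll(retryBackoffMs, nowMs);
           }
           txn.retry(handler);
       }

       // Records the order of calls made for a handler that does or does not need a coordinator.
       static List<String> trace(boolean needsCoordinator) {
           List<String> calls = new ArrayList<>();
           TxnManager txn = new TxnManager() {
               @Override public void lookupCoordinator(Handler h) { calls.add("lookupCoordinator"); }
               @Override public void retry(Handler h) { calls.add("retry"); }
           };
           maybeFindCoordinatorAndRetry(
                   () -> needsCoordinator, txn,
                   () -> calls.add("requestUpdate"),
                   (timeoutMs, nowMs) -> calls.add("poll"),
                   100L, 0L);
           return calls;
       }

       public static void main(String[] args) {
           System.out.println(trace(false)); // non-coordinator request
           System.out.println(trace(true));  // coordinator request
       }
   }
   ```

   With this shape, the `coordinatorType == null` branch in `Sender` could call
the helper instead of duplicating the retry and poll, which would also restore
the metadata refresh.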
   



