ableegoldman commented on a change in pull request #9671:
URL: https://github.com/apache/kafka/pull/9671#discussion_r537990731
##########
File path: clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java
##########
@@ -248,18 +243,26 @@ protected synchronized boolean ensureCoordinatorReady(final Timer timer) {
                 break;
             }
+            RuntimeException fatalException = null;
+
             if (future.failed()) {
                 if (future.isRetriable()) {
                     log.debug("Coordinator discovery failed, refreshing metadata", future.exception());
                     client.awaitMetadataUpdate(timer);
-                } else
-                    throw future.exception();
+                } else {
+                    log.info("FindCoordinator request hit fatal exception", fatalException);
+                    fatalException = future.exception();
+                }
             } else if (coordinator != null && client.isUnavailable(coordinator)) {
                 // we found the coordinator, but the connection has failed, so mark
                 // it dead and backoff before retrying discovery
                 markCoordinatorUnknown();
                 timer.sleep(rebalanceConfig.retryBackoffMs);
             }
+
+            clearFindCoordinatorFuture();

Review comment:
Ah, good point... actually, I think it's probably not OK for the future to only ever be cleared in the main thread, since e.g. the main thread might be stuck in long processing, and the heartbeat (hb) thread should not be blocked from looking up the coordinator while that happens. So maybe we should also call `clearFindCoordinatorFuture` inside the hb thread, in the `if (findCoordinatorFuture != null || lookupCoordinator().failed())` block (if the future did indeed finish and has failed). WDYT?
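
To make the suggestion above concrete, here is a minimal, self-contained sketch of the idea. It is a toy model and not the real `AbstractCoordinator`: the class `CoordinatorLookupSketch`, the `heartbeatPass` method, the `lookupsSent` counter, and the use of `CompletableFuture` in place of Kafka's own future type are all assumptions made for illustration. Only the names `findCoordinatorFuture`, `lookupCoordinator` and `clearFindCoordinatorFuture` come from the discussion, and their bodies here are guesses at the general shape.

```java
import java.util.concurrent.CompletableFuture;

/** Toy model of a cached coordinator lookup; NOT the real AbstractCoordinator. */
public class CoordinatorLookupSketch {
    // Cached lookup (pending or completed); null means no lookup is cached.
    private CompletableFuture<String> findCoordinatorFuture = null;
    // Hypothetical counter, only here so the effect is observable.
    private int lookupsSent = 0;

    /** Reuses the cached lookup if there is one, otherwise starts a new one. */
    public synchronized CompletableFuture<String> lookupCoordinator() {
        if (findCoordinatorFuture == null) {
            findCoordinatorFuture = new CompletableFuture<>();
            lookupsSent++;
        }
        return findCoordinatorFuture;
    }

    public synchronized void clearFindCoordinatorFuture() {
        findCoordinatorFuture = null;
    }

    /**
     * One pass of a hypothetical heartbeat loop while the coordinator is unknown.
     * If the cached lookup has already finished with a failure, clear it here
     * rather than waiting for the main thread, so the next pass can retry.
     * Returns true if a failed future was cleared.
     */
    public synchronized boolean heartbeatPass() {
        CompletableFuture<String> future = lookupCoordinator();
        if (future.isCompletedExceptionally()) {
            clearFindCoordinatorFuture();
            return true;
        }
        return false;
    }

    public synchronized int lookupsSent() {
        return lookupsSent;
    }

    public static void main(String[] args) {
        CoordinatorLookupSketch c = new CoordinatorLookupSketch();
        // First lookup fails while the "main thread" is busy elsewhere.
        c.lookupCoordinator().completeExceptionally(new RuntimeException("lookup failed"));
        System.out.println(c.heartbeatPass()); // failed future is cleared by the hb pass
        System.out.println(c.heartbeatPass()); // a fresh lookup is now in flight
        System.out.println(c.lookupsSent());
    }
}
```

Without the clear inside `heartbeatPass`, every later call to `lookupCoordinator` in this model would keep returning the same failed future until some other thread reset it, which is the blocking behaviour the comment is worried about.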