lianetm commented on PR #17440:
URL: https://github.com/apache/kafka/pull/17440#issuecomment-2489427648

   Hey @m1a2st, sharing a thought in case it helps. First, the problem we have 
is that API calls like position/endOffsets trigger events that should fail with 
topic metadata errors but don't, so they are left hanging until they time out. 
With that in mind, it occurred to me that we already have all the events that 
are awaiting responses in hand when `ConsumerNetworkThread.runOnce` 
runs, because the reaper keeps all the 
completableEvents so they can be expired eventually. Couldn't we take those 
events and let them know about the error when it happens? Then each event 
decides whether it should fail on a topic metadata error or not. I'm picturing 
something along these lines:
   
   On ConsumerNetworkThread.runOnce:
   ```
           // 1. get metadata error that happens here
           networkClientDelegate.poll(pollWaitTimeMs, currentTimeMs);
           ...
           // 2. get all awaiting events after expiration applies (the reaper
           // has them all, not just the ones generated on the current runOnce)
           List<CompletableApplicationEvent> awaitingEvents =
                   reapExpiredApplicationEvents(currentTimeMs);
   
           // 3. notify awaiting events about the metadata error
           if (metadataError != null) {
               awaitingEvents.forEach(e -> e.onMetadataError(metadataError));
           }
   ```
   Would that work? The main advantages I see are that it avoids the 
complexity of passing metadata error futures around to specific manager calls, 
and that it is a solution applied consistently to all events (each event 
type then decides whether it should fail on topic metadata errors). 
In `onMetadataError`, events could no-op by default, and some would override it to 
simply call future.completeExceptionally, e.g. `CheckAndUpdatePositionsEvent` and 
`CommitEvent` (these two seem to be the ones leading to the failed tests in the 
Authorizer file; we can get into details later about which other events should 
react to the error).
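
   To make the default no-op and the override concrete, here is a minimal, self-contained sketch. The classes are hypothetical stand-ins, not the real Kafka event classes: `AwaitingEvent`, `PositionsEvent`, `UnaffectedEvent` and `MetadataErrorNotifier` are names made up for illustration, and only `onMetadataError` and `future.completeExceptionally` come from the idea above.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-in for a completable application event;
// the real Kafka classes have different constructors and fields.
abstract class AwaitingEvent {
    final CompletableFuture<Void> future = new CompletableFuture<>();

    // Default behaviour: ignore topic metadata errors (no-op).
    void onMetadataError(Exception metadataError) { }
}

// Stand-in for an event that should fail fast on a topic metadata error,
// in the spirit of CheckAndUpdatePositionsEvent or CommitEvent.
class PositionsEvent extends AwaitingEvent {
    @Override
    void onMetadataError(Exception metadataError) {
        future.completeExceptionally(metadataError);
    }
}

// Stand-in for an event that keeps the default no-op.
class UnaffectedEvent extends AwaitingEvent { }

class MetadataErrorNotifier {
    // Mirrors step 3 of the outline: notify every awaiting event,
    // and let each event type decide whether to fail.
    static void notifyAwaitingEvents(List<? extends AwaitingEvent> awaitingEvents,
                                     Exception metadataError) {
        if (metadataError != null) {
            awaitingEvents.forEach(e -> e.onMetadataError(metadataError));
        }
    }
}
```

   With this shape, the network thread does not need to know which events care about the error; adding a new event type that should fail only requires overriding the hook in that type.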
   
   I could be missing something but sharing in case it helps! Let me know. 

