lianetm commented on PR #16833: URL: https://github.com/apache/kafka/pull/16833#issuecomment-2293832197
Hey both! Seems to me we're more or less on the same page :) (and I'll recap further down), but first, the one thing I don't quite get. You both refer to how the `NetworkClient` handles timeouts, and I do agree it does, but isn't that a different timeout? The network layer applies a general timeout, not one specific per request: it's taken from the `REQUEST_TIMEOUT_MS` config and passed into the network layer when creating a new consumer (follow from [here](https://github.com/apache/kafka/blob/cd47b3c1cce7a9f1881e40d38742b9ec8b30cf32/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ClassicKafkaConsumer.java#L197), for instance). Basically, that's how long a client will wait for responses from the server. That's not the same as the timeout provided to a specific API call like `committed` or `commit` (the ones we're trying to respect with all this failAndRemove logic), which indicates how long that specific call should block waiting to complete. Am I missing something here?

That clarification aside, I would say I agree with your comments:

> I feel fire-completed-request-and-forget is more accurate 😄 . IMHO, re-sending a "completed/timeout" request is a weird behavior. If we do want to have send-and-forget mode for the timeout request.

Totally agree, but the intention is really not to re-send expired requests; it's just to make sure that we send at least once (even with timeout 0, for example). That being said, I totally agree that in the fetch path it's weird and seems to serve no purpose. It's useful/wanted in other paths though, commit for instance, where we do want to trigger the commit even though we fail/timeout right away because of timeout 0, meaning only that the client won't wait for a response.

> we can just remove failAndRemoveExpiredFetchRequests since NetworkClient can handle the timeout too.
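To make the two-timeouts distinction concrete, here's a minimal toy sketch (not Kafka code; `TimeoutLayers`, `REQUEST_TIMEOUT`, and `apiCallExpired` are made-up illustrations) of a single client-wide network timeout versus a per-invocation API-call deadline:

```java
import java.time.Duration;

public class TimeoutLayers {
    // Network-layer timeout: ONE value for the whole client, configured once
    // (analogous to request.timeout.ms). It bounds how long we wait for a
    // broker response to any in-flight request.
    static final Duration REQUEST_TIMEOUT = Duration.ofMillis(30_000);

    // API-call timeout: supplied per invocation, e.g. committed(partitions, timeout).
    // It bounds how long THAT call blocks, independently of the network timeout.
    static boolean apiCallExpired(long deadlineMs, long nowMs) {
        return nowMs >= deadlineMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // A committed(..., Duration.ZERO)-style call: its deadline is "now",
        // so the caller stops waiting immediately, but the underlying request
        // can still be handed to the network layer once (fire-and-forget).
        long zeroTimeoutDeadline = now;
        System.out.println("api call expired: " + apiCallExpired(zeroTimeoutDeadline, now));
        System.out.println("network timeout (ms): " + REQUEST_TIMEOUT.toMillis());
    }
}
```

The point of the sketch: the two deadlines are tracked by different layers, so the network timeout firing (or not) says nothing about whether a `committed`/`commit` call's own timeout was respected.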
Agreed, seems sensible to remove `failAndRemoveExpiredFetchRequests`, but I would say that's because what it does is already done somewhere else (not the `NetworkClient`). It does 3 things:

1. completeExceptionally the request if it's expired -> this is already done by the event reaping logic that applies to all events, and does completeExceptionally if expired, right after polling the managers ([ConsumerNetworkThread#reapExpiredApplicationEvents](https://github.com/apache/kafka/blob/cd47b3c1cce7a9f1881e40d38742b9ec8b30cf32/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L156)).
2. remove the request from the internal buffer `unsentOffsetFetches` -> this is already done in the commit manager after it gets the requests to send and calls [clearAll](https://github.com/apache/kafka/blob/cd47b3c1cce7a9f1881e40d38742b9ec8b30cf32/clients/src/main/java/org/apache/kafka/clients/consumer/internals/CommitRequestManager.java#L1208).
3. ensure that the event was sent at least once (because of where it was called) -> this would still be the case even if we remove it, mainly because the requests are only removed on clearAll after having been added to `inflightOffsetFetches`.

I could totally be missing something though, thoughts?
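To illustrate the ordering argument (send at least once, then reap), here's a toy Java sketch, not Kafka code; `Event`, the buffers, and the loop structure are all hypothetical stand-ins for the unsent/inflight buffers and the event reaper:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

public class ReaperSketch {
    // Hypothetical stand-in for a commit/fetch request event with a deadline.
    static class Event {
        final CompletableFuture<Void> future = new CompletableFuture<>();
        final long deadlineMs;
        boolean sent = false;
        Event(long deadlineMs) { this.deadlineMs = deadlineMs; }
    }

    public static void main(String[] args) {
        long now = 100;
        // A timeout-0-style event: its deadline is already "now" when polled.
        List<Event> unsent = new ArrayList<>(List.of(new Event(now)));
        List<Event> inflight = new ArrayList<>();

        // Step 1: poll the managers first. The request is handed to the network
        // layer at least once, even though its deadline is already reached,
        // and is moved out of the unsent buffer (clearAll-style removal).
        for (Iterator<Event> it = unsent.iterator(); it.hasNext(); ) {
            Event e = it.next();
            e.sent = true;      // fired once...
            inflight.add(e);
            it.remove();        // ...then removed from the unsent buffer
        }

        // Step 2: only afterwards does the reaper expire the event's future,
        // so the application stops waiting, but the send already happened.
        for (Event e : inflight) {
            if (now >= e.deadlineMs && !e.future.isDone()) {
                e.future.completeExceptionally(new TimeoutException("expired"));
            }
        }

        Event e = inflight.get(0);
        System.out.println("sent=" + e.sent
                + " completedExceptionally=" + e.future.isCompletedExceptionally());
    }
}
```

Under this ordering, removing the extra fail-and-remove pass changes nothing observable: the request still goes out once, and the expired future is still completed exceptionally by the reaper.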