lianetm commented on PR #16241: URL: https://github.com/apache/kafka/pull/16241#issuecomment-2161273163
High-level comment about the approach: getting into caching results of fetch requests seems very tricky, given how easily they can become stale. The simple dedup approach, ensuring we don't send a fetch request if there is already one in-flight for the same partitions, seemed simple and safe (but not enough for poll, agreed). The problem was that fetch requests issued from within the poll loop were being expired with the same poll timeout, which could prevent the consumer from ever fetching offsets on a low poll timeout.

Sorry if I'm missing something, but couldn't we simply ensure that fetch requests issued from poll have no timeout, and apply the timeout only when getting responses? This means we would leave the fetch request running, and on the next call to poll it would either be completed (so we would get the results) or still in-flight (and as long as the assignment does not change, we wouldn't issue a new fetch).

The change I refer to is this snippet from [initWithCommittedOffsetsIfNeeded](https://github.com/apache/kafka/blob/aecaf4447561edd8da9f06e3abdf46f382dc9d89/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1670-L1675). It currently does fetch with timeout + get results. But what about just moving the timeout to the get result?

1. Issue fetch (MAX_VALUE as timeout) -> it will run across poll iterations for the same set of partitions.
2. Get result with the poll timeout -> this ensures that we respect the poll iteration timeout, while leaving the fetch request running to be used on the next poll.

Would this work to solve our problem (while avoiding caching offsets on the client)?
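To make the idea concrete, here is a minimal sketch of the two-step pattern above. It is not Kafka's actual API (the class and method names here are illustrative only): a pending future is issued once with no per-request expiration, reused across poll iterations, and the caller's poll timeout is applied only when waiting for the result.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: issue the offsets fetch once (no expiration, i.e.
// MAX_VALUE semantics), keep the pending future across poll iterations,
// and apply the poll timeout only when retrieving the result.
public class OffsetsFetchSketch {

    private CompletableFuture<Long> pendingFetch; // in-flight fetch, reused across polls

    // Simulates the background thread completing the fetch after some delay.
    private CompletableFuture<Long> issueFetch() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(50); // pretend network round trip
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 42L; // pretend committed offset
        });
    }

    // One poll iteration: returns the offset if the fetch completed within
    // the poll timeout, otherwise null, leaving the fetch running.
    public Long pollOnce(long timeoutMs) {
        if (pendingFetch == null) {
            pendingFetch = issueFetch(); // step 1: issue fetch with no expiration
        }
        try {
            Long result = pendingFetch.get(timeoutMs, TimeUnit.MILLISECONDS); // step 2
            pendingFetch = null; // consumed; a later poll may issue a new fetch
            return result;
        } catch (TimeoutException e) {
            return null; // poll timeout expired, but the fetch keeps running
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        OffsetsFetchSketch consumer = new OffsetsFetchSketch();
        Long first = consumer.pollOnce(1); // tiny poll timeout: expires, fetch stays in flight
        Thread.sleep(100);                 // give the in-flight fetch time to complete
        Long second = consumer.pollOnce(1); // same future, now completed
        System.out.println(first + " " + second);
    }
}
```

The point the sketch tries to show: the first short poll does not cancel or expire the request, so a later poll with the same assignment can pick up the completed result instead of re-issuing the fetch.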
