dajac commented on PR #12674:
URL: https://github.com/apache/kafka/pull/12674#issuecomment-1273387524
> I think one perspective we could take is that consumers which have not implemented the protocol should not be supported. So perhaps we should only check whether leader epoch is provided and reject the request otherwise. After all, even if follower fetching is intended, the user might still encounter spurious `OFFSET_OUT_OF_RANGE` errors if the client has not implemented the protocol.

That would be ideal, but it seems quite brutal for existing deployments. There are definitely a few users running those clients with follower fetching. I wonder if we should consider bumping the fetch request version and enforcing this check for the new version in order to strengthen the protocol. That would at least give those clients a chance to fix their implementation if they want to use newer fetch versions in the future. I am not sure that it is worth it though.

> Alternatively, maybe we should not return that error in the first place from followers if the client is not providing the leader epoch? We could return `OFFSET_NOT_AVAILABLE` instead if the leader epoch is -1. Would that work?

Unfortunately, librdkafka does not handle `OFFSET_NOT_AVAILABLE` well. When it receives that error, it only retries the fetch request after a small backoff; it does not refresh its metadata, so the client would likely not rediscover the correct leader.

It seems that there is no good solution to mitigate the issue for those existing clients. We've discussed the following approaches:

1) Disallow fetch from follower when there is no replica selector. This seems pretty safe but could be an issue when the cluster is rolled to enable the selector. We could make the config dynamic to mitigate this. Clients using follower fetching would still be subject to the issue though.
2) Disallow fetch from follower when the request contains neither a leader epoch nor a rack id. This would ensure that existing clients that do not use follower fetching are safe. However, we have no guarantee that it would not break existing deployments, because a selector does not necessarily use `rack.id`. Clients using follower fetching would still be subject to the issue though.
3) Disallow fetch from follower when there is no leader epoch in the request. This would prevent existing clients that do not send a leader epoch from working.
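To make the trade-offs between the three approaches concrete, here is a minimal sketch of the admission check a follower could apply to an incoming fetch under each one. It is not Kafka's code: `FollowerFetch`, `Policy`, `follower_may_serve` and the `has_replica_selector` flag are hypothetical names, and representing a missing rack id as an empty string is an assumption. Only the `-1` "no leader epoch" sentinel comes from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum

# Sentinel for "client did not send a leader epoch" (-1, as in the discussion).
NO_LEADER_EPOCH = -1


class Policy(Enum):
    """Hypothetical encoding of the three approaches discussed above."""
    REQUIRE_SELECTOR = 1       # 1) reject when no replica selector is configured
    REQUIRE_EPOCH_OR_RACK = 2  # 2) reject when neither leader epoch nor rack id is sent
    REQUIRE_EPOCH = 3          # 3) reject when no leader epoch is sent


@dataclass(frozen=True)
class FollowerFetch:
    """Hypothetical view of a fetch request that reached a follower replica."""
    current_leader_epoch: int = NO_LEADER_EPOCH
    rack_id: str = ""  # assumption: empty string means "no rack id sent"


def follower_may_serve(fetch: FollowerFetch, policy: Policy,
                       has_replica_selector: bool) -> bool:
    """Return True if the follower should serve this fetch under `policy`."""
    has_epoch = fetch.current_leader_epoch != NO_LEADER_EPOCH
    has_rack = fetch.rack_id != ""
    if policy is Policy.REQUIRE_SELECTOR:
        # Approach 1 only looks at broker config, not at the request.
        return has_replica_selector
    if policy is Policy.REQUIRE_EPOCH_OR_RACK:
        return has_epoch or has_rack
    if policy is Policy.REQUIRE_EPOCH:
        return has_epoch
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    clients = {
        "no epoch, no rack (no follower fetching)": FollowerFetch(),
        "rack only (follower fetching, no epoch)": FollowerFetch(rack_id="rack-a"),
        "epoch and rack (protocol implemented)":
            FollowerFetch(current_leader_epoch=5, rack_id="rack-a"),
    }
    for policy in Policy:
        for name, fetch in clients.items():
            allowed = follower_may_serve(fetch, policy, has_replica_selector=True)
            print(f"{policy.name:<22} | {name:<42} -> {allowed}")
```

Running it shows the trade-off described above: approach 2 rejects the client that sends neither field but still admits a follower-fetching client that omits the leader epoch, while approach 3 rejects that client as well.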
