dajac commented on PR #12674:
URL: https://github.com/apache/kafka/pull/12674#issuecomment-1273387524

   > I think one perspective we could take is that consumers which have not 
implemented the protocol should not be supported. So perhaps we should only 
check whether leader epoch is provided and reject the request otherwise. After 
all, even if follower fetching is intended, the user might still encounter 
spurious `OFFSET_OUT_OF_RANGE` errors if the client has not implemented the 
protocol.
   
   That would be ideal but it seems quite brutal for existing deployments. 
There are definitely a few users using those clients with follower fetching. I 
wonder if we should consider bumping the fetch request version and enforcing 
the leader epoch check for the new version in order to strengthen the protocol. 
That would at least give those clients a chance to rectify their implementation 
if they want to use newer fetch versions in the future. I am not sure that it 
is worth it though.
   
   > Alternatively, maybe we should not return that error in the first place 
from followers if the client is not providing the leader epoch? We could return 
`OFFSET_NOT_AVAILABLE` instead if the leader epoch is -1. Would that work?
   
   librdkafka does not handle `OFFSET_NOT_AVAILABLE` well, unfortunately. When 
it receives that error, it only retries the fetch request after a small backoff 
and does not refresh its metadata, so the client would likely not rediscover 
the correct leader.
   
   It seems that there is no good solution to mitigate the issue for those 
existing clients. We've discussed the following approaches:
   1) Disallow fetch from follower when there is no replica selector. This 
seems pretty safe but could be an issue when the cluster is rolled to enable 
the selector. We could make the config dynamic to mitigate this. Clients using 
follower fetching would still be subject to the issue though.
   2) Disallow fetch from follower when there is no leader epoch and no rack id 
in the request. This would ensure that existing clients that do not use 
follower fetching are safe. However, we have no guarantee that it would not 
break existing deployments because a selector does not necessarily use rack.id. 
Clients using follower fetching would still be subject to the issue though.
   3) Disallow fetch from follower when there is no leader epoch in the 
request. This would prevent existing clients from working.
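
   To make the trade-offs easier to compare, here is a rough sketch of the 
three options as a single predicate. It is only an illustration: 
`FollowerFetchPolicy`, `Option`, and `allowFollowerFetch` are hypothetical 
names, not existing Kafka classes, and the real check would live in the broker 
fetch path with access to the actual request fields.

```java
import java.util.OptionalInt;

/** Hypothetical sketch; none of these names exist in Kafka. */
final class FollowerFetchPolicy {

    /** The three mitigations listed above. */
    enum Option {
        REQUIRE_REPLICA_SELECTOR,   // option 1
        REQUIRE_EPOCH_OR_RACK_ID,   // option 2
        REQUIRE_LEADER_EPOCH        // option 3
    }

    /**
     * Returns true if a follower should serve the fetch under the given option.
     *
     * @param selectorConfigured whether the broker has a replica selector set
     * @param leaderEpoch        leader epoch from the request, empty if the
     *                           client did not provide one (i.e. -1 on the wire)
     * @param rackId             rack id from the request, null or empty if absent
     */
    static boolean allowFollowerFetch(Option option,
                                      boolean selectorConfigured,
                                      OptionalInt leaderEpoch,
                                      String rackId) {
        boolean hasEpoch = leaderEpoch.isPresent();
        boolean hasRackId = rackId != null && !rackId.isEmpty();
        switch (option) {
            case REQUIRE_REPLICA_SELECTOR:
                // Safe for clusters without follower fetching; clients that do
                // use follower fetching are still exposed to the issue.
                return selectorConfigured;
            case REQUIRE_EPOCH_OR_RACK_ID:
                // Rejects only requests that carry neither field; a selector
                // that ignores rack id could still be broken by this.
                return hasEpoch || hasRackId;
            case REQUIRE_LEADER_EPOCH:
                // Strictest: clients that never send a leader epoch are
                // rejected outright.
                return hasEpoch;
            default:
                throw new IllegalArgumentException("Unknown option: " + option);
        }
    }
}
```

   For example, a client that sends a rack id but no leader epoch would pass 
under option 2 and be rejected under option 3, which is why option 3 breaks 
existing clients while option 2 leaves them exposed.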

