apoorvmittal10 commented on PR #16842:
URL: https://github.com/apache/kafka/pull/16842#issuecomment-2442959652

   Here is the summary. Sorry the PR is growing; it is now more about error 
handling than just leader epoch propagation.
   
   The current PR has the following:
   
   - Fetches the LeaderEpoch and passes it to SharePartition. If a leader epoch 
error (fenced or state epoch), an unknown topic partition error, or an unknown 
group error arrives, then the share partition is removed from the cache. At 
the same time the share partition is marked fenced.
   - To support this, a Fenced state has been added in the PR.
   - There are 2 checks for the fenced state in SharePartition: a) while 
acquiring records, b) while acquiring the fetch lock. Hence any inflight 
request which tries to acquire records will fail, and prior to that it will 
fail even while acquiring the lock. This prevents us from sending any new 
records to the consumer for a fenced share partition. However, I didn't add 
the check to the acknowledge and release APIs, as they will eventually fail 
while persisting, if they should.
   - Added error handling for leader epoch at the SharePartition level, which 
means that if 3 of the 5 topic partitions in a request are fenced, the request 
should proceed for the remaining 2 share partitions rather than failing 
completely.
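   
   To make the two fenced checks and the per-partition error handling above 
concrete, here is a minimal sketch. It is not the code in this PR: 
FencedSharePartitionSketch, PartitionSketch, FencedException and fetch are 
hypothetical names, the set of fenced partitions stands in for the real leader 
epoch errors, and throwing from the lock check is an assumption made only to 
keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: every name here is hypothetical, not a Kafka class.
class FencedSharePartitionSketch {

    enum State { ACTIVE, FENCED }

    static class FencedException extends RuntimeException {
        FencedException(String message) { super(message); }
    }

    static class PartitionSketch {
        private volatile State state = State.ACTIVE;
        private final AtomicBoolean fetchLock = new AtomicBoolean(false);

        void markFenced() { state = State.FENCED; }

        // Check (b): a fenced partition never hands out the fetch lock.
        // Assumption: the sketch throws here; a real implementation may differ.
        boolean maybeAcquireFetchLock() {
            if (state == State.FENCED) {
                throw new FencedException("partition is fenced");
            }
            return fetchLock.compareAndSet(false, true);
        }

        void releaseFetchLock() { fetchLock.set(false); }

        // Check (a): a fenced partition never acquires records.
        int acquire(int availableRecords) {
            if (state == State.FENCED) {
                throw new FencedException("partition is fenced");
            }
            return availableRecords;
        }
    }

    // Processes each requested partition independently, so fenced partitions
    // produce an error entry while the remaining ones still return data.
    static Map<String, String> fetch(Map<String, PartitionSketch> cache,
                                     List<String> requested,
                                     Set<String> fencedByLeaderEpoch) {
        Map<String, String> results = new LinkedHashMap<>();
        for (String tp : requested) {
            PartitionSketch sp = cache.computeIfAbsent(tp, k -> new PartitionSketch());
            if (fencedByLeaderEpoch.contains(tp)) {
                // Mark fenced and drop from the cache at the same time.
                sp.markFenced();
                cache.remove(tp, sp);
            }
            try {
                if (!sp.maybeAcquireFetchLock()) {
                    results.put(tp, "LOCK_BUSY");
                    continue;
                }
                try {
                    results.put(tp, "ACQUIRED " + sp.acquire(10));
                } finally {
                    sp.releaseFetchLock();
                }
            } catch (FencedException e) {
                results.put(tp, "ERROR fenced");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, PartitionSketch> cache = new ConcurrentHashMap<>();
        Map<String, String> results = fetch(
            cache,
            List.of("t-0", "t-1", "t-2", "t-3", "t-4"),
            Set.of("t-0", "t-2", "t-4"));
        // 3 of 5 partitions are fenced; the other 2 still return records.
        results.forEach((tp, result) -> System.out.println(tp + " -> " + result));
    }
}
```

   In the sketch, a request for 5 partitions with 3 fenced still yields data 
for the other 2, and only those 2 stay in the cache.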
   
   Some follow ups to do:
   - https://issues.apache.org/jira/browse/KAFKA-17510 - Additional error 
handling and continue fetching once the initialization of share partition is 
completed and the request is in purgatory.
   - https://issues.apache.org/jira/browse/KAFKA-17887 - Error handling for 
response log result in delayed share fetch.
   - Better state machine transition in Share Partition.
   - Consider using a listener to fence and remove Share Partition.
   - Add a check for leader epoch prior to removing the share partition instance.
   - Send the response for error partitions as well. Currently, if all 
partitions fail or all succeed, a response is sent for all of them; otherwise 
it is sent only for the partitions for which data was fetched.
   - Additionally, I am thinking of renaming ShareFetchData to ShareFetch and 
providing better handling of future completion, as the code is a bit fragmented.
   
   I am planning to have a couple of follow-up PRs, if @junrao @AndrewJSchofield 
@mumrah you think that is fine. With the tests and changes, this PR is getting 
bigger and bigger.

