apoorvmittal10 commented on PR #16842: URL: https://github.com/apache/kafka/pull/16842#issuecomment-2442959652
So here is the summary. Sorry the PR is growing, as it is now more about error handling and less about just leader epoch propagation. The current PR has the following:

- Fetches LeaderEpoch and passes it to SharePartition. If a leader epoch error (fenced or state epoch), an unknown topic partition error, or an unknown group error arrives, the Share Partition is removed from the cache. At the same time the share partition is marked fenced.
- For this, a Fenced state has been added in the PR.
- There are 2 checks for the fenced state in SharePartition: a) while acquiring records, b) while acquiring the fetch lock. Hence any in-flight request that tries to acquire records will fail, and it will fail even earlier while acquiring the lock. This prevents us from sending any new records to the consumer for a fenced share partition. However, I didn't add the check on the acknowledge and release APIs, as they will eventually fail while persisting, if they should.
- Added the error handling for leader epoch at the SharePartition level, which means that if 3 of the 5 topic partitions in a request are fenced, the request should proceed for the remaining 2 share partitions rather than failing completely.

Some follow-ups to do:

- https://issues.apache.org/jira/browse/KAFKA-17510 - Additional error handling, and continue fetching once the initialization of the share partition is completed and the request is in purgatory.
- https://issues.apache.org/jira/browse/KAFKA-17887 - Error handling for the response log result in delayed share fetch.
- Better state machine transitions in Share Partition.
- Consider using a listener to fence and remove the Share Partition.
- Add a check for leader epoch prior to removing the share partition instance.
- Send a response for errored partitions as well. Currently, if all partitions fail or all succeed, the response is sent for all of them; otherwise it is sent only for the partitions for which data was fetched.
- Additionally, I am thinking of renaming ShareFetchData to ShareFetch and providing better handling of future completion, as the code is a bit fragmented.
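For illustration only, the two fenced-state checks and the per-partition error handling from the summary above can be sketched roughly as below. This is a simplified stand-in, not the code in this PR: `FencedShareFetchSketch`, `Partition`, `Result`, `fence`, `fetch` and the other names are hypothetical and do not mirror the real `SharePartition` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names are taken from the Kafka code base.
final class FencedShareFetchSketch {

    enum State { ACTIVE, FENCED }

    /** Thrown when a fenced partition is asked for its lock or for records. */
    static final class FencedException extends RuntimeException {
        FencedException(String partition) { super(partition + " is fenced"); }
    }

    /** Stand-in for a share partition with a fetch lock and a fenced state. */
    static final class Partition {
        final String name;
        private volatile State state = State.ACTIVE;
        private final AtomicBoolean fetchLock = new AtomicBoolean(false);

        Partition(String name) { this.name = name; }

        void markFenced() { state = State.FENCED; }

        // Check 1: refuse the fetch lock once the partition is fenced.
        boolean maybeAcquireFetchLock() {
            if (state == State.FENCED) throw new FencedException(name);
            return fetchLock.compareAndSet(false, true);
        }

        void releaseFetchLock() { fetchLock.set(false); }

        // Check 2: a request that already holds the lock still fails here.
        List<String> acquire(List<String> records) {
            if (state == State.FENCED) throw new FencedException(name);
            return new ArrayList<>(records);
        }
    }

    /** Per-partition outcome of one fetch request. */
    static final class Result {
        final Map<String, List<String>> data = new LinkedHashMap<>();
        final Map<String, String> errors = new LinkedHashMap<>();
    }

    // Remove the partition from the cache and mark it fenced, so requests
    // that still hold a reference to it are rejected.
    static void fence(Map<String, Partition> cache, String name) {
        Partition p = cache.remove(name);
        if (p != null) p.markFenced();
    }

    // A fenced partition yields an error entry for itself only;
    // the remaining partitions in the request are still served.
    static Result fetch(List<Partition> requested) {
        Result result = new Result();
        for (Partition p : requested) {
            try {
                if (!p.maybeAcquireFetchLock()) continue; // lock held by another fetch
                try {
                    result.data.put(p.name, p.acquire(List.of(p.name + "-record")));
                } finally {
                    p.releaseFetchLock();
                }
            } catch (FencedException e) {
                result.errors.put(p.name, e.getMessage());
            }
        }
        return result;
    }
}
```

With 5 partitions in a request and 3 of them fenced, `fetch` in this sketch returns data for the other 2 and an error entry for each fenced one, instead of failing the whole request.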
I am planning to have a couple of follow-up PRs, if @junrao @AndrewJSchofield @mumrah think that is fine. With the tests and changes, this PR is getting bigger and bigger.
