showuon commented on PR #14136: URL: https://github.com/apache/kafka/pull/14136#issuecomment-1693132900
@clolov , thanks for the detailed analysis. I admit I didn't check it so detailed last time. I just saw in `RLM#onLeadershipChange`, we did use `metadataCache`, so we should make sure `metadataCache` comes first before LISR. Had another look, we can still pass the `topicIDs` into `RLM#onLeadershipChange` to bypass it. So it looks fine. For `fetchRemoteLogSegmentMetadata` and `findOffsetByTimestamp`, I have no concern since like you said, they are used in consumer, which means the metadataCache must be have updated. Although they can also be used by adminClient, some delay for the adminClient usage should be acceptable. For `stopPartition`, I found not only Controller will trigger it, the client can send `deleteTopic` to trigger it. In this case, we might delete less partitions than expected like this case: 1. LISR arrived in RLM, created some RLM tasks for the leader partitions 2. delete topic comes in, miss the new added partitions in step (1) 3. updateMetadata arrived, updated the cache I'm not quite sure if it's possible. TBH, I'm a little worried about introducing potential bugs due to some edge case we didn't know, due to we already confirmed there are some time gap between LISR and UMR. That's my thought. I'd also like to hear other reviewers' comments. Thanks again for the detailed analysis. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org