showuon commented on PR #14136:
URL: https://github.com/apache/kafka/pull/14136#issuecomment-1693132900

   @clolov , thanks for the detailed analysis. I admit I didn't check it so 
detailed last time. I just saw in `RLM#onLeadershipChange`, we did use 
`metadataCache`, so we should make sure `metadataCache` comes first before 
LISR. Had another look, we can still pass the `topicIDs` into 
`RLM#onLeadershipChange` to bypass it. So it looks fine.
   
   For `fetchRemoteLogSegmentMetadata` and `findOffsetByTimestamp`, I have no 
concern since like you said, they are used in consumer, which means the 
metadataCache must be have updated. Although they can also be used by 
adminClient, some delay for the adminClient usage should be acceptable.
   
   For `stopPartition`, I found not only Controller will trigger it, the client 
can send `deleteTopic` to trigger it. In this case, we might delete less 
partitions than expected like this case:
   1. LISR arrived in RLM, created some RLM tasks for the leader partitions
   2. delete topic comes in, miss the new added partitions in step (1)
   3. updateMetadata arrived, updated the cache
   
   I'm not quite sure if it's possible. TBH, I'm a little worried about 
introducing potential bugs due to some edge case we didn't know, due to we 
already confirmed there are some time gap between LISR and UMR. That's my 
thought. I'd also like to hear other reviewers' comments. 
   
   Thanks again for the detailed analysis.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to