divijvaidya commented on PR #13111:
URL: https://github.com/apache/kafka/pull/13111#issuecomment-1401824565

   @dajac 
   > Will this code still be around by the time tiered storage is completed?
   
   I don't know, but my point is that this change is simple and safe enough to 
add to the current code today.
   
   @jolshan 
   > My other concern here is that even though this fixes the issue in the case 
where the controller stays the same, it doesn't cover controller re-election. 
This means we would still have to share and support the recovery methods. If 
this is a big issue for tiered storage, then we could still be in trouble.
   
   To be precise, this fix won't work if the controller context does not have 
the old topic Id. That happens only when a controller failover takes place in 
the window between the admin overwriting Zk and the controller processing that 
change. A controller failover at any other time works fine, since the new 
controller recreates its controller context from Zk, which would already have 
been updated with oldTopicId.
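   
   A minimal sketch of the fallback described above, assuming the controller 
prefers whatever Id is in Zk and otherwise falls back to its cached context. 
The class and method names are hypothetical and are not taken from the Kafka 
code base; the sketch only illustrates which case is covered and which is not.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; these names are illustrative and are not Kafka APIs.
final class TopicIdResolutionSketch {

    // Decide which topic Id the controller keeps using for a topic.
    // idInZk is empty after the admin overwrote the znode without an Id.
    // idInContext is empty if a new controller rebuilt its context from
    // that overwritten znode.
    static UUID resolveTopicId(Optional<UUID> idInZk, Optional<UUID> idInContext) {
        if (idInZk.isPresent()) {
            return idInZk.get();      // Zk still carries an Id: use it
        }
        if (idInContext.isPresent()) {
            return idInContext.get(); // the fix: reuse the old Id from the context
        }
        return UUID.randomUUID();     // uncovered window: a new Id is generated
    }

    public static void main(String[] args) {
        UUID oldId = new UUID(0L, 1L);
        // Same controller, Id missing in Zk: the old Id is kept.
        System.out.println(
            resolveTopicId(Optional.empty(), Optional.of(oldId)).equals(oldId)); // true
        // Failover inside the window: neither source has the old Id.
        System.out.println(
            resolveTopicId(Optional.empty(), Optional.empty()).equals(oldId));   // false
    }
}
```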
   
   And yes, I agree this is not a 100% fix, but it's a start. Since it's a safe 
fix with no side effects, we should push it out.
   
   > Also curious if we can upload a segment with the wrong ID if the leader 
and ISR request is blocked (and thus can't become a leader or follower)
   
   Great question! The topic Id mismatch check [during handling of the LISR 
request](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1495)
 compares the broker's local topic Id with the one sent in the LISR request. 
However, a broker may have no topic Id locally. For example, a partition 
reassignment can place a partition on a broker where the log hasn't been 
created yet. In that case, the LISR request won't hit a topic Id mismatch error 
and won't be blocked; the broker will start operating with the new topic Id. 
We then have some followers working with the old topic Id (where LISR was 
blocked) and some with the new one. If leadership fails over to a replica with 
the new topic Id, it will start uploading segments to tiered storage under the 
new topic Id, so the same topic partition ends up with segments under both the 
old and the new topic Id.
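
   The behaviour of the check described above can be sketched as follows. This 
is a simplified illustration with hypothetical names, not the code in 
ReplicaManager: it assumes the check passes whenever the broker has no local 
topic Id to compare against.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch; these names are illustrative and are not Kafka APIs.
final class LisrTopicIdCheckSketch {

    // Returns true when the LISR request may proceed on this broker.
    static boolean isConsistent(Optional<UUID> localId, UUID requestId) {
        // With no local log there is no local topic Id, so nothing can mismatch.
        return !localId.isPresent() || localId.get().equals(requestId);
    }

    public static void main(String[] args) {
        UUID oldId = new UUID(0L, 1L);
        UUID newId = new UUID(0L, 2L);
        // Existing replica still holding the old Id: the request is blocked.
        System.out.println(isConsistent(Optional.of(oldId), newId)); // false
        // Freshly assigned replica without a log: the request is accepted.
        System.out.println(isConsistent(Optional.empty(), newId));   // true
    }
}
```

   The second call is the problematic path: the fresh replica adopts the new 
Id while the existing replicas keep the old one.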


