[GitHub] [kafka] divijvaidya commented on pull request #13111: KAFKA-14190: Update Zk TopicId from locally stored cache in controller

via GitHub Tue, 24 Jan 2023 04:02:16 -0800


divijvaidya commented on PR #13111:
URL: https://github.com/apache/kafka/pull/13111#issuecomment-1401824565

@dajac
> Will this code still be around by the time tiered storage is completed?
I don't know, but my point is that this change is simple and safe enough to
add to the current code today.

@jolshan
> My other concern here is that even though this fixes the issue in the case
where the controller stays the same, it doesn't cover controller re-election.
This means we would still have to share and support the recovery methods. If
this is a big issue for tiered storage, then we could still be in trouble.

To be precise, this fix won't work if the controller context does not have
the old topic Id. That only happens when a controller failover takes place in
the window between the admin overwriting Zk and the controller writing the old
topic Id back. Note that a controller failover at any other time will work fine
(since the new controller will recreate its controller context from Zk, which
would already have been updated with oldTopicId).
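
To make the window concrete, here is a minimal sketch of the reconciliation
described above. It is a simplified model, not the code in this PR: the class
`TopicIdReconciler`, its methods, and the use of plain strings for topic Ids
are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the controller's cached view; not Kafka's classes. */
final class TopicIdReconciler {
    // Topic name -> topic Id held in the controller context (illustrative).
    private final Map<String, String> controllerContext = new HashMap<>();

    /** Records the topic Id the controller already knows for a topic. */
    void cache(String topic, String topicId) {
        controllerContext.put(topic, topicId);
    }

    /**
     * Returns the topic Id that should end up in Zk once the controller
     * notices that Zk currently holds idInZk for the topic.
     */
    String reconcile(String topic, String idInZk) {
        String cached = controllerContext.get(topic);
        if (cached == null) {
            // A controller elected inside the window built its context from
            // the overwritten Zk state, so it has no old Id to restore.
            controllerContext.put(topic, idInZk);
            return idInZk;
        }
        // The cached Id wins: the old Id is written back over the new one.
        return cached;
    }
}
```

A controller that cached "old-id" before the overwrite restores it, while one
with an empty context keeps whatever Zk holds, which is the uncovered case.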

And yes, I agree this is not a 100% fix, but it's a start. Since it's a safe
fix with no side effects, we should push it out.

> Also curious if we can upload a segment with the wrong ID if the leader
and ISR request is blocked (and thus can't become a leader or follower)

Great question! The topic Id mismatch check [during handling of the LISR
request](https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1495)
compares the broker's local topic Id with the one sent in the LISR request.
However, a broker may well have no local topic Id at all. For example, suppose
a partition reassignment places the partition on a broker where the log hasn't
been created yet. In that case, LISR handling won't throw a topic Id mismatch
error and won't be blocked; instead, the broker will start operating with the
new topic Id. We will then have some followers working with the old topic Id
(where LISR was blocked) and some with the new topic Id. If a failover happens
to one with the new topic Id, it will start uploading segments to tiered
storage with the new topic Id, so the same topic partition will have segments
with the old topic Id as well as the new one.
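
The behaviour described above can be sketched as follows. This is a
simplified model of the check, not the code in `ReplicaManager.scala`: the
class `LisrTopicIdCheck`, the `Result` enum, and the use of
`Optional<String>` for the local topic Id are assumptions made for
illustration.

```java
import java.util.Optional;

/** Hypothetical model of the LISR topic Id check; not Kafka's actual code. */
final class LisrTopicIdCheck {
    enum Result { ACCEPTED, REJECTED_INCONSISTENT_TOPIC_ID }

    /**
     * localTopicId is empty when the broker has no log (and so no topic Id)
     * for the partition yet, e.g. right after a reassignment.
     */
    static Result check(Optional<String> localTopicId, String requestTopicId) {
        if (localTopicId.isPresent() && !localTopicId.get().equals(requestTopicId)) {
            // Only a broker that already knows a different Id blocks the request.
            return Result.REJECTED_INCONSISTENT_TOPIC_ID;
        }
        // No local Id, or a matching one: the broker proceeds with the request's Id.
        return Result.ACCEPTED;
    }
}
```

In this model a broker holding the old Id rejects a request carrying the new
one, while a broker with no local Id accepts it. That is how replicas of the
same partition can end up split across two topic Ids.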

