jsancio commented on PR #16862: URL: https://github.com/apache/kafka/pull/16862#issuecomment-2287395733
> @jsancio thank for the patch! I had a few questions after a brief review > > If I understand this change, it looks like we are no longer tracking the next expected write offset and instead just appending to Raft to learn the last offset of the batch. I added a description to the PR and I think I address the motivation of this change there. Let me know if you have any other questions. > > 1. Is it safe to schedule the write to Raft before the replay calls? I think it probably is. These two operations can tasks (writing to log and updating in-memory state) can probably be thought of as concurrent. As long as the in-memory state creates a snapshot with the correct offset, it should be fine. WDYT? It is not safe to schedule write before replay but this change doesn't do that. At a high-level the new controller algorithm for write events is: 1. generate the cluster metadata records 2. stage them in the RaftClient so that the offsets are allocated (RaftClient#prepareAppend) 3. replay the cluster metadata records against the active controller state machine 4. tell the RaftClient client that is can write/append the staged batches to the log (RaftClient#schedulePreparedAppend) > 2. Do we need to track nextWriteOffset any more if we aren't using it? We the nextWriteOffset to set the broker epoch when registering brokers. I think this just needs to be monotonic relative to previous broker epochs. Maybe we can remove nextWriteOffset from OffsetControlManager and just track the "last written offset" instead? Isn't that the same thing? Isn't `"last written offset" = "next write offset" - 1`? > 3. In the previous code we tracked the next expected offset and passed it down to Raft each time we did an append. AIUI, this sort of acted like an optimistic concurrency control. E.g., "only do this operation if the next offset is X". We still have the controller epoch which serves a similar function, but I'm curious if there are side effects to removing the external offset tracking. Yes. The issue with that solution is that is doesn't allow for the KRaft leader to append control records to the log. Appending control records to the log would make it so that the controller known offset doesn't match the kraft leader's LEO. The KRaft leader needs to append control records to implement dynamically adding and removing controller/voters. Take a look at the description of the PR. I have updated it since you last reviewed this PR. I tried to motivate the problem and explain the solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
