Re: [PR] KAFKA-14619: KRaft validate snapshot id are at batch boundries [kafka]

via GitHub Mon, 14 Oct 2024 12:07:21 -0700


jsancio commented on code in PR #17500:
URL: https://github.com/apache/kafka/pull/17500#discussion_r1799965560



##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -258,7 +260,8 @@ public KafkaRaftClient(
             localSupportedKRaftVersion,
             logContext,
             new Random(),
-            quorumConfig
+            quorumConfig,
+            new HashSet<>()

Review Comment:
   KRaft should not track the starting offsets in this way. There are many 
issues with this approach. The more important ones are:
   
   1. This doesn't track batch start offsets that were added to the log prior 
to starting the KRaft replica. It also doesn't track batch start offsets that 
were added by the leader.
   2. This doesn't handle log truncation.
   3. It is very memory inefficient. It is possible for KRaft to have a lot of 
small batches and each one of them would have an entry in this set.
   
   The log layer already track these offsets. We shouldn't duplicate this 
information in the KRaft layer. Take a look at the `KafkaMetadataLog#read` 
method to see how the log layer computes the batch start offset. Specifically, 
you can take a look at `LogSegment#translateOffset`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] KAFKA-14619: KRaft validate snapshot id are at batch boundries [kafka]

Reply via email to