[ https://issues.apache.org/jira/browse/KAFKA-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229273#comment-17229273 ]
Sagar Rao commented on KAFKA-10652:
-----------------------------------

Hi [~hachikuji], while the PR for 10634 is being reviewed, I started looking at the code changes for this issue as well. As far as my analysis goes, this is the place where the batches are appended and the log is flushed: [https://github.com/apache/kafka/blob/trunk/raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java#L1496-L1514]. There, the code checks whether the time to flush has arrived and, if it has, fsyncs. I was thinking of adding a check for whether the accumulated bytes exceed the configured size, in which case the records would be flushed.

Going by your statement in the description:

*In other words, if we accumulate a configurable N bytes, then we should not wait for linger expiration and should just fsync immediately*

it seems we want this to take priority over the time-based linger, so we can probably add this check before the time-to-flush check here.

A couple of points I wanted to raise:

1) The method maybeAppendBatches returns a timeToFlush, which is used by the NetworkChannel to drain messages. If I add the size-based flush here, what should we return as timeToFlush?

2) BatchAccumulator has a configured field called maxBatchSize, which seems to me to be the max batch size. Currently, it is set to 2 ^ 20. We will need to cap the min size so that it doesn't exceed a certain value (or at least not maxBatchSize)?

Please let me know your thoughts on this.

> Raft leader should flush accumulated writes after a min size is reached
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10652
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10652
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Jason Gustafson
>            Assignee: Sagar Rao
>            Priority: Major
>
> In KAFKA-10601, we implemented linger semantics similar to the producer to
> let the leader accumulate a batch of writes before fsyncing them to disk.
> Currently the fsync is only based on the linger time, but it would be helpful
> to make it size-based as well. In other words, if we accumulate a
> configurable N bytes, then we should not wait for linger expiration and
> should just fsync immediately.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
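
One way to read the two open questions in the comment above is as a single "time until flush" computation. The following is only a minimal sketch under stated assumptions, not the KafkaRaftClient or BatchAccumulator code: the class FlushPolicy, its method names, and the parameter minFlushBytes are hypothetical and introduced purely for illustration. It assumes that returning 0 means "flush now", that the size check takes priority over the linger check, and that the min size is rejected at construction time if it exceeds maxBatchSize.

```java
// Hypothetical sketch: none of these names are taken from KafkaRaftClient
// or BatchAccumulator; they only illustrate the size-vs-linger decision.
final class FlushPolicy {
    private final long lingerMs;      // time-based linger (assumed config)
    private final int minFlushBytes;  // size threshold (assumed config)

    FlushPolicy(long lingerMs, int minFlushBytes, int maxBatchSize) {
        if (lingerMs < 0) {
            throw new IllegalArgumentException("lingerMs must be >= 0");
        }
        // Assumed cap: the min flush size may not exceed the max batch size.
        if (minFlushBytes <= 0 || minFlushBytes > maxBatchSize) {
            throw new IllegalArgumentException(
                "minFlushBytes must be in (0, maxBatchSize]");
        }
        this.lingerMs = lingerMs;
        this.minFlushBytes = minFlushBytes;
    }

    /**
     * Returns the number of milliseconds until the next flush is due.
     * 0 means "flush now". Long.MAX_VALUE means nothing is pending
     * (an assumption made for this sketch).
     */
    long timeUntilFlushMs(long accumulatedBytes, long firstAppendTimeMs, long nowMs) {
        if (accumulatedBytes <= 0) {
            return Long.MAX_VALUE;
        }
        // Size check first, so it takes priority over the linger timer.
        if (accumulatedBytes >= minFlushBytes) {
            return 0L;
        }
        long elapsedMs = Math.max(0L, nowMs - firstAppendTimeMs);
        return Math.max(0L, lingerMs - elapsedMs);
    }

    boolean shouldFlush(long accumulatedBytes, long firstAppendTimeMs, long nowMs) {
        return timeUntilFlushMs(accumulatedBytes, firstAppendTimeMs, nowMs) == 0L;
    }
}
```

Under these assumptions, a caller that uses the returned value as a poll timeout would see 0 as soon as the size threshold is crossed, which is one possible answer to question 1; whether that matches the intended semantics of timeToFlush is for the reviewers to confirm.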