Hi Jorge, You should check the JIRA: https://issues.apache.org/jira/browse/KAFKA-16385 where we had some discussion. Welcome to provide your thoughts there.
Thanks. Luke On Thu, Mar 21, 2024 at 3:33 PM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Hi dev community, > > I'd like to share some findings on how rotation of active segment differ > depending on whether topic retention is time- or size-based. > > I was (wrongly) under the assumption that active segments are only rotated > when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global > log configs (log.roll.ms) force it -- regardless of the retention > configuration. > This seems to be different depending on how retention is defined: > > - If a topic has a retention based on time[1], the condition to rotate the > active segment is based on the latest timestamp. If the difference with > current time is largest than retention time, then segment (including > active) should be deleted. Active segment is rotated, and in next round is > deleted. > > - If a topic has retention based on size[2] though, the condition not only > depends on the size of the segment itself but first on the total log size, > forcing to always have at least a single (active) segment: first difference > between total log size and retention is calculated, let's say a single > segment of 5MB and retention is 1MB; then total difference is 4MB, then the > condition to delete validates if the difference of the current segment and > the total difference is higher than zero, then delete. As the segment size > will always be higher than the total difference when there is a single > segment, then there will always be at least 1 segment. In this case the > only case where active segment is rotated it is when a new message arrives. > > Added steps to reproduce[3]. > > Maybe I missing something obvious, but this seems inconsistent to me. > Either both retention configs should rotate active segments, or none of > them should and active segment should be only governed by segment bytes|ms > configs or log.roll config. > > I believe it's a useful feature to "force" active segment rotation without > changing segment of global log rotation given that features like Compaction > and Tiered Storage can benefit from this; but would like to clarify this > behavior and make it consistent for both retention options, and/or call it > out explicitly in the documentation. > > Looking forward to your feedback! > > Jorge. > > [1]: > > https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566 > [2]: > > https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583 > > [3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2 >