Hi dev community, I'd like to share some findings on how rotation of active segment differ depending on whether topic retention is time- or size-based.
I was (wrongly) under the assumption that active segments are only rotated when segment configs (segment.bytes (1GiB) or segment.ms (7d)) or global log configs (log.roll.ms) force it -- regardless of the retention configuration. This seems to be different depending on how retention is defined: - If a topic has a retention based on time[1], the condition to rotate the active segment is based on the latest timestamp. If the difference with current time is largest than retention time, then segment (including active) should be deleted. Active segment is rotated, and in next round is deleted. - If a topic has retention based on size[2] though, the condition not only depends on the size of the segment itself but first on the total log size, forcing to always have at least a single (active) segment: first difference between total log size and retention is calculated, let's say a single segment of 5MB and retention is 1MB; then total difference is 4MB, then the condition to delete validates if the difference of the current segment and the total difference is higher than zero, then delete. As the segment size will always be higher than the total difference when there is a single segment, then there will always be at least 1 segment. In this case the only case where active segment is rotated it is when a new message arrives. Added steps to reproduce[3]. Maybe I missing something obvious, but this seems inconsistent to me. Either both retention configs should rotate active segments, or none of them should and active segment should be only governed by segment bytes|ms configs or log.roll config. I believe it's a useful feature to "force" active segment rotation without changing segment of global log rotation given that features like Compaction and Tiered Storage can benefit from this; but would like to clarify this behavior and make it consistent for both retention options, and/or call it out explicitly in the documentation. Looking forward to your feedback! Jorge. [1]: https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1566 [2]: https://github.com/apache/kafka/blob/55a6d30ccbe971f4d2e99aeb3b1a773ffe5792a2/core/src/main/scala/kafka/log/UnifiedLog.scala#L1575-L1583 [3]: https://gist.github.com/jeqo/d32cf07493ee61f3da58ac5e77b192b2