[ 
https://issues.apache.org/jira/browse/KAFKA-16385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828594#comment-17828594
 ] 

Chia-Ping Tsai commented on KAFKA-16385:
----------------------------------------

{quote}
One potential way to improve this is to use the timestamp index to find the 
cutoff offset in the active segment and move the logStartOffset to that point. 
We need to understand if there is any additional I/O impact because of this.
{quote}

I am not sure this is a worthwhile improvement. We should not encourage users 
to expect the cleanup to delete segments accurately. In particular, users can 
define their own record timestamps, so expired records could still exist even 
after we move the logStartOffset. For example: a non-expired record with 
offset=2, timestamp=100, followed by an expired record with offset=3, 
timestamp=90.

{quote}
 As you observed, the current implementation is a bit weird since it depends on 
whether there are new records or not. 
{quote}

That probably makes sense: The segment is NOT expired as it has new records :)

In short, the implementation of retention.ms can roll and then delete the 
active segment. We should improve the documentation for this scenario.
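
The behavior can be sketched with a simplified model. This is an illustration 
only, not the UnifiedLog implementation: the class and method names are 
hypothetical, the breach condition (now - largestTimestamp > retentionMs) is an 
assumption based on the log message in the report, and checks such as the high 
watermark are left out. The check times are made up; only the 
largestRecordTimestamp and the 1000 ms retention come from the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a time-based retention check.
// It is NOT Kafka's UnifiedLog; it only illustrates "roll, then delete".
class RetentionCheckSketch {
    static final class Segment {
        final long baseOffset;
        final long nextOffset;       // exclusive end offset
        final long largestTimestamp; // -1 when the segment holds no records

        Segment(long baseOffset, long nextOffset, long largestTimestamp) {
            this.baseOffset = baseOffset;
            this.nextOffset = nextOffset;
            this.largestTimestamp = largestTimestamp;
        }

        boolean isEmpty() {
            return nextOffset == baseOffset;
        }
    }

    final List<Segment> segments = new ArrayList<>();
    final List<String> events = new ArrayList<>();

    // One segment holding one record, using largestRecordTimestamp from the report.
    static RetentionCheckSketch singleRecordLog() {
        RetentionCheckSketch log = new RetentionCheckSketch();
        log.segments.add(new Segment(0L, 1L, 1710832992125L));
        return log;
    }

    void runRetentionCheck(long nowMs, long retentionMs) {
        // Count the leading segments whose newest record is older than retentionMs.
        int deletable = 0;
        for (Segment s : segments) {
            boolean breached = !s.isEmpty() && nowMs - s.largestTimestamp > retentionMs;
            if (!breached) {
                break;
            }
            deletable++;
        }
        if (deletable == 0) {
            return;
        }
        // The log must keep at least one segment, so if even the active segment
        // is expired, roll a new empty one before deleting anything.
        if (deletable == segments.size()) {
            Segment active = segments.get(segments.size() - 1);
            segments.add(new Segment(active.nextOffset, active.nextOffset, -1L));
            events.add("rolled new segment at offset " + active.nextOffset);
        }
        for (int i = 0; i < deletable; i++) {
            Segment removed = segments.remove(0);
            events.add("deleted segment baseOffset=" + removed.baseOffset);
        }
    }

    public static void main(String[] args) {
        RetentionCheckSketch log = singleRecordLog();
        long largest = 1710832992125L;
        log.runRetentionCheck(largest + 500L, 1000L);  // within retention: nothing happens
        log.runRetentionCheck(largest + 1799L, 1000L); // breached: roll, then delete
        System.out.println(log.events);
    }
}
```

In this model segment.ms and segment.bytes never come into play: the roll is 
only a side effect of the active segment becoming eligible for deletion.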

> Segment is rolled before segment.ms or segment.bytes breached
> -------------------------------------------------------------
>
>                 Key: KAFKA-16385
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16385
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.5.1, 3.7.0
>            Reporter: Luke Chen
>            Assignee: Kuan Po Tseng
>            Priority: Major
>
> Steps to reproduce:
> 0. Start up a broker with `log.retention.check.interval.ms=1000` to speed up 
> the test.
> 1. Create a topic with the config: segment.ms=7days, segment.bytes=1GB, 
> retention.ms=1sec.
> 2. Send a record "aaa" to the topic
> 3. Wait for 1 second
> Will this segment be rolled? I thought not.
> But in my test it does roll:
> {code:java}
> [2024-03-19 15:23:13,924] INFO [LocalLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Rolled new log segment at offset 1 in 3 ms. 
> (kafka.log.LocalLog)
> [2024-03-19 15:23:13,925] INFO [ProducerStateManager partition=t2-1] Wrote 
> producer snapshot at offset 1 with 1 producer ids in 1 ms. 
> (org.apache.kafka.storage.internals.log.ProducerStateManager)
> [2024-03-19 15:23:13,925] INFO [UnifiedLog partition=t2-1, 
> dir=/tmp/kafka-logs_jbod] Deleting segment LogSegment(baseOffset=0, size=71, 
> lastModifiedTime=1710832993131, largestRecordTimestamp=1710832992125) due to 
> log retention time 1000ms breach based on the largest record timestamp in the 
> segment (kafka.log.UnifiedLog)
> {code}
> The segment is rolled because the log retention time of 1000ms was breached, 
> which is unexpected.
> Also tested in v3.5.1, which has the same issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
