[ 
https://issues.apache.org/jira/browse/KAFKA-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15750154#comment-15750154
 ] 

Jun Rao commented on KAFKA-4545:
--------------------------------

One potential fix: when we clean a segment that lies after the dirty marker, 
the cleaned segment does not inherit the last modified time of the original 
segment. When we clean a segment that lies before the dirty marker, it does 
inherit the last modified time. This way, the last modified time of a cleaned 
segment is the time at which it was first cleaned. I'm not sure whether this 
completely addresses the issue, though.
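
As a rough illustration of the rule proposed above, here is a minimal Python 
sketch. It is not Kafka's cleaner code: the Segment class, the 
first_dirty_offset parameter, and the function name are hypothetical, and the 
sketch assumes a segment counts as "after the dirty marker" when its base 
offset is at or beyond the first dirty offset.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a log segment; not a Kafka class.
    base_offset: int
    last_modified_ms: int


def cleaned_last_modified_ms(segment: Segment,
                             first_dirty_offset: int,
                             now_ms: int) -> int:
    """Pick the last modified time for the cleaned copy of `segment`."""
    if segment.base_offset >= first_dirty_offset:
        # Segment is after the dirty marker, so this is its first cleaning:
        # do not inherit the original time, stamp it with the current time.
        return now_ms
    # Segment is before the dirty marker, so it was cleaned earlier:
    # inherit the time, which already records when it was first cleaned.
    return segment.last_modified_ms
```

Under these assumptions, re-cleaning an already cleaned segment never resets 
its timestamp, so the delete.retention.ms delay is counted from the first 
cleaning only.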

> tombstone needs to be removed after delete.retention.ms has passed after it 
> has been cleaned
> --------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4545
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4545
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.0
>            Reporter: Jun Rao
>
> The algorithm for removing the tombstone in a compacted log is supposed to 
> be the following.
> 1. Tombstone is never removed when it's still in the dirty portion of the log.
> 2. After the tombstone is in the cleaned portion of the log, we further delay 
> the removal of the tombstone by delete.retention.ms, measured from the time 
> the tombstone enters the cleaned portion.
> Once the tombstone is in the cleaned portion, we know there can't be any 
> message with the same key before the tombstone. Therefore, for any consumer, 
> if it reads a non-tombstone message before the tombstone, but can read to the 
> end of the log within delete.retention.ms, it's guaranteed to see the 
> tombstone.
> However, the current implementation doesn't seem correct. We delay the 
> removal of the tombstone by delete.retention.ms since the last modified time 
> of the last cleaned segment. However, the last modified time is inherited 
> from the original segment, which could be arbitrarily old. So, the tombstone 
> may not be preserved as long as it needs to be.
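
The problem described in the issue can be illustrated with a simplified model 
in Python. This is a sketch under stated assumptions, not the actual cleaner 
logic: tombstone_removable is a hypothetical helper, and the timestamps (a 
segment written at time 0, first cleaned 30 days later, with a retention of 
one day) are made up for illustration.

```python
DAY_MS = 24 * 60 * 60 * 1000


def tombstone_removable(segment_last_modified_ms: int,
                        delete_retention_ms: int,
                        now_ms: int) -> bool:
    """Simplified check: a tombstone may be dropped once delete.retention.ms
    has elapsed since the segment's last modified time."""
    return now_ms - segment_last_modified_ms >= delete_retention_ms


delete_retention_ms = DAY_MS          # assumed retention of one day
original_mtime_ms = 0                 # original segment written at t = 0
first_cleaned_ms = 30 * DAY_MS        # tombstone enters the cleaned portion
next_pass_ms = first_cleaned_ms + 60_000  # next cleaning pass, a minute later

# Inherited time: the tombstone looks 30 days old and is dropped right away.
with_inherited_time = tombstone_removable(
    original_mtime_ms, delete_retention_ms, next_pass_ms)

# Time of first cleaning: the tombstone is kept for the full retention period.
with_cleaned_time = tombstone_removable(
    first_cleaned_ms, delete_retention_ms, next_pass_ms)
```

In this model, with_inherited_time is True and with_cleaned_time is False, 
which is the gap the issue describes: a consumer that started reading before 
the tombstone was cleaned may never see it.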



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
