Can you take a look at KIP-280:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction
?

On Mon, Aug 6, 2018 at 10:55 AM, Jan Lukavský <je...@seznam.cz> wrote:

> Hi,
>
> I have a question about log compaction. LogCleaner's JavaDoc states that:
>
> {quote}
>
> A message with key K and offset O is obsolete if there exists a message
> with key K and offset O' such that O < O'.
>
> {/quote}
>
> That works fine if messages arrive "in order", i.e. with timestamps
> assigned at log-append time (with some possible problems with clock
> synchronization during leader rebalance). But if a topic might contain
> late messages (because the producer explicitly assigns a timestamp to
> each message), then compacting purely by offset might cause a message with
> an older timestamp to be kept in the log in favor of a newer message. Is this
> intentional? Would it be possible to relax this so that log compaction
> prefers a message's timestamp over its offset? What if the behavior of
> the LogCleaner were changed to something like this:
>
> {quote}
>
> A message with key K, timestamp T1 and offset O1 is obsolete if there
> exists a message with key K, timestamp T2 and offset O2 such that T1 < T2,
> or T1 = T2 and O1 < O2.
>
> {/quote}
>
> I'm aware that this would be much more complicated (because of the clock
> synchronization problem that would have to be resolved), but this
> definition seems to be more aligned with the time characteristics of the
> data. Should I try to create a KIP, or was this already discussed and
> considered an unwanted (or even impossible) feature?
>
> Thanks for any comments,
>
>  Jan
>
>
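
To make the difference between the two rules quoted above concrete, here is a
minimal sketch of both as plain Java predicates. It is illustrative only and is
not taken from Kafka's LogCleaner: the class CompactionRule, the Rec type, and
both method names are hypothetical.

```java
// Hypothetical sketch of the two obsolescence rules; not Kafka's LogCleaner code.
final class CompactionRule {

    /** Minimal stand-in for a log record: key, timestamp, and offset. */
    static final class Rec {
        final String key;
        final long timestamp;
        final long offset;

        Rec(String key, long timestamp, long offset) {
            this.key = key;
            this.timestamp = timestamp;
            this.offset = offset;
        }
    }

    /** Offset rule from the JavaDoc: a is obsolete if b has the same key and a higher offset. */
    static boolean obsoleteByOffset(Rec a, Rec b) {
        return a.key.equals(b.key) && a.offset < b.offset;
    }

    /**
     * Proposed rule: a is obsolete if b has the same key and a newer timestamp,
     * with the offset used only to break timestamp ties.
     */
    static boolean obsoleteByTimestamp(Rec a, Rec b) {
        return a.key.equals(b.key)
            && (a.timestamp < b.timestamp
                || (a.timestamp == b.timestamp && a.offset < b.offset));
    }
}
```

For example, take an on-time record (key "k", timestamp 200, offset 10) and a
late record (key "k", timestamp 100, offset 11). Under the offset rule the
on-time record is obsolete and the late one is kept; under the proposed rule
the late record is obsolete instead.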
