[
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15603377#comment-15603377
]
Jun Rao commented on KAFKA-4099:
--------------------------------
[~becket_qin], thanks for the explanation. What you described makes sense. So the
issue is probably not that bad since the log won't be rolled as frequently as I
thought. In the worst case, if we hit this issue, we may temporarily create
twice as many segments as we ideally want. However, since this is
relatively rare, we can probably just leave the current implementation as it is.
A related issue concerns log retention. Suppose that an app reprocesses data from
more than 7 days ago. That data will be written to a log segment, only to be
deleted when the log retention thread kicks in, at which point a new segment
will be rolled. So, in this case, the log will be rolled as frequently as
log.retention.check.interval.ms, which defaults to 5 minutes. I am
wondering if we should improve this by configuring
log.message.timestamp.difference.max.ms to match log.retention.ms. This will
prevent older data from being written to the log unnecessarily. It will help
time-based log rolling as well.
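To illustrate the suggestion, here is a minimal sketch of the kind of bound check
that log.message.timestamp.difference.max.ms implies when it is set equal to
log.retention.ms. The class and method names (TimestampBoundSketch, withinBound)
are hypothetical and are not the broker's actual code; the 7-day value is only
the retention period assumed in the example above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Kafka's implementation.
class TimestampBoundSketch {

    // Returns true if the message timestamp is within maxDifferenceMs of the
    // broker's clock, i.e. the message would be accepted under this bound.
    static boolean withinBound(long messageTimestampMs, long brokerNowMs, long maxDifferenceMs) {
        return Math.abs(brokerNowMs - messageTimestampMs) <= maxDifferenceMs;
    }

    public static void main(String[] args) {
        // Assumption: the bound is configured to match a 7-day retention.
        long maxDifferenceMs = TimeUnit.DAYS.toMillis(7);
        long now = System.currentTimeMillis();

        long reprocessed = now - TimeUnit.DAYS.toMillis(8); // older than retention
        long recent = now - TimeUnit.HOURS.toMillis(1);

        // The 8-day-old message falls outside the bound, so it would never be
        // written to a segment that retention deletes on its next check.
        System.out.println("reprocessed accepted: " + withinBound(reprocessed, now, maxDifferenceMs));
        System.out.println("recent accepted: " + withinBound(recent, now, maxDifferenceMs));
    }
}
```

With such a bound in place, data older than the retention period is turned away at
append time instead of triggering a delete-and-roll cycle on every retention check.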
> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs,
> the newly created replica may have messages with old timestamps and cause the
> log segment to roll for each message. The fix is to change the log rolling
> behavior to be based only on the message timestamp when the messages are in
> message format 0.10.0 or above. If the first message in the segment does not
> have a timestamp, we will fall back to using the wall clock time for log rolling.
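
The rolling rule described in the issue can be sketched roughly as follows. This
is a simplified model under stated assumptions, not the actual patch: the names
(TimeBasedRollSketch, shouldRoll, rollMs) are hypothetical, and a missing
timestamp on the first message is modeled as an empty OptionalLong.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the time-based roll decision, not Kafka's code.
class TimeBasedRollSketch {

    static boolean shouldRoll(OptionalLong firstMessageTimestampMs,
                              long segmentCreatedMs,
                              long incomingTimestampMs,
                              long nowMs,
                              long rollMs) {
        long elapsedMs = firstMessageTimestampMs.isPresent()
            // Message-timestamp based: compare the incoming message against
            // the first message in the segment, ignoring the wall clock.
            ? incomingTimestampMs - firstMessageTimestampMs.getAsLong()
            // Fallback: the first message has no timestamp, so use wall clock
            // time elapsed since the segment was created.
            : nowMs - segmentCreatedMs;
        return elapsedMs > rollMs;
    }

    public static void main(String[] args) {
        long hour = 60L * 60 * 1000;
        long now = 1_000L * hour;

        // A relocated replica receives old messages that are 1 ms apart.
        // Their age relative to the wall clock does not matter here.
        long first = now - 500 * hour;
        System.out.println(shouldRoll(OptionalLong.of(first), now, first + 1, now, 168 * hour));

        // No timestamp on the first message: wall clock decides.
        System.out.println(shouldRoll(OptionalLong.empty(), now - 200 * hour, 0L, now, 168 * hour));
    }
}
```

In this model, old but closely spaced messages no longer force a roll per message,
because only the spread of timestamps within the segment is compared to the roll
interval.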