[
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15596037#comment-15596037
]
Jun Rao commented on KAFKA-4099:
--------------------------------
I had two use cases of time-based rolling in mind. The first one is for users
who don't want to retain a message (say sensitive data) in the log for too
long. In this case, we want to be able to roll the log periodically based on
time such that it will freeze the largest timestamp in the rolled segment and
cause it to be deleted when the time limit has been reached. The second one is
for log cleaning to happen sooner, since the cleaner never cleans the active
segment. In both cases, we really just want to be able to roll the log at some
predictable time interval. Different implementations can achieve this.
The issue with the current implementation is that if data with oscillating
timestamps are published at the same time, it causes the log to roll too quickly,
which will surprise people. We can ask people to turn off log rolling in most
cases. However, the default log rolling interval is 7 days and people could hit
this issue before realizing it. In some rare cases, people may indeed want to
configure time-based log rolling and may still send data with oscillating
timestamps. It would be good if the underlying system can support this without
any performance impact.
As for a better implementation, the original approach of just rolling based on
create time addresses both use cases in the common case, without the risk of
rolling too frequently. The only caveat is that the create time is reset when
segments get moved. However, that happens rarely. So, if there are no
better solutions that we could think of, this could be a safer
implementation.
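
To make the create-time approach concrete, here is a minimal sketch in Python. It is not Kafka's code: Segment, append_all and ROLL_MS are hypothetical names, and the rule "roll when the active segment is older than the roll interval" is my reading of the approach described above. The point it illustrates is that message timestamps are never consulted, so oscillating timestamps cannot cause extra rolls.

```python
from dataclasses import dataclass
from typing import List

HOUR_MS = 60 * 60 * 1000
ROLL_MS = 7 * 24 * HOUR_MS  # 7 days, matching the default mentioned above


@dataclass
class Segment:
    created_ms: int  # wall-clock time at which the segment was created
    size: int = 0    # number of messages appended so far


def append_all(arrivals_ms: List[int], roll_ms: int = ROLL_MS) -> List[Segment]:
    """Append one message per wall-clock arrival time, rolling on segment age only."""
    segments: List[Segment] = []
    for now_ms in arrivals_ms:
        # Roll only when the active segment is non-empty and older than roll_ms.
        # Message timestamps play no part in this decision.
        if not segments or (
            segments[-1].size > 0 and now_ms - segments[-1].created_ms > roll_ms
        ):
            segments.append(Segment(created_ms=now_ms))
        segments[-1].size += 1
    return segments


# One message per hour for 15 days of wall-clock time.
arrivals = [i * HOUR_MS for i in range(24 * 15)]
segments = append_all(arrivals)
```

With one message per hour over 15 days, the number of segments depends only on elapsed wall-clock time, whatever timestamps the messages carry. The sketch does not model the caveat above, where the create time is reset when a segment is moved.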
> Change the time based log rolling to only based on the message timestamp.
> -------------------------------------------------------------------------
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
> Issue Type: Bug
> Components: core
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs,
> the newly created replica may have messages with old timestamps and cause the
> log segment to roll for each message. The fix is to change the log rolling
> behavior to be based only on the message timestamp when the messages are in
> message format 0.10.0 or above. If the first message in the segment does not
> have a timestamp, we will fall back to using the wall clock time for log rolling.
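
The rule in the quoted description can be sketched roughly as follows. This is an illustration in Python, not the actual patch: should_roll and its parameters are hypothetical names, measuring the wall-clock fallback from segment creation is an assumption, and wall_clock_vs_message_rule is a contrast rule I assume for illustration of how old timestamps could trigger a roll for each message.

```python
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000
ROLL_MS = 7 * DAY_MS


def should_roll(first_ts_ms: Optional[int], incoming_ts_ms: int,
                segment_created_ms: int, now_ms: int,
                roll_ms: int = ROLL_MS) -> bool:
    """Roll decision based on message timestamps, with a wall-clock fallback."""
    if first_ts_ms is None:
        # The first message in the segment has no timestamp: fall back to
        # wall-clock time. Measuring from segment creation is an assumption.
        return now_ms - segment_created_ms > roll_ms
    # Otherwise compare only message timestamps with each other.
    return incoming_ts_ms - first_ts_ms > roll_ms


def wall_clock_vs_message_rule(incoming_ts_ms: int, now_ms: int,
                               roll_ms: int = ROLL_MS) -> bool:
    """Assumed contrast rule: compare the wall clock with the message timestamp."""
    return now_ms - incoming_ts_ms > roll_ms


# A relocated replica receives messages that are 30 days old, one minute apart.
now = 100 * DAY_MS
old = [now - 30 * DAY_MS + i * 60_000 for i in range(5)]

rolls_with_fix = sum(should_roll(old[0], ts, now, now) for ts in old)
rolls_with_contrast = sum(wall_clock_vs_message_rule(ts, now) for ts in old)
```

In this scenario the timestamp-only rule never rolls, because the old messages are close to each other in message time, while the contrast rule fires on every message.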
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)