Mehari Beyene created KAFKA-14991:
-------------------------------------
Summary: Improving Producer's record timestamp validation
Key: KAFKA-14991
URL: https://issues.apache.org/jira/browse/KAFKA-14991
Project: Kafka
Issue Type: Improvement
Components: core, log
Reporter: Mehari Beyene
When time-based retention is configured, Kafka uses the timestamp provided by
the producer to determine when log data expires. Customers can switch to the
broker's timestamp by setting "log.message.timestamp.type" to LogAppendTime,
but the default is CreateTime, i.e. the producer's timestamp. The producer's
record timestamp can be in the past or in the future. Kafka decides whether the
log has exceeded its retention time by comparing the broker's current time with
the record's timestamp.
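As a rough illustration of that comparison, here is a minimal sketch. The class
and method names are hypothetical and this is not the broker's actual code; it
only assumes that data becomes eligible for deletion once the broker's clock is
more than the retention period past the largest record timestamp.

```java
import java.time.Duration;

final class RetentionSketch {
    // Hypothetical helper, not Kafka's API: data is eligible for deletion once
    // its largest record timestamp is older than the retention period, as
    // measured against the broker's clock.
    static boolean eligibleForDeletion(long brokerNowMs, long largestTimestampMs, long retentionMs) {
        return brokerNowMs - largestTimestampMs > retentionMs;
    }

    public static void main(String[] args) {
        long nowMs = 1_700_000_000_000L; // arbitrary example broker time
        long retentionMs = Duration.ofDays(7).toMillis();

        long eightDaysAgo = nowMs - Duration.ofDays(8).toMillis();
        long oneYearAhead = nowMs + Duration.ofDays(365).toMillis();

        // A record older than the retention period is eligible.
        System.out.println(eligibleForDeletion(nowMs, eightDaysAgo, retentionMs));
        // A record stamped in the future is not eligible, and will not be
        // until the broker's clock passes its timestamp plus the retention.
        System.out.println(eligibleForDeletion(nowMs, oneYearAhead, retentionMs));
    }
}
```

Under this assumption, a future timestamp makes the left-hand side negative, so
the data is retained until the broker's clock catches up.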
Arguably, there are legitimate use cases for a producer to send records with
timestamps in the past (for example, replaying old data), but a record
timestamp that is far in the future compared to the broker's current time is
almost always a mistake.
There is a configurable property called "message.timestamp.difference.max.ms"
that customers can use to bound the allowed difference between the broker's
current time and the record timestamp. However, Kafka itself could validate
more strictly and reject records with future timestamps before they are
written in the first place.
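For reference, the existing check can be pictured roughly as below. This is a
simplified sketch with hypothetical names, not the actual validator code. It
assumes the check is symmetric (an absolute difference) and that the property
defaults to Long.MAX_VALUE, which is my understanding of the releases current
at the time of this issue.

```java
final class TimestampDiffSketch {
    // Assumed default of message.timestamp.difference.max.ms.
    static final long ASSUMED_DEFAULT_MAX_DIFF_MS = Long.MAX_VALUE;

    // Hypothetical helper: accept the record when its timestamp is within
    // maxDiffMs of the broker's clock, in either direction.
    static boolean withinMaxDifference(long brokerNowMs, long recordTimestampMs, long maxDiffMs) {
        return Math.abs(recordTimestampMs - brokerNowMs) <= maxDiffMs;
    }

    public static void main(String[] args) {
        long nowMs = 1_700_000_000_000L;                // arbitrary example broker time
        long farFuture = nowMs + 1_000L * 86_400_000L;  // 1000 days ahead

        // With the assumed default the check never rejects anything.
        System.out.println(withinMaxDifference(nowMs, farFuture, ASSUMED_DEFAULT_MAX_DIFF_MS));
        // With a one-hour bound the far-future record is rejected, but so is
        // a record replayed from two hours ago.
        System.out.println(withinMaxDifference(nowMs, farFuture, 3_600_000L));
        System.out.println(withinMaxDifference(nowMs, nowMs - 7_200_000L, 3_600_000L));
    }
}
```

The last case shows the trade-off: under the symmetric assumption, tightening
the bound to catch future timestamps also rejects replayed historical data.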
Customers have run into this issue when a producer was erroneously configured
to set the record timestamp in nanoseconds instead of milliseconds. The
resulting record timestamps were far in the future, and the time-based
retention policy did not kick in as expected.
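To give a sense of scale for the nanoseconds mistake, the sketch below
interprets an epoch value expressed in nanoseconds as if it were milliseconds.
The names and the example time are hypothetical; the point is only the order
of magnitude.

```java
import java.util.concurrent.TimeUnit;

final class NanosAsMillisSketch {
    // Hypothetical helper: approximate number of years (365-day years)
    // between the broker's clock and the record timestamp.
    static long yearsAhead(long brokerNowMs, long recordTimestampMs) {
        return TimeUnit.MILLISECONDS.toDays(recordTimestampMs - brokerNowMs) / 365;
    }

    public static void main(String[] args) {
        long nowMs = 1_700_000_000_000L; // arbitrary example broker time

        // The same instant expressed in nanoseconds since the epoch, then
        // (wrongly) used as a millisecond timestamp. It still fits in a long.
        long mistakenTimestamp = TimeUnit.MILLISECONDS.toNanos(nowMs);

        System.out.println("years ahead of broker: " + yearsAhead(nowMs, mistakenTimestamp));
    }
}
```

The result is on the order of tens of millions of years, so for practical
purposes such a record never ages past any retention period.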
The improvement I am proposing is to add basic validation in
org.apache.kafka.storage.internals.log.LogValidator that rejects records whose
timestamps are in the future compared to the broker's current time, after
accounting for a sensible tolerance for potential clock skew.
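A minimal sketch of what such a check might look like follows. It is not a
patch against LogValidator: the class, the method names and the one-hour
tolerance are all assumptions made for illustration, and a real implementation
would presumably raise Kafka's InvalidTimestampException rather than
IllegalArgumentException.

```java
import java.time.Duration;

final class FutureTimestampValidationSketch {
    // Assumed tolerance for clock skew; the issue does not specify a value.
    static final long ASSUMED_SKEW_TOLERANCE_MS = Duration.ofHours(1).toMillis();

    // Hypothetical helper: true when the record timestamp is ahead of the
    // broker's clock by more than the tolerance. Past timestamps always pass.
    static boolean isTooFarInFuture(long brokerNowMs, long recordTimestampMs, long toleranceMs) {
        return recordTimestampMs - brokerNowMs > toleranceMs;
    }

    // Hypothetical validation entry point: reject before anything is appended.
    static void validate(long brokerNowMs, long recordTimestampMs) {
        if (isTooFarInFuture(brokerNowMs, recordTimestampMs, ASSUMED_SKEW_TOLERANCE_MS)) {
            throw new IllegalArgumentException(
                "Record timestamp " + recordTimestampMs + " is too far ahead of broker time " + brokerNowMs);
        }
    }

    public static void main(String[] args) {
        long nowMs = 1_700_000_000_000L; // arbitrary example broker time

        validate(nowMs, nowMs - Duration.ofDays(30).toMillis());   // replayed data: accepted
        validate(nowMs, nowMs + Duration.ofMinutes(5).toMillis()); // small skew: accepted
        try {
            validate(nowMs, nowMs + Duration.ofDays(1).toMillis()); // far future: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Making the check one-sided keeps replay of old data working while still
catching unit mistakes such as nanoseconds sent as milliseconds.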