Mehari Beyene created KAFKA-14991:
-------------------------------------

             Summary: Improving Producer's record timestamp validation
                 Key: KAFKA-14991
                 URL: https://issues.apache.org/jira/browse/KAFKA-14991
             Project: Kafka
          Issue Type: Improvement
          Components: core, log
            Reporter: Mehari Beyene


When time-based retention is configured, Kafka by default uses the timestamp 
provided by the producer to decide when log data expires. Customers can switch 
to the broker's timestamp by overriding "log.message.timestamp.type". The 
producer's record timestamp can be in the past or in the future, and Kafka 
decides whether the retention period has elapsed by comparing the broker's 
current time with the record's timestamp.
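
To make the comparison concrete, here is a minimal sketch of a time-based 
retention check. It is a simplified model and not the broker's actual code: 
the class and method names are hypothetical, and it assumes expiry is decided 
from the largest record timestamp seen in a segment.

```java
// Hypothetical, simplified model of a time-based retention check.
final class RetentionSketch {

    // Data is treated as expired once the newest record timestamp is more
    // than retentionMs behind the broker's current time.
    static boolean isExpired(long brokerNowMs, long largestRecordTimestampMs, long retentionMs) {
        return brokerNowMs - largestRecordTimestampMs > retentionMs;
    }
}
```

With a timestamp in the future the difference is negative, so under this model 
the data is not considered expired until the broker's clock catches up.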

There are legitimate use cases for a producer to send records with timestamps 
in the past (for example, replaying old data), but a record timestamp that is 
far in the future relative to the broker's current time is almost certainly 
wrong.

There is a configurable property, "message.timestamp.difference.max.ms", that 
customers can use to control the allowed difference between the broker's 
current time and the record timestamp. However, the validation in Kafka itself 
could be improved by rejecting records with future timestamps before they are 
written to the log.

Customers have run into this issue when a producer was mistakenly configured 
to set the record timestamp in nanoseconds instead of milliseconds. The 
resulting record timestamp was far in the future, and the time-based retention 
policy did not kick in as expected.
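
As a rough illustration of how large that error is, the sketch below 
interprets a nanosecond value as milliseconds and computes how far ahead of 
the broker it lands. The class name and the sample instant are assumptions 
chosen only for the example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper showing the size of a nanoseconds-vs-milliseconds mix-up.
final class TimestampUnitMixup {

    // Approximate number of 365-day years by which the record timestamp
    // is ahead of the broker's clock (both values read as epoch milliseconds).
    static long yearsAhead(long brokerNowMs, long recordTimestampMs) {
        long aheadMs = recordTimestampMs - brokerNowMs;
        return TimeUnit.MILLISECONDS.toDays(aheadMs) / 365;
    }

    public static void main(String[] args) {
        long nowMs = 1_700_000_000_000L;                   // sample instant in epoch millis
        long nowNs = TimeUnit.MILLISECONDS.toNanos(nowMs); // same instant in nanos
        // The producer sends nowNs, but the broker reads it as milliseconds.
        System.out.println(yearsAhead(nowMs, nowNs) + " years ahead");
    }
}
```

For a present-day instant, the misread value lands tens of millions of years 
in the future, so no realistic retention period would ever elapse.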

I propose adding basic validation in 
org.apache.kafka.storage.internals.log.LogValidator to reject record timestamps 
that are in the future relative to the broker's current time, after allowing a 
sensible tolerance for potential clock skew.
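
A minimal sketch of the kind of check I have in mind follows. This is not 
existing LogValidator code: the class name, method name, and the one-hour 
tolerance are all assumptions for illustration, and a real change would need 
to settle on the tolerance and on the error returned to the producer.

```java
// Hypothetical sketch of a future-timestamp check; not part of Kafka today.
final class FutureTimestampCheck {

    // Assumed tolerance for clock skew between producer and broker.
    static final long ASSUMED_MAX_FUTURE_SKEW_MS = 60L * 60L * 1000L; // one hour

    // True when the record timestamp is ahead of the broker's clock by more
    // than the allowed skew. Timestamps in the past are never flagged here.
    static boolean isTooFarInFuture(long recordTimestampMs, long brokerNowMs, long maxFutureSkewMs) {
        return recordTimestampMs - brokerNowMs > maxFutureSkewMs;
    }

    // Rejects the record by throwing; a broker would more likely map this
    // to a timestamp-related error code in the produce response.
    static void validate(long recordTimestampMs, long brokerNowMs) {
        if (isTooFarInFuture(recordTimestampMs, brokerNowMs, ASSUMED_MAX_FUTURE_SKEW_MS)) {
            throw new IllegalArgumentException(
                "Record timestamp " + recordTimestampMs
                    + " is too far ahead of broker time " + brokerNowMs);
        }
    }
}
```

Keeping the future-side check separate from the existing difference limit 
would leave use cases that replay old data with past timestamps unaffected.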



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
