Dmitry Vsekhvalnov created KAFKA-6614:
Summary: kafka-streams to configure internal topics
Issue Type: Improvement
Reporter: Dmitry Vsekhvalnov
After the fix for KAFKA-4785, all internal topics use the built-in
*RecordMetadataTimestampExtractor* to read timestamps.
This doesn't work correctly out of the box when kafka brokers are configured
with *log.message.timestamp.type=LogAppendTime* and the application uses a custom message
timestamp extractor.
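For context, a custom extractor of this kind might look like the following sketch (the *PayloadWithTimestamp* value type and its *eventTimeMs* accessor are hypothetical, standing in for whatever payload carries the event time):

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hypothetical value type carrying its own event time in the payload.
interface PayloadWithTimestamp {
    long eventTimeMs();
}

// Extractor that reads the event time from the message payload instead of
// the record metadata, so late events keep their original timestamps.
public class PayloadTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long previousTimestamp) {
        if (record.value() instanceof PayloadWithTimestamp) {
            return ((PayloadWithTimestamp) record.value()).eventTimeMs();
        }
        // Fall back to the record's own timestamp when the payload carries none.
        return record.timestamp();
    }
}
```

Such an extractor is wired in via the *default.timestamp.extractor* streams config; the point of this ticket is that internal topics ignore it in favor of *RecordMetadataTimestampExtractor*.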
Example use-case: windowed grouping + aggregation on late data:
KTable<Windowed<Tuple>, Long> summaries = in
.groupBy((key, value) -> ......)
when processing late events:
# the custom timestamp extractor picks up a timestamp in the past from the message
payload (let's say an hour ago)
# during the grouping phase the re-partition topic is written back to kafka using
the timestamp from (1)
# the kafka broker ignores the timestamp provided in (2) in favor of ingestion time
# the streams lib reads the re-partition topic back with the built-in
*RecordMetadataTimestampExtractor* and gets the ingestion timestamp from (3), which is usually close to "now"
# window start/end are incorrectly set based on "now" instead of the original
timestamp from the payload
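The mis-assignment in the last step follows from how tumbling-window starts are derived from the record timestamp (start = timestamp - timestamp % size). A self-contained illustration with made-up millisecond values:

```java
public class WindowAssignmentDemo {
    // Tumbling-window start for a given record timestamp, mirroring
    // Kafka Streams' TimeWindows logic: start = ts - (ts % sizeMs).
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    public static void main(String[] args) {
        long sizeMs = 3_600_000L;            // 1-hour windows
        long now = 37_000_000L;              // hypothetical "now" (ingestion time)
        long eventTime = now - 3_600_000L;   // actual event time, an hour earlier

        // With the payload timestamp the event lands in the correct (past) window...
        System.out.println(windowStart(eventTime, sizeMs)); // prints 32400000
        // ...but with LogAppendTime overriding it, it lands in the current window.
        System.out.println(windowStart(now, sizeMs));       // prints 36000000
    }
}
```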
I understand there are ways to configure the per-topic timestamp type on the kafka
brokers to work around this, but it would be really nice if the kafka-streams library
could take care of it itself, following the "least-surprise" principle: if the library
relies on a particular timestamp.type for a topic it manages, it should enforce it.
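For reference, the per-topic workaround mentioned above can be applied with the stock tooling, along these lines (the topic name here is hypothetical; the actual re-partition topic name depends on the application id and topology):

```shell
# Override the timestamp type for a single (e.g. re-partition) topic so the
# broker keeps the producer-supplied CreateTime instead of LogAppendTime.
bin/kafka-configs.sh --zookeeper localhost:2181 \
  --alter --entity-type topics \
  --entity-name my-app-KSTREAM-AGGREGATE-STATE-STORE-repartition \
  --add-config message.timestamp.type=CreateTime
```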
CC [~guozhang], based on a user-group email discussion.
This message was sent by Atlassian JIRA