Matthias J. Sax created KAFKA-20457:
---------------------------------------
Summary: Consider to exclude repartition topics from
auto.offset.reset
Key: KAFKA-20457
URL: https://issues.apache.org/jira/browse/KAFKA-20457
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Matthias J. Sax
In Kafka Streams, repartition topics serve a special purpose: they shuffle
intermediate data that is not yet fully processed. To avoid any data loss,
these topics are configured with infinite retention time, and consumed data is
purged via explicit "delete records" requests.
However, repartition topics still apply the auto.offset.reset strategy.
auto.offset.reset is expected to fire only once, at startup, when the
repartition topic is still empty (so "latest" and "earliest" are equivalent),
but there is no real guard to catch an unexpected reset. An unexpected reset
most likely indicates a severe issue and potential data loss.
Thus, it seems best to hard-code the reset strategy "none" for repartition
topics. In that case, we would still need to set an initial start offset, to
avoid triggering the auto.offset.reset mechanism (which, with "none", raises an
error when no committed offset exists).
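To make the proposed behavior concrete, here is a minimal, hypothetical model of start-offset resolution. It is not Kafka's implementation: the class, the `NoOffsetException` type, and the rule that repartition topics are recognized by a `-repartition` name suffix are all assumptions for illustration. User topics fall back to the configured strategy, while repartition topics are hard-coded to "none" and fail loudly when no committed offset exists.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of how a consumer picks its start offset.
 * NOT Kafka's implementation; names and the suffix check are assumptions.
 */
public class ResetPolicySketch {

    /** Mirrors the three auto.offset.reset values discussed in the issue. */
    public enum ResetStrategy { EARLIEST, LATEST, NONE }

    /** Hypothetical error raised when there is no committed offset and the strategy is NONE. */
    public static class NoOffsetException extends RuntimeException {
        public NoOffsetException(String partition) {
            super("No committed offset for " + partition);
        }
    }

    /** Repartition topics are hard-coded to NONE; other topics use the configured strategy. */
    public static ResetStrategy strategyFor(String topic, ResetStrategy configured) {
        // Assumption: repartition topics are identified by the "-repartition" suffix.
        return topic.endsWith("-repartition") ? ResetStrategy.NONE : configured;
    }

    /** Returns the start offset for one partition, given its log start/end offsets. */
    public static long startOffset(String topic, int partition, Map<String, Long> committed,
                                   ResetStrategy configured, long logStart, long logEnd) {
        String key = topic + "-" + partition;
        Long offset = committed.get(key);
        if (offset != null) {
            return offset; // normal case: resume from the committed position
        }
        switch (strategyFor(topic, configured)) {
            case EARLIEST:
                return logStart;
            case LATEST:
                return logEnd;
            default:
                // NONE: surface the problem instead of silently skipping or replaying data
                throw new NoOffsetException(key);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> committed = new HashMap<>();
        // User input topic without a committed offset: the configured strategy applies.
        System.out.println(startOffset("input", 0, committed, ResetStrategy.EARLIEST, 100L, 250L));
        // Repartition topic without a committed offset: fail loudly instead of resetting.
        try {
            startOffset("app-agg-repartition", 0, committed, ResetStrategy.LATEST, 100L, 250L);
        } catch (NoOffsetException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `100` and then `No committed offset for app-agg-repartition-0`. The design point is that, for repartition topics, a missing offset becomes a visible error rather than a silent jump to "earliest" or "latest", which is exactly why an initial offset has to be set up front.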
We could close this gap by extending the "create repartition topic" step
(either client side during rebalance, or broker side via KIP-1071) with a
"commit offset zero" step.
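The combined "create repartition topic" and "commit offset zero" step might look roughly like the following. This is a hypothetical sketch against an in-memory stand-in for the cluster (the `Cluster` class and its methods are invented for illustration, not a Kafka API). Offsets are seeded only when the topic is newly created, so re-running the step never overwrites committed progress.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of "create repartition topic + commit offset zero",
 * using an in-memory stand-in for the cluster. Not a Kafka API.
 */
public class SeedOffsetsSketch {

    public static final class Cluster {
        // topic name -> number of partitions
        public final Map<String, Integer> topics = new HashMap<>();
        // "topic-partition" -> committed offset of the application's consumer group
        public final Map<String, Long> committedOffsets = new HashMap<>();

        private static String key(String topic, int partition) {
            return topic + "-" + partition;
        }

        /** Creates the topic and, only if it is new, commits offset 0 for every partition. */
        public void createRepartitionTopic(String topic, int partitions) {
            if (topics.putIfAbsent(topic, partitions) != null) {
                return; // topic already exists: never overwrite existing progress
            }
            for (int p = 0; p < partitions; p++) {
                // A new partition is empty and starts at offset 0, so committing 0
                // skips nothing and gives the consumer an explicit start position.
                committedOffsets.put(key(topic, p), 0L);
            }
        }

        /** Start position under reset strategy "none": the committed offset, or an error. */
        public long startOffsetWithResetNone(String topic, int partition) {
            Long offset = committedOffsets.get(key(topic, partition));
            if (offset == null) {
                throw new IllegalStateException("No committed offset for " + key(topic, partition));
            }
            return offset;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        cluster.createRepartitionTopic("app-agg-repartition", 3);
        System.out.println(cluster.startOffsetWithResetNone("app-agg-repartition", 2));
        // Progress is made and committed; re-running the creation step must not reset it.
        cluster.committedOffsets.put("app-agg-repartition-2", 17L);
        cluster.createRepartitionTopic("app-agg-repartition", 3);
        System.out.println(cluster.startOffsetWithResetNone("app-agg-repartition", 2));
    }
}
```

Running `main` prints `0` and then `17`. One gap the sketch glosses over: creation and the offset commit are two separate steps here, so a failure between them would leave partitions without a committed offset, which under "none" surfaces as an error rather than a silent reset. Paths that delete committed offsets, such as an application reset, could end up in the same state.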
We might also need to double-check application reset (both the "classic" and
the "streams" variants) and explicit topic initialization
([https://cwiki.apache.org/confluence/display/KAFKA/KIP-698%3A+Add+Explicit+User+Initialization+of+Broker-side+State+to+Kafka+Streams])
to ensure nothing breaks.
While this is a change in behavior, I don't think we need a KIP for it, as I
consider it more of a bug fix than a new feature.