Matthias J. Sax created KAFKA-20457:
---------------------------------------

             Summary: Consider to exclude repartition topics from 
auto.offset.reset
                 Key: KAFKA-20457
                 URL: https://issues.apache.org/jira/browse/KAFKA-20457
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Matthias J. Sax


In Kafka Streams, repartition topics serve a special purpose: they shuffle 
intermediate data that is not fully processed yet. To avoid any data loss, 
these topics are configured with infinite retention time and rely on explicit 
"delete record" requests to purge data.

However, repartition topics still apply the auto.offset.reset strategy. It is 
expected that auto.offset.reset fires only once, at startup, when 
the repartition topic is still empty (and thus "latest" vs "earliest" does not 
matter), but there is no guard to catch an unexpected reset later on. Any 
unexpected reset most likely indicates a severe issue and potential data 
loss.
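
To make the concern concrete, here is a minimal sketch of how a start position gets resolved. It is a simplified, self-contained model and not Kafka client code: the class OffsetResetModel, the method resolveStartPosition, and the use of IllegalStateException are all assumptions made for illustration (the real consumer raises its own exception types when the strategy is "none").

```java
// Simplified model of start-position resolution; NOT Kafka client code.
// All names here are hypothetical and exist only for illustration.
final class OffsetResetModel {
    enum ResetPolicy { EARLIEST, LATEST, NONE }

    /**
     * Returns the offset to start fetching from.
     * committed: last committed offset of the group, or null if there is none.
     * logStart:  first offset still available in the partition.
     * logEnd:    offset that the next produced record would get.
     */
    static long resolveStartPosition(Long committed, long logStart, long logEnd,
                                     ResetPolicy policy) {
        boolean valid = committed != null && committed >= logStart && committed <= logEnd;
        if (valid) {
            return committed; // normal case: resume where we left off
        }
        switch (policy) {
            case EARLIEST:
                return logStart; // silently re-reads everything still in the log
            case LATEST:
                return logEnd;   // silently skips everything still in the log
            default:
                // NONE: surface the problem instead of hiding it
                throw new IllegalStateException(
                    "no valid committed offset and reset policy is NONE");
        }
    }

    public static void main(String[] args) {
        // Empty topic at startup: "earliest" and "latest" agree.
        System.out.println(resolveStartPosition(null, 0L, 0L, ResetPolicy.EARLIEST));
        System.out.println(resolveStartPosition(null, 0L, 0L, ResetPolicy.LATEST));
        // Non-empty topic with a lost commit: "latest" skips unprocessed records.
        System.out.println(resolveStartPosition(null, 40L, 100L, ResetPolicy.LATEST));
    }
}
```

In this model, the two strategies only diverge once the topic holds data, which is exactly the situation in which a reset is unexpected; only NONE turns that situation into a visible error.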

Thus, it seems best to hard-code the reset strategy "none" for repartition 
topics. With "none", we would still need to set an initial start offset, as 
the very first startup would otherwise trigger the auto.offset.reset mechanism.

We could extend the "create repartition topic" step (either client side during 
rebalance or broker side via KIP-1071) with a "commit offset zero" step to 
close this gap.
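
A rough sketch of the idea, again as a self-contained model rather than Kafka Streams code. The class RepartitionTopicSetupModel and all of its methods are hypothetical; the in-memory map only stands in for the group's committed offsets, and where the real step would live (client side or broker side) is left open, as above.

```java
// Hypothetical model of the proposed "commit offset zero" step; NOT Kafka Streams code.
final class RepartitionTopicSetupModel {
    // partition -> committed offset; stand-in for the group's committed offsets
    private final java.util.Map<Integer, Long> committed = new java.util.HashMap<>();

    /** Models "create repartition topic" followed by committing offset 0 per partition. */
    void createRepartitionTopic(int numPartitions) {
        for (int p = 0; p < numPartitions; p++) {
            // putIfAbsent keeps the step idempotent: re-running it never rewinds progress
            committed.putIfAbsent(p, 0L);
        }
    }

    /** Models a regular offset commit during processing. */
    void commit(int partition, long offset) {
        committed.put(partition, offset);
    }

    /** Models startup with the reset strategy hard-coded to "none". */
    long startPosition(int partition) {
        Long offset = committed.get(partition);
        if (offset == null) {
            // With "none" there is no silent fallback; the problem becomes visible.
            throw new IllegalStateException("no committed offset for partition " + partition);
        }
        return offset;
    }

    /** Models an unexpected loss of committed offsets. */
    void loseCommittedOffsets() {
        committed.clear();
    }

    public static void main(String[] args) {
        RepartitionTopicSetupModel model = new RepartitionTopicSetupModel();
        model.createRepartitionTopic(2);
        System.out.println(model.startPosition(0)); // first startup succeeds without a reset
    }
}
```

The point of the sketch is that, once offset zero is committed at creation time, a missing offset can only mean that something went wrong, so failing fast is the safe reaction.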

We might also need to double-check application reset (both "classic" and 
"streams") and explicit topic initialization (KIP-698, 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-698%3A+Add+Explicit+User+Initialization+of+Broker-side+State+to+Kafka+Streams) 
to ensure nothing breaks.

While this is a change in behavior, I don't think we would need a KIP for 
this, as I would consider it a bug fix rather than a new feature.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
