[ https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17177007#comment-17177007 ]
Bruno Cadonna commented on KAFKA-10357:
---------------------------------------

The {{KafkaStreams#initialize()}} approach would not solve the non-rolling upgrade scenario, right?

Moreover, {{KafkaStreams#initialize()}} does not avoid data loss completely, because a repartition topic could be deleted and a new Streams client started before the rebalance that should report the error takes place. In that case, the error would not even be reported, because {{KafkaStreams#initialize()}} would have already created the repartition topic.

> Handle accidental deletion of repartition-topics as exceptional failure
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10357
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10357
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Bruno Cadonna
>            Priority: Major
>
> Repartition topics are both written by Streams' producer and read by Streams'
> consumer, so when they are accidentally deleted both clients may be notified.
> But in practice the consumer would react much more quickly than the producer,
> since the latter has a delivery timeout expiration period (see
> https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts,
> it re-joins the group because the metadata changed, and during the triggered
> rebalance it silently auto-recreates the topic and continues, causing
> silent data loss.
> One idea is to create all repartition topics only *once*, in the first
> rebalance, and not auto-create them in future rebalances; instead, a missing
> topic would be treated similarly to the INCOMPLETE_SOURCE_TOPIC_METADATA
> error code (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenging part would be how to determine whether it is the first-ever
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that the STARTING state
> would not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED;
> then in the assign function we can check if the state is still CREATED and
> not RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the
> first-ever rebalance.
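
To make the "create only once" idea from the description concrete, here is a minimal sketch in plain Java. It is not Kafka Streams code: the class {{RepartitionTopicGuard}}, its {{onAssignment}} method, and the in-memory set standing in for the broker's topic list are all hypothetical names made up for illustration.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not Kafka Streams code: creates repartition topics
 * only during the first rebalance and fails on later rebalances if one
 * of them has disappeared.
 */
class RepartitionTopicGuard {
    // Stand-in for the broker's topic list; a real client would query the cluster.
    private final Set<String> brokerTopics;
    // False until the first rebalance handled by this instance has completed.
    private boolean firstRebalanceDone = false;

    RepartitionTopicGuard(Set<String> brokerTopics) {
        this.brokerTopics = brokerTopics;
    }

    /** Called once per rebalance with the repartition topics the topology needs. */
    void onAssignment(Collection<String> requiredTopics) {
        Set<String> missing = new HashSet<>(requiredTopics);
        missing.removeAll(brokerTopics);

        if (!firstRebalanceDone) {
            // First rebalance: creating the missing topics is expected.
            brokerTopics.addAll(missing);
            firstRebalanceDone = true;
            return;
        }
        if (!missing.isEmpty()) {
            // Later rebalance: a missing topic means it was deleted, so fail loudly
            // instead of recreating it silently.
            throw new IllegalStateException("Repartition topics deleted: " + missing);
        }
    }
}
```

Note that the flag lives only in memory, so a freshly started instance has no record of earlier rebalances and would recreate a deleted topic without raising an error. That is the gap raised in the comment above, and it is why determining the first-ever rebalance is the hard part.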