[ 
https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183626#comment-17183626
 ] 

Guozhang Wang commented on KAFKA-10357:
---------------------------------------

Theoretically, I think there's no perfect solution to 2) above today, since in 
the extreme case one can a) delete the topic, b) re-create the topic, and c) 
re-fill the topic so that it has the same offsets as before, all in between two 
consecutive consumer fetches, in which case the consumer would never detect 
that an issue happened. On the other hand, I also agree with [~ableegoldman] 
that this is not the primary scenario we want to guard against anyway, and if 
people really go wild with that procedure it is outside Kafka's processing 
guarantees today. Our focus should be just 1): avoiding us (Streams) 
re-creating the topics.

Given that, I think at the moment #initialize plus an internal config to 
disable auto-internal-topic-creation (by default we would still enable it for 
compatibility) would be the easiest way to tackle 1). It pushes the 
responsibility to users, who would need to:

* Ideally, pick only one instance of their streams app to call initialize when 
starting the app for the first time --- note that if a query is "reset", then 
restarting is the same as starting for the first time.
* Set the internal config to disable auto-internal-topic-creation.
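To make the proposal concrete, here is a minimal sketch of the decision logic 
an instance would apply, assuming a hypothetical internal config key and a 
hypothetical initialize() call (neither name exists in Kafka today; both are 
placeholders for whatever the eventual patch would introduce):

```java
public class InitOnceSketch {
    // Hypothetical internal config key; the real name would be decided in the patch.
    static final String DISABLE_AUTO_CREATE = "__internal.topics.auto.create.disable__";

    /**
     * Decide whether this instance may create the internal (repartition) topics.
     *
     * @param flagValue               value of the hypothetical internal config
     *                                ("true" disables auto-creation; default "false"
     *                                keeps today's behavior for compatibility)
     * @param isDesignatedInitializer true only for the single instance the user
     *                                picked to call the hypothetical initialize()
     */
    static boolean mayCreateInternalTopics(String flagValue, boolean isDesignatedInitializer) {
        boolean autoCreateDisabled = Boolean.parseBoolean(flagValue);
        // With auto-creation disabled, only the designated initializer creates topics;
        // every other instance must find them already present or fail.
        return !autoCreateDisabled || isDesignatedInitializer;
    }

    public static void main(String[] args) {
        System.out.println(mayCreateInternalTopics("true", true));   // designated instance
        System.out.println(mayCreateInternalTopics("true", false));  // all other instances
        System.out.println(mayCreateInternalTopics("false", false)); // compat default
    }
}
```

The point of the sketch is only the division of responsibility: once the flag 
is set, exactly one instance creates topics, so an accidental deletion can no 
longer be papered over by a rebalance on some other instance.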

KAFKA-3370 could help with both 1) and 2), but again it is not "perfect", so 
if we would eventually have to push this responsibility to users anyway, we'd 
better do it earlier rather than later.

> Handle accidental deletion of repartition-topics as exceptional failure
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10357
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10357
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Bruno Cadonna
>            Priority: Major
>
> Repartition topics are both written by Streams' producer and read by Streams' 
> consumer, so when they are accidentally deleted both clients may be notified. 
> But in practice the consumer would react to it much more quickly than the 
> producer, since the latter has a delivery timeout expiration period (see 
> https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts 
> to it, it will re-join the group since the metadata changed, and during the 
> triggered rebalance it would auto-recreate the topic and continue, causing 
> silent data loss. 
> One idea, is to only create all repartition topics *once* in the first 
> rebalance and not auto-create them any more in future rebalances, instead it 
> would be treated similar as INCOMPLETE_SOURCE_TOPIC_METADATA error code 
> (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenging part is how to determine whether this is the first-ever 
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that the STARTING state would 
> not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED; then in the 
> assign function we can check whether the state is still CREATED and not RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the 
> first-ever rebalance.
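The core proposal in the quoted description (create repartition topics only in 
the first-ever rebalance, and treat a missing topic in any later rebalance as 
a fatal error, analogous to INCOMPLETE_SOURCE_TOPIC_METADATA) can be sketched 
as follows. All names here are illustrative placeholders, not Kafka Streams 
API:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative model of the proposed behavior, not actual Streams code:
// topics are created exactly once; a later rebalance that finds one of them
// missing fails loudly instead of silently re-creating it.
public class RepartitionTopicGuard {
    private boolean firstRebalanceDone = false;
    private final Set<String> existingTopics = new HashSet<>();

    /** Small helper so callers can build a topic set inline. */
    static Set<String> topics(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /**
     * Called on every rebalance. Returns true if topics were created (only
     * possible on the first-ever rebalance); throws on any later rebalance
     * where a required repartition topic has disappeared.
     */
    boolean onRebalance(Set<String> requiredTopics) {
        if (!firstRebalanceDone) {
            existingTopics.addAll(requiredTopics); // create once, up front
            firstRebalanceDone = true;
            return true;
        }
        for (String topic : requiredTopics) {
            if (!existingTopics.contains(topic)) {
                // analogous to the INCOMPLETE_SOURCE_TOPIC_METADATA handling:
                // surface the problem instead of re-creating and losing data
                throw new IllegalStateException("missing repartition topic: " + topic);
            }
        }
        return false;
    }

    /** Simulate an accidental topic deletion. */
    void deleteTopic(String topic) {
        existingTopics.remove(topic);
    }

    public static void main(String[] args) {
        RepartitionTopicGuard guard = new RepartitionTopicGuard();
        Set<String> required = topics("app-repartition-0");
        guard.onRebalance(required);        // first rebalance: topics created
        guard.onRebalance(required);        // later rebalance: topics still there, ok
        guard.deleteTopic("app-repartition-0");
        try {
            guard.onRebalance(required);    // topic gone: fail loudly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

How the guard learns that the current rebalance is the first-ever one is 
exactly the open question the two ideas above try to answer (via the thread 
state machine, or via a flag in subscriptionInfo).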



--
This message was sent by Atlassian Jira
(v8.3.4#803005)