[ https://issues.apache.org/jira/browse/KAFKA-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16418073#comment-16418073 ]

Daniel Wojda commented on KAFKA-6437:
-------------------------------------

I would like to add my comment as a user of Kafka Streams and author of 
KAFKA-6720.
One important piece of information is missing here: if you start a Kafka Streams 
application before its input topics are created, it logs a warning and stays in this 
"idle" state until you create the topic(s) *AND* a rebalance happens. If you 
check the state of the stream, it reports "RUNNING". What is more, and please correct 
me if I'm wrong, checking consumer lag will not help either, because the lag will be 0 
(a non-existent topic contains 0 messages).
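
To illustrate why lag monitoring cannot catch this case, here is a minimal, Kafka-free sketch of how total lag is commonly computed: the sum over partitions of (log-end offset minus committed offset). It assumes partitions are keyed by plain strings standing in for TopicPartition; in a real application the offsets would come from something like KafkaConsumer#endOffsets and the group's committed offsets. The class name and the topic name "orders" are hypothetical. A topic that does not exist has no partitions, so the sum is empty and the lag is 0.

```java
import java.util.HashMap;
import java.util.Map;

public class LagSketch {
    // Lag = sum over partitions of (log-end offset - committed offset).
    // Partitions are keyed by plain strings such as "orders-0", standing in for TopicPartition.
    // A partition without a committed offset is treated as committed at 0 (a simplification).
    static long totalLag(Map<String, Long> endOffsets, Map<String, Long> committedOffsets) {
        long lag = 0L;
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag += Math.max(0L, e.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        // A topic that does not exist has no partitions, hence no end offsets at all.
        Map<String, Long> missingTopic = new HashMap<>();
        System.out.println(totalLag(missingTopic, new HashMap<>())); // prints 0

        // An existing topic with 100 records nobody has consumed yet shows real lag.
        Map<String, Long> existingTopic = new HashMap<>();
        existingTopic.put("orders-0", 100L);
        System.out.println(totalLag(existingTopic, new HashMap<>())); // prints 100
    }
}
```

So a "zombie" application waiting on a missing topic looks exactly like a healthy, fully caught-up one from the lag point of view.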

As [~mjsax] already mentioned, "it's well documented that you need to create all 
input topics before you start your application", so in my opinion "stopping the 
world and failing" is a better option than starting a "zombie" application. 
I understand that Kafka Streams has many users and that other developers may have a 
different opinion than mine; in that case I'd suggest introducing a new 
config, e.g. "fail-on-missing-topic". WDYT?
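
A rough sketch of what the proposed switch could look like, written as a standalone pre-start check rather than as part of Kafka Streams itself. Everything here is hypothetical: "fail-on-missing-topic" is only the name suggested above (no such Kafka Streams config exists), and the class, method, and topic names are made up for illustration. In practice the set of existing topics could be fetched with something like AdminClient#listTopics(). With the flag on, the application refuses to start; with it off, it only logs a warning, which is roughly what the original issue asks for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;
import java.util.logging.Logger;

public class MissingTopicCheck {
    private static final Logger LOG = Logger.getLogger(MissingTopicCheck.class.getName());

    // Hypothetical key taken from the proposal above; Kafka Streams has no such config.
    static final String FAIL_ON_MISSING_TOPIC = "fail-on-missing-topic";

    // Returns the required input topics that are absent from the cluster, sorted by name.
    static Set<String> missingTopics(Set<String> required, Set<String> existing) {
        Set<String> missing = new TreeSet<>(required);
        missing.removeAll(existing);
        return missing;
    }

    // failOnMissing = true: refuse to start ("stop the world and fail").
    // failOnMissing = false: only log a warning and let the possibly "zombie" app start.
    static void checkInputTopics(Set<String> required, Set<String> existing, boolean failOnMissing) {
        Set<String> missing = missingTopics(required, existing);
        if (missing.isEmpty()) {
            return;
        }
        String msg = "Input topic(s) " + missing + " do not exist; they will not be processed"
                + " until they are created and a rebalance happens";
        if (failOnMissing) {
            throw new IllegalStateException(msg);
        }
        LOG.warning(msg);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(FAIL_ON_MISSING_TOPIC, "true");
        boolean failOnMissing = Boolean.parseBoolean(props.getProperty(FAIL_ON_MISSING_TOPIC, "false"));

        // Hypothetical topic names; the existing set would normally be queried from the cluster.
        Set<String> required = new HashSet<>(Arrays.asList("left-input", "right-input"));
        Set<String> existing = new HashSet<>(Arrays.asList("right-input"));
        System.out.println(missingTopics(required, existing)); // prints [left-input]
        try {
            checkInputTopics(required, existing, failOnMissing);
        } catch (IllegalStateException e) {
            System.out.println("Refusing to start: " + e.getMessage());
        }
    }
}
```

Defaulting the flag to "false" in this sketch keeps today's lenient behavior, so only users who opt in get the fail-fast startup.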

> Streams does not warn about missing input topics, but hangs
> -----------------------------------------------------------
>
>                 Key: KAFKA-6437
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6437
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 1.0.0
>         Environment: Single client on single node broker
>            Reporter: Chris Schwarzfischer
>            Assignee: Mariam John
>            Priority: Minor
>              Labels: newbie
>
> *Case*
> A Streams application with two input topics used in a left join.
> When the left-side topic is missing upon starting the streams application, it 
> hangs "in the middle" of the topology (at …00009, see below). Only some of 
> the intermediate topics are created (up to …00009).
> When the missing input topic is created, the streams application resumes 
> processing.
> {noformat}
> Topology:
> StreamsTask taskId: 2_0
>       ProcessorTopology:
>               KSTREAM-SOURCE-0000000011:
>                       topics:         [mystreams_app-KTABLE-AGGREGATE-STATE-STORE-0000000009-repartition]
>                       children:       [KTABLE-AGGREGATE-0000000012]
>               KTABLE-AGGREGATE-0000000012:
>                       states:         [KTABLE-AGGREGATE-STATE-STORE-0000000009]
>                       children:       [KTABLE-TOSTREAM-0000000020]
>               KTABLE-TOSTREAM-0000000020:
>                       children:       [KSTREAM-SINK-0000000021]
>               KSTREAM-SINK-0000000021:
>                       topic:          data_udr_month_customer_aggregration
>               KSTREAM-SOURCE-0000000017:
>                       topics:         [mystreams_app-KSTREAM-MAP-0000000014-repartition]
>                       children:       [KSTREAM-LEFTJOIN-0000000018]
>               KSTREAM-LEFTJOIN-0000000018:
>                       states:         [KTABLE-AGGREGATE-STATE-STORE-0000000009]
>                       children:       [KSTREAM-SINK-0000000019]
>               KSTREAM-SINK-0000000019:
>                       topic:          data_UDR_joined
> Partitions [mystreams_app-KSTREAM-MAP-0000000014-repartition-0, mystreams_app-KTABLE-AGGREGATE-STATE-STORE-0000000009-repartition-0]
> {noformat}
> *Why this matters*
> The application does quite a lot of preprocessing before joining with the 
> missing input topic. This preprocessing doesn't happen while the topic is 
> missing, so a huge backlog of data builds up.
> *Fix*
> Issue a `warn`- or `error`-level message at startup to inform about the missing 
> topic and its consequences.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
