[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110447#comment-17110447
 ] 

Guozhang Wang commented on KAFKA-6520:
--------------------------------------

Just to throw out some more ideas here: the embedded clients now all use 
async network IO, so one would never get a ClosedChannelException; instead, 
they will eventually get a TimeoutException when the broker is actually offline. 
[~mjsax] is currently working on KIP-572 to make Streams more resilient to 
such connectivity issues (an unavailable broker is exposed the same way as a 
loss of network connectivity), so that if we have N tasks, we would still 
continue when only a subset of them cannot progress. On the other hand, a 
Streams client may talk to multiple brokers on behalf of different tasks, and 
as long as one of the tasks can still progress -- meaning its required brokers 
are still reachable -- we would not need to mark the client as disconnected.

Following this train of thought, I feel that we would only transition to the 
DISCONNECTED state if none of the tasks are progressing, indicating that none 
of the required brokers are available at the moment. Does that make sense? If 
yes, then the scope can be much simplified, and maybe we can also just 
piggyback the proposal on KIP-572 so that we do not need a separate 
KIP. Of course, implementation-wise Vince and Matthias can still proceed 
independently.
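
To make the rule above concrete, here is a minimal Java sketch. It is only an 
illustration under stated assumptions, not the Streams implementation: the 
class DisconnectedStateSketch, the enum ClientState, and the per-task 
"progressing" flags are all hypothetical names, and how a task would actually 
be judged as "progressing" (e.g. via KIP-572 timeouts) is left open. Treating 
a client with no assigned tasks as RUNNING is likewise an assumption.

```java
import java.util.Map;

// Hypothetical sketch: none of these names exist in Kafka Streams.
final class DisconnectedStateSketch {

    // Simplified stand-in for the client state; only the two states
    // discussed in this comment are modeled.
    enum ClientState { RUNNING, DISCONNECTED }

    // taskProgressing maps a task id to whether that task is currently
    // able to make progress, i.e. its required brokers are reachable.
    static ClientState deriveState(Map<String, Boolean> taskProgressing) {
        // Assumption: with no assigned tasks there is no evidence of a
        // connectivity problem, so the client is not marked disconnected.
        if (taskProgressing.isEmpty()) {
            return ClientState.RUNNING;
        }
        // One progressing task is enough to keep the client RUNNING.
        boolean anyProgressing = taskProgressing.values().stream()
                .anyMatch(Boolean::booleanValue);
        return anyProgressing ? ClientState.RUNNING : ClientState.DISCONNECTED;
    }

    public static void main(String[] args) {
        // Only a subset of tasks is stuck: prints RUNNING
        System.out.println(deriveState(Map.of("0_0", true, "0_1", false)));
        // No task can progress: prints DISCONNECTED
        System.out.println(deriveState(Map.of("0_0", false, "0_1", false)));
    }
}
```

With this rule, a partial broker outage never flips the client state; only 
the case where every task is stuck does.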

> When a Kafka Stream can't communicate with the server, it's Status stays 
> RUNNING
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-6520
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6520
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Michael Kohout
>            Priority: Major
>              Labels: newbie, user-experience
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario, the application always stays in the 
> RUNNING state:
>   
>  1) start Kafka
>  2) start the app; the app connects to Kafka and starts processing
>  3) kill Kafka (stop the Docker container)
>  4) the application doesn't give any indication that it's no longer 
> connected (the Stream State is still RUNNING, and the uncaught exception 
> handler isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  This is a link to a related 
> issue.
> -------------------------
> Update: there are some discussions on the PR itself which lead me to think 
> that a more general solution should be at the ClusterConnectionStates level 
> rather than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, which is 
> recorded in the `connect()` and `pollSelectionKeys()` functions upon catching 
> the IOException / RuntimeException that indicates the connection was lost.
>  * Users of Consumer / Streams can then monitor this metric, which 
> normally will only have close-to-zero values as we only have transient 
> disconnects; if it is spiking, it means the brokers are consistently 
> unavailable, indicating the disconnected state.
> [~Yohan123] WDYT?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
