[ 
https://issues.apache.org/jira/browse/KAFKA-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115097#comment-17115097
 ] 

Vince Mu commented on KAFKA-6520:
---------------------------------

[~mjsax] your explanation about coordinator metadata and liveness makes perfect 
sense. Thanks for that.

Regarding the implementation of a disconnected timeout, I'm not sure whether 
introducing a disconnected timeout and measuring the timeout of each fetch 
request is necessary. 
It seems like the ConsumerNetworkClient and NetworkClient will already fail an 
unsent or transmitted fetch request with a disconnected exception if its 
connection to a node dies. So instead of throwing a disconnected exception 
based on whether all fetch requests time out, we could throw a 
disconnected exception if all the fetch requests fail with a disconnected 
exception. I feel like this might be a simpler solution that uses what is 
already there. Thoughts on this?
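
To make the idea concrete, here is a rough sketch of the check I have in mind. 
It is not actual Kafka code: FetchDisconnectTracker, its Outcome enum and the 
integer node ids are hypothetical names made up for illustration, and how the 
fetcher would feed results into it is an assumption on my part.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Kafka): remembers the outcome of the most
 * recent fetch request sent to each node.
 */
class FetchDisconnectTracker {

    /** Outcome of a single fetch request, as seen by the caller. */
    enum Outcome { SUCCESS, DISCONNECTED, OTHER_FAILURE }

    // Latest outcome per node id.
    private final Map<Integer, Outcome> lastOutcome = new HashMap<>();

    /** Record the result of the latest fetch request to the given node. */
    void record(int nodeId, Outcome outcome) {
        lastOutcome.put(nodeId, outcome);
    }

    /**
     * Returns true only if at least one fetch was recorded and the latest
     * fetch to every node failed with a disconnect.
     */
    boolean allDisconnected() {
        if (lastOutcome.isEmpty()) {
            return false;
        }
        for (Outcome o : lastOutcome.values()) {
            if (o != Outcome.DISCONNECTED) {
                return false;
            }
        }
        return true;
    }
}
```

The caller would record an outcome whenever a fetch response or failure comes 
back, and only surface the disconnected condition once allDisconnected() 
returns true, so a single dead node does not trigger it while other nodes 
still respond.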

Please feel free to correct me. I'm still learning the code base bit by bit. 

> When a Kafka Stream can't communicate with the server, it's Status stays 
> RUNNING
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-6520
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6520
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Michael Kohout
>            Priority: Major
>              Labels: newbie, user-experience
>
> KIP WIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams]
> When you execute the following scenario the application is always in RUNNING 
> state
>   
>  1)start kafka
>  2)start app, app connects to kafka and starts processing
>  3)kill kafka(stop docker container)
>  4)the application doesn't give any indication that it's no longer 
> connected(Stream State is still RUNNING, and the uncaught exception handler 
> isn't invoked)
>   
>   
>  It would be useful if the Stream State had a DISCONNECTED status.
>   
>  See 
> [this|https://groups.google.com/forum/#!topic/confluent-platform/nQh2ohgdrIQ] 
> for a discussion from the google user forum.  This is a link to a related 
> issue.
> -------------------------
> Update: there are some discussions on the PR itself which lead me to think 
> that a more general solution should be at the ClusterConnectionStates level 
> rather than at the Streams or even Consumer level. One proposal would be:
>  * Add a new metric named `failedConnection` in SelectorMetrics, which is 
> recorded in the `connect()` and `pollSelectionKeys()` functions upon catching 
> the IOException / RuntimeException that indicates the connection was lost.
>  * Users of Consumer / Streams can then monitor this metric, which 
> normally will only have close-to-zero values as we have transient 
> disconnects; if it is spiking, it means the brokers are consistently 
> unavailable, indicating the disconnected state.
> [~Yohan123] WDYT?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
