[ https://issues.apache.org/jira/browse/KAFKA-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205740#comment-17205740 ]
Sophie Blee-Goldman commented on KAFKA-10555:
---------------------------------------------

I was also thinking we should not transition to ERROR if, for example, the user requests an application shutdown in the new exception handler. I would consider this a graceful shutdown and transition to NOT_RUNNING, unless of course an error occurs during the graceful shutdown, in which case we should transition to ERROR. If we transition to ERROR no matter what, the state machine cannot differentiate between a successful graceful shutdown and an actual error.

Similarly, I think we should transition to NOT_RUNNING if the user chooses the SHUTDOWN_KAFKA_STREAMS_CLIENT option in the exception handler: this is also a graceful shutdown, equivalent to the user calling KafkaStreams#shutdown. But I'd be happy to include both options in the new Streams exception handler and allow users to choose which terminal state to end up in. I can see how a user may want to decide between ERROR and NOT_RUNNING based on the specific exception thrown.

That said, I'm much less concerned about the "weird position" where a dying thread in a multi-threaded app can be replaced whereas a dying thread in a single-threaded app cannot. For one thing, we plan to have a "REPLACE_STREAM_THREAD" enum in the new Streams uncaught exception handler. Presumably, this would be implemented such that if the only thread dies, a new thread is started to replace it before the client would transition to ERROR. But if you allow the thread to die without choosing to start up a new thread, and the dead thread was your last one, then transitioning to ERROR seems totally appropriate imo. You had the chance to start up a new thread and didn't take it.

It also seems useful to me to retain the ERROR state as a way to notify users of the death of the final thread. Whether this was the fifth thread out of 5 or the first thread out of 1 seems beside the point.
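
To make the proposed semantics concrete, here is a rough Java sketch of how the terminal state could be chosen from the handler's response. It is only an illustration of the rules described above, not the actual Kafka Streams implementation: the class, the method nextState, and the SHUTDOWN_APPLICATION and LET_THREAD_DIE constants are hypothetical names, and intermediate states such as REBALANCING are left out for brevity.

```java
// Hypothetical sketch only: these types are NOT the real Kafka Streams API.
class TerminalStateSketch {

    // Simplified client states; intermediate states are omitted on purpose.
    enum ClientState { RUNNING, NOT_RUNNING, ERROR }

    // Possible responses from the (proposed) uncaught exception handler.
    enum HandlerResponse {
        REPLACE_STREAM_THREAD,          // name taken from the comment above
        SHUTDOWN_KAFKA_STREAMS_CLIENT,  // name taken from the comment above
        SHUTDOWN_APPLICATION,           // hypothetical name for "application shutdown"
        LET_THREAD_DIE                  // hypothetical: user does not replace the thread
    }

    /**
     * Decides the client state after a stream thread has died.
     *
     * @param response         what the user's handler asked for
     * @param remainingThreads live stream threads left after the death
     * @param shutdownFailed   whether an error occurred during a requested shutdown
     */
    static ClientState nextState(HandlerResponse response,
                                 int remainingThreads,
                                 boolean shutdownFailed) {
        switch (response) {
            case REPLACE_STREAM_THREAD:
                // The dead thread is replaced, so the client never reaches ERROR.
                return ClientState.RUNNING;
            case SHUTDOWN_KAFKA_STREAMS_CLIENT:
            case SHUTDOWN_APPLICATION:
                // A requested shutdown is graceful unless it fails itself.
                return shutdownFailed ? ClientState.ERROR : ClientState.NOT_RUNNING;
            case LET_THREAD_DIE:
                // ERROR signals the death of the final thread, whether it
                // was thread 5 of 5 or thread 1 of 1.
                return remainingThreads > 0 ? ClientState.RUNNING : ClientState.ERROR;
            default:
                throw new IllegalArgumentException("Unknown response: " + response);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextState(HandlerResponse.SHUTDOWN_KAFKA_STREAMS_CLIENT, 0, false));
        System.out.println(nextState(HandlerResponse.LET_THREAD_DIE, 0, false));
    }
}
```

With this shape, a clean user-requested shutdown and a failure remain distinguishable in the terminal state, which is the point of the proposal.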

cc [~mjsax] [~wcarlson5] [~cadonna]

> Improve client state machine
> ----------------------------
>
>                 Key: KAFKA-10555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: needs-kip
>
> The KafkaStreams client exposes its state to the user for monitoring
> purposes (i.e., RUNNING, REBALANCING, etc.). The state of the client
> depends on the state(s) of the internal StreamThreads, which have their
> own states. Furthermore, the client state affects what the user can do
> with the client. For example, active tasks can only be queried in the
> RUNNING state.
> With KIP-671 and KIP-663 we improved error handling capabilities and
> allowed stream threads to be added and removed dynamically. We allow
> adding/removing threads only in the RUNNING and REBALANCING states. This
> puts us in a "weird" position, because if we enter the ERROR state (i.e.,
> if the last thread dies), we cannot add new threads any longer. However,
> if we have multiple threads and one dies, we don't enter the ERROR state
> and do allow the thread to be recovered.
> Before the KIPs the definition of the ERROR state was clear; however,
> with both KIPs it seems that we should revisit its semantics.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)