[ https://issues.apache.org/jira/browse/KAFKA-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208492#comment-17208492 ]
Matthias J. Sax commented on KAFKA-10555:
-----------------------------------------

{quote}I thought we were only considering to transit to ERROR if the last thread died, but to transit to NOT_RUNNING if the last thread was removed by the user. This seems consistent with the current behavior and maintains the same semantic meaning of the ERROR state, imo.
{quote}
This would be the state after the KIP (without addressing this ticket). Transiting to NOT_RUNNING might be an option, but it would also be a change to the state machine, as currently, NOT_RUNNING is a terminal state after the client was closed. Thus, it won't be possible to add new threads when in NOT_RUNNING state following the current proposal of the KIP.

However, I don't agree that it makes sense to go to ERROR state when the last thread dies _and_ to disallow adding new threads when in ERROR state. IMHO, there are two options:
 # go to ERROR state when any thread dies and disallow adding/removing threads for this case (as if a thread dies, something went wrong and we want to "lock" the client).
 # go to ERROR state only when the last thread dies, but allow adding new threads and thus allow transiting from ERROR back to RUNNING (via REBALANCING of course); for this case, ERROR means that we stopped processing due to an error; for this semantic interpretation of ERROR state, there is no reason to not allow adding new threads IMHO (in contrast to (1), for which we say that something bad happened and we want to lock the client as we think it's unsafe to add/remove threads any longer).

I personally prefer (2) over (1), as I don't think that there is a good reason to lock down the client after a thread dies (also not after the last thread died).

Also note, even if we stay in RUNNING state with zero threads, it might be ok, as users can consult `localThreadMetadata` and/or the `num-thread-alive` metric to inspect if there are any running threads.
I.e., stopping the last running thread via `removeThread()` could be the same as if the last thread just died.

> Improve client state machine
> ----------------------------
>
>                 Key: KAFKA-10555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: needs-kip
>
> The KafkaStreams client exposes its state to the user for monitoring purposes
> (ie, RUNNING, REBALANCING etc). The state of the client depends on the
> state(s) of the internal StreamThreads that have their own states.
> Furthermore, the client state has impact on what the user can do with the
> client. For example, active tasks can only be queried in RUNNING state and
> similar.
> With KIP-671 and KIP-663 we improved error handling capabilities and allow to
> add/remove stream threads dynamically. We allow adding/removing threads only
> in RUNNING and REBALANCING state. This puts us in a "weird" position, because
> if we enter ERROR state (ie, if the last thread dies), we cannot add new
> threads any longer. However, if we have multiple threads and one dies, we
> don't enter ERROR state and do allow to recover the thread.
> Before the KIPs the definition of ERROR state was clear, however, with both
> KIPs it seems that we should revisit its semantics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
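
For illustration only, option (2) from the comment above could be sketched as the toy state machine below. This is not the KafkaStreams implementation: the class `ClientStateSketch`, its methods, and the reduced four-state enum are hypothetical names made up for this sketch. It assumes that ERROR is entered only when the last thread dies, that adding a thread is allowed in RUNNING, REBALANCING and ERROR, and that NOT_RUNNING stays terminal. The rebalance that the death of a non-last thread would normally trigger is left out to keep the sketch short.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of option (2); hypothetical, not the real KafkaStreams code. */
final class ClientStateSketch {

    /** Simplified subset of the client states, for illustration only. */
    enum State { REBALANCING, RUNNING, ERROR, NOT_RUNNING }

    // Assumption for option (2): threads may also be added while in ERROR.
    private static final Set<State> ADD_ALLOWED =
        EnumSet.of(State.RUNNING, State.REBALANCING, State.ERROR);

    private State state = State.REBALANCING;
    private int aliveThreads;

    ClientStateSketch(int initialThreads) {
        if (initialThreads < 1) {
            throw new IllegalArgumentException("need at least one thread");
        }
        this.aliveThreads = initialThreads;
    }

    State state() { return state; }

    int aliveThreads() { return aliveThreads; }

    /** A rebalance finished: REBALANCING becomes RUNNING, other states are kept. */
    void rebalanceCompleted() {
        if (state == State.REBALANCING) {
            state = State.RUNNING;
        }
    }

    /** A thread died: only the death of the last thread moves the client to ERROR. */
    void threadDied() {
        if (state == State.NOT_RUNNING || aliveThreads == 0) {
            return;
        }
        aliveThreads--;
        if (aliveThreads == 0) {
            state = State.ERROR;
        }
    }

    /** Adding a thread is allowed in ERROR too and always goes through REBALANCING. */
    void addThread() {
        if (!ADD_ALLOWED.contains(state)) {
            throw new IllegalStateException("cannot add a thread in state " + state);
        }
        aliveThreads++;
        state = State.REBALANCING;
    }

    /** Closing the client is terminal: no threads can be added afterwards. */
    void close() {
        aliveThreads = 0;
        state = State.NOT_RUNNING;
    }

    public static void main(String[] args) {
        ClientStateSketch client = new ClientStateSketch(1);
        client.rebalanceCompleted();
        client.threadDied();                 // last thread dies
        System.out.println(client.state());  // ERROR
        client.addThread();                  // recovery is allowed under option (2)
        System.out.println(client.state());  // REBALANCING
        client.rebalanceCompleted();
        System.out.println(client.state());  // RUNNING
    }
}
```

Under option (1), by contrast, `threadDied()` would move to ERROR on any thread death and ERROR would be dropped from `ADD_ALLOWED`, which is the "locked" client described in the comment.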