[ 
https://issues.apache.org/jira/browse/KAFKA-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208492#comment-17208492
 ] 

Matthias J. Sax commented on KAFKA-10555:
-----------------------------------------

{quote}I thought we were only considering to transit to ERROR if the last 
thread died, but to transit to NOT_RUNNING if the last thread was removed by 
the user. This seems consistent with the current behavior and maintains the 
same semantic meaning of the ERROR state, imo.
{quote}
This would be the state after the KIP (without addressing this ticket).

Transitioning to NOT_RUNNING might be an option, but it would also be a change to 
the state machine, because currently NOT_RUNNING is a terminal state reached after the 
client was closed. Thus, it won't be possible to add new threads when in 
NOT_RUNNING state following the current proposal of the KIP.

However, I don't agree that it makes sense to go to ERROR state when the last 
thread dies _and_ to disallow adding new threads when in ERROR state. IMHO, 
there are two options:
 # go to ERROR state when any thread dies and disallow adding/removing threads 
in this case (if a thread dies, something went wrong and we want to "lock" 
the client).
 # go to ERROR state only when the last thread dies, but allow adding new 
threads and thus allow transitioning from ERROR back to RUNNING (via REBALANCING, 
of course); in this case, ERROR means that we stopped processing due to an 
error; under this semantic interpretation of ERROR state, there is no reason to 
disallow adding new threads IMHO (in contrast to (1), for which we say that 
something bad happened and we want to lock the client because we think it's 
unsafe to add/remove threads any longer).
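
To make the difference between the two options concrete, here is a rough toy model. It is only a sketch and not the actual `KafkaStreams` state machine: the class `ClientStateSketch`, the `Policy` enum, and all method names are made up for illustration, and the model ignores every state other than REBALANCING, RUNNING, and ERROR.

```java
/** Toy model of the two options; hypothetical names, not Kafka Streams code. */
final class ClientStateSketch {

    enum State { REBALANCING, RUNNING, ERROR }

    enum Policy {
        LOCK_ON_ANY_DEATH,    // option (1): any dead thread locks the client
        ERROR_ON_LAST_DEATH   // option (2): ERROR only with zero threads, recoverable
    }

    private final Policy policy;
    private State state = State.RUNNING;
    private int aliveThreads;

    ClientStateSketch(Policy policy, int initialThreads) {
        this.policy = policy;
        this.aliveThreads = initialThreads;
    }

    State state() { return state; }

    int aliveThreads() { return aliveThreads; }

    /** Called when a thread dies because of an error. */
    void onThreadDied() {
        if (aliveThreads > 0) {
            aliveThreads--;
        }
        if (policy == Policy.LOCK_ON_ANY_DEATH || aliveThreads == 0) {
            state = State.ERROR;
        }
    }

    /** Returns false if the client refuses to add a thread in its current state. */
    boolean addThread() {
        if (state == State.ERROR && policy == Policy.LOCK_ON_ANY_DEATH) {
            return false; // option (1): ERROR is a locked state
        }
        aliveThreads++;
        state = State.REBALANCING; // a new thread joining triggers a rebalance
        return true;
    }

    /** Called when the rebalance triggered by a new thread has finished. */
    void onRebalanceCompleted() {
        if (state == State.REBALANCING) {
            state = State.RUNNING;
        }
    }
}
```

Under option (2) in this model, a client that lost its last thread goes ERROR -> REBALANCING -> RUNNING once a thread is added again, while under option (1) `addThread()` is rejected as soon as any thread has died.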

I personally prefer (2) over (1), as I don't think that there is a good reason 
to lock down the client after a thread dies (not even after the last thread 
died). Also note that even staying in RUNNING state with zero threads might 
be ok, as users can consult `localThreadMetadata` and/or the `num-thread-alive` 
metric to check whether any threads are running. I.e., stopping the last 
running thread via `removeThread()` could be treated the same as if the last 
thread just died.

> Improve client state machine
> ----------------------------
>
>                 Key: KAFKA-10555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Major
>              Labels: needs-kip
>
> The KafkaStreams client exposes its state to the user for monitoring purposes 
> (ie, RUNNING, REBALANCING etc). The state of the client depends on the 
> state(s) of the internal StreamThreads that have their own states.
> Furthermore, the client state has an impact on what the user can do with the 
> client. For example, active tasks can only be queried in RUNNING state, and 
> similar.
> With KIP-671 and KIP-663 we improved error handling capabilities and allow 
> adding/removing stream threads dynamically. We allow adding/removing threads only 
> in RUNNING and REBALANCING state. This puts us in a "weird" position, because 
> if we enter ERROR state (ie, if the last thread dies), we cannot add new 
> threads any longer. However, if we have multiple threads and one dies, we 
> don't enter ERROR state and do allow recovering the thread.
> Before the KIPs the definition of ERROR state was clear; however, with both 
> KIPs it seems that we should revisit its semantics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
