Bruno Cadonna created KAFKA-10015:
-------------------------------------
Summary: React to Unexpected Errors on Stream Threads
Key: KAFKA-10015
URL: https://issues.apache.org/jira/browse/KAFKA-10015
Project: Kafka
Issue Type: Improvement
Components: streams
Reporter: Bruno Cadonna
Currently, if an unexpected error occurs on a stream thread, the stream thread
dies, a rebalance is triggered, and the Streams client continues to run with
fewer stream threads.
Some errors trigger a cascade of stream thread deaths: after the rebalance
that results from the death of the first thread, the next thread dies, which
triggers another rebalance, after which the next thread dies, and so forth
until all stream threads are dead and the instance shuts down. Such a chain of
rebalances could be avoided if the error could be recognized as the cause of
cascading stream thread deaths, so that the Streams client could be shut down
after the first stream thread death.
On the other hand, some unexpected errors are transient, and the stream thread
could safely be restarted without causing further errors and without the need
to restart the Streams client.
The goal of this ticket is to classify errors and to react to them
automatically so that cascading deaths are avoided and stream threads are
recovered where possible.
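
To make the idea concrete, here is a minimal sketch of what such a
classification could look like. It is an illustration only, not a proposed
design: the class StreamThreadErrorPolicy, the enum Reaction, and the choice of
which exception types count as transient are all assumptions made for this
example, and nothing here is existing Kafka Streams API.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: none of these names exist in Kafka Streams.
public class StreamThreadErrorPolicy {

    // The two reactions discussed in the ticket text.
    enum Reaction { RESTART_THREAD, SHUTDOWN_CLIENT }

    // Assumption: timeouts and I/O errors stand in for "transient" errors.
    // Anything else is treated as likely to hit the next thread as well,
    // so the whole client is shut down after the first thread death.
    static Reaction classify(Throwable error) {
        // Walk the cause chain so that wrapped transient errors are found.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException || t instanceof IOException) {
                return Reaction.RESTART_THREAD;
            }
        }
        return Reaction.SHUTDOWN_CLIENT;
    }

    public static void main(String[] args) {
        System.out.println(classify(new TimeoutException("request timed out")));
        System.out.println(classify(new IllegalStateException("bad task state")));
    }
}
```

The conservative default (shut down the client for anything not known to be
transient) is one possible choice; it trades availability for avoiding the
chain of rebalances described above.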
--
This message was sent by Atlassian Jira
(v8.3.4#803005)