Bruno Cadonna created KAFKA-17489:
-------------------------------------

             Summary: IllegalStateException if failed task is removed from state updater
                 Key: KAFKA-17489
                 URL: https://issues.apache.org/jira/browse/KAFKA-17489
             Project: Kafka
          Issue Type: Task
          Components: streams
            Reporter: Bruno Cadonna
            Assignee: Bruno Cadonna
             Fix For: 3.9.0

If a task that is managed by the state updater fails (e.g., with an {{OffsetOutOfRangeException}}) and this same task is then removed from the state updater, the task is regarded as corrupted and put into the task registry to wait for handling. There are multiple ways in which this leads to an {{IllegalStateException}}:

1. In {{handleAssignment()}}, the tasks in the state updater are handled before the tasks in the task registry. A failed standby task can therefore be removed from the state updater and put into the task registry. When the tasks in the task registry are handled afterwards, the standby task is found there. However, with the state updater it is illegal to have standby tasks in the task registry, so the following {{IllegalStateException}} is thrown:

{code:java}
java.lang.IllegalStateException: Standby tasks should only be managed by the state updater, but standby task 1_0 is managed by the stream thread
{code}

2. If a failed active task is removed from the state updater while revocation is handled (the {{onPartitionRevoked()}} call in the {{ConsumerCoordinator}}), the exception of the failed task is not thrown immediately by the {{ConsumerCoordinator#onJoinComplete()}} method. Instead, the exception is stored and {{onAssignment}} is called. Additionally, the failed task is put into the task registry for later handling. The {{onAssignment}} method calls {{handleAssignment()}}, which handles the tasks in the task registry as described above. Here two {{IllegalStateException}}s are thrown:

{code:java}
java.lang.IllegalStateException: Illegal state RESTORING while recycling active task 2_1
{code}

(This exception may differ depending on how the task is handled, e.g., recycling or re-assigning.)

and

{code:java}
java.lang.IllegalStateException: Task unknown: 2_1
{code}

The latter occurs because the failed task is handled and removed from the task registry in {{handleAssignment()}}, although it should stay there until the original exception is handled.
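
The two failure paths above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not the Kafka Streams implementation: the class {{FailedTaskRemovalSketch}}, its {{Task}} type, the two maps standing in for the state updater and the task registry, and the methods {{removeFromStateUpdater()}}, {{lookUp()}}, {{scenarioOne()}} and {{scenarioTwo()}} are all hypothetical names introduced for illustration. The toy {{handleAssignment()}} also removes every task from the state updater, which the real method does not necessarily do. Only the exception messages are taken from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: none of these classes or methods are Kafka Streams APIs.
public class FailedTaskRemovalSketch {

    static final class Task {
        final String id;
        final boolean standby;
        final boolean failed;

        Task(String id, boolean standby, boolean failed) {
            this.id = id;
            this.standby = standby;
            this.failed = failed;
        }
    }

    // Hypothetical stand-ins for the state updater and the task registry.
    final Map<String, Task> stateUpdater = new LinkedHashMap<>();
    final Map<String, Task> taskRegistry = new LinkedHashMap<>();

    // Removing a failed task from the state updater parks it in the task registry.
    void removeFromStateUpdater(String taskId) {
        Task task = stateUpdater.remove(taskId);
        if (task != null && task.failed) {
            taskRegistry.put(task.id, task);
        }
    }

    // Handles the state updater tasks first, then the task registry tasks.
    void handleAssignment() {
        for (String taskId : new ArrayList<>(stateUpdater.keySet())) {
            removeFromStateUpdater(taskId); // simplification: every task is removed
        }
        for (Task task : new ArrayList<>(taskRegistry.values())) {
            if (task.standby) {
                throw new IllegalStateException(
                    "Standby tasks should only be managed by the state updater, "
                        + "but standby task " + task.id + " is managed by the stream thread");
            }
            // The failed active task is dropped from the registry too early.
            taskRegistry.remove(task.id);
        }
    }

    // Later handling of the stored original exception looks the task up again.
    Task lookUp(String taskId) {
        Task task = taskRegistry.get(taskId);
        if (task == null) {
            throw new IllegalStateException("Task unknown: " + taskId);
        }
        return task;
    }

    // Case 1: a failed standby task is moved into the registry inside handleAssignment().
    static String scenarioOne() {
        FailedTaskRemovalSketch model = new FailedTaskRemovalSketch();
        model.stateUpdater.put("1_0", new Task("1_0", true, true));
        try {
            model.handleAssignment();
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    // Case 2: a failed active task is removed during revocation, then handled too early.
    static String scenarioTwo() {
        FailedTaskRemovalSketch model = new FailedTaskRemovalSketch();
        model.stateUpdater.put("2_1", new Task("2_1", false, true));
        model.removeFromStateUpdater("2_1"); // revocation
        try {
            model.handleAssignment();        // removes 2_1 from the registry
            model.lookUp("2_1");             // handling of the stored exception
            return "no exception";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(scenarioOne());
        System.out.println(scenarioTwo());
    }
}
```

In this model, both problems come from failed tasks in the task registry being treated like regular tasks during {{handleAssignment()}}. The sketch does not reproduce the {{Illegal state RESTORING}} variant, since that one depends on the task state machine.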