[ 
https://issues.apache.org/jira/browse/KAFKA-8620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877440#comment-16877440
 ] 

Boyang Chen commented on KAFKA-8620:
------------------------------------

In callback #StreamThread.onPartitionsAssigned we will do a state transition to 
PARTITION_ASSIGNED. If KafkaStreams instance invokes the shutdown hook, our 
state transits to PENDING_SHUTDOWN and the current state machine will return 
null to indicate a transition failure which causes the code path proceeds 
without triggering the taskManager.createNewTasks() call. The consequence was 
that we do assigned topic partitions for the thread but the taskManager 
contains empty active task map, so when we would receive non-empty records and 
call subsequent addRecordsToTask in #StreamThread.runOnce we will throw a NPE 
for being unable to find the correct tasks.

The proper fixes involve:
 # Debug log on the state transition error

 # State check after we exit from pollRequest in runOnce. If we are in pending 
shutdown already, we shouldn’t proceed

> Fix potential race condition in StreamThread state change
> ---------------------------------------------------------
>
>                 Key: KAFKA-8620
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8620
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Boyang Chen
>            Assignee: Boyang Chen
>            Priority: Major
>
> In the call to `StreamThread.addRecordsToTasks` we don't have synchronization 
> when we attempt to extract active tasks. If after one long poll in runOnce 
> the application state changes to PENDING_SHUTDOWN, there is a potential close 
> on TaskManager which erases the active tasks map, thus triggering NPE and 
> bringing the thread state to a false shutdown.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to