[ 
https://issues.apache.org/jira/browse/SAMZA-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490404#comment-16490404
 ] 

Yi Pan (Data Infrastructure) commented on SAMZA-1665:
-----------------------------------------------------

>From the above, there are several issues:
# StreamProcessor.stop() has two parts: async setting shutdown flag to shutdown 
container and actual shutdown the JC. The full stop of StreamProcessor has to 
invoke this stop() method twice, potentially in different thread context, and 
there is no synchronous state variable to guard the full SP.stop() sequence. 
This state transition in multi-thread environment is broken. And since 
StreamProcessor has the handle to container and JC, stopContainer() and 
stopJC() probably should be two methods in StreamProcessor, instead of using an 
if-else-then branching the two exclusive code paths in a single stop() method.
# There is no state variable to guard the half-way through StreamProcess.stop() 
sequence, which is the root cause why Debounce timer thread is still actively 
reacting to job model changes while the whole StreamProcessor shutdown sequence 
has been initiated.
# container.pause() is not a cleanly designed state, since it still shutdown 
the container. It is more like a state for the StreamProcessor to indicate that 
the StreamProcessor is in-between re-spawning the container and keep the JC 
alive. Then, if there is a single state variable for StreamProcessor to 
indicate whether the StreamProcessor is already in shutdown state or are still 
in the state to re-starting the container, the design of the state transition 
would be much easier.

> Fix race condition when stopping StreamProcessor.
> -------------------------------------------------
>
>                 Key: SAMZA-1665
>                 URL: https://issues.apache.org/jira/browse/SAMZA-1665
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Shanthoosh Venkataraman
>            Assignee: Shanthoosh Venkataraman
>            Priority: Major
>
> When the user thread stops the streamProcessor and onNewJobModelExpired event 
> is executed from debounce thread at the same time, we observe the following 
> issue:
> A. User thread stops the StreamProcessor which in turn stops the current 
> running container.
> B. Before ZkJobCoordinator is shutdown, onNewJobModelExpired is executed from 
> debounce thread(which spawns a new container). 
> C. User thread stops the ZkJobCoordinator. 
> After the StreamProcessor shutdown sequence returns,  container thread is 
> alive and zkJobCoordinator is dead. This results in a orphaned container when 
> stopping stream processor.  We were able reproduce this locally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to