Yi Pan (Data Infrastructure) created SAMZA-1832:
---------------------------------------------------

             Summary: Race condition between 
SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop()
                 Key: SAMZA-1832
                 URL: https://issues.apache.org/jira/browse/SAMZA-1832
             Project: Samza
          Issue Type: Bug
            Reporter: Yi Pan (Data Infrastructure)
             Fix For: 1.0


There is a race condition between SamzaContainerListener.onContainerFailure(t) 
and StreamProcessor.stop() that may mistakenly return a successful stopping 
state even there is an exception happened in the SamzaContainer.

The sequence of events are:
Thread-1: SamzaContainer.run() -> exception happened -> status set to FAILED -> 
start shutdown sequence
Thread-2: User called LocalApplicationRunner.kill() -> StreamProcessor.stop() 
-> stopSamzaContainer(): since container status is failed, skipped waiting -> 
jobCoordinator.stop() -> since callback 
SamzaContainerListener.onContainerFailure() is only called after the shutdown 
sequence, containerException is not set -> normal shutdown of StreamProcessor 
-> appStatus = SuccessfulFinish

The issue here is when StreamProcessor.stop() calls stopSamzaContainer(), it 
needs to wait till the callback of SamzaContainerListener.onContainerFailure(t) 
finishes before making the decision that the container is stopped 
successfully/failed.

Reproduce steps:
- Write an StreamApplication that throws exception in process()
- Using StreamApplicationIntegrationTestHarness to start the application by 
runApplication()
- In the test, immediately call runner.kill(); runner.waitForFinish(); 
runner.status()



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to