Yi Pan (Data Infrastructure) created SAMZA-1832:
---------------------------------------------------
Summary: Race condition between
SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop()
Key: SAMZA-1832
URL: https://issues.apache.org/jira/browse/SAMZA-1832
Project: Samza
Issue Type: Bug
Reporter: Yi Pan (Data Infrastructure)
Fix For: 1.0
There is a race condition between SamzaContainerListener.onContainerFailure(t)
and StreamProcessor.stop() that may mistakenly return a successful stopping
state even there is an exception happened in the SamzaContainer.
The sequence of events are:
Thread-1: SamzaContainer.run() -> exception happened -> status set to FAILED ->
start shutdown sequence
Thread-2: User called LocalApplicationRunner.kill() -> StreamProcessor.stop()
-> stopSamzaContainer(): since container status is failed, skipped waiting ->
jobCoordinator.stop() -> since callback
SamzaContainerListener.onContainerFailure() is only called after the shutdown
sequence, containerException is not set -> normal shutdown of StreamProcessor
-> appStatus = SuccessfulFinish
The issue here is when StreamProcessor.stop() calls stopSamzaContainer(), it
needs to wait till the callback of SamzaContainerListener.onContainerFailure(t)
finishes before making the decision that the container is stopped
successfully/failed.
Reproduce steps:
- Write an StreamApplication that throws exception in process()
- Using StreamApplicationIntegrationTestHarness to start the application by
runApplication()
- In the test, immediately call runner.kill(); runner.waitForFinish();
runner.status()
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)