[
https://issues.apache.org/jira/browse/SAMZA-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631360#comment-16631360
]
ASF GitHub Bot commented on SAMZA-1832:
---------------------------------------
GitHub user bharathkk opened a pull request:
https://github.com/apache/samza/pull/673
SAMZA-1832: Fix race condition between StreamProcessor and
SamzaContainerListener
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/bharathkk/samza samza-1832-alternative
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/samza/pull/673.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #673
----
commit 035b6f0211f25741f2b928fb2bccf4d1eb1c3605
Author: bharathkk <codin.martial@...>
Date: 2018-09-28T05:13:24Z
Fix race condition between SamzaContainerListener and StreamProcessor
----
> Race condition between SamzaContainerListener.onContainerFailure(t) and
> StreamProcessor.stop()
> ----------------------------------------------------------------------------------------------
>
> Key: SAMZA-1832
> URL: https://issues.apache.org/jira/browse/SAMZA-1832
> Project: Samza
> Issue Type: Bug
> Reporter: Yi Pan (Data Infrastructure)
> Assignee: Bharath Kumarasubramanian
> Priority: Major
> Fix For: 1.0
>
>
> There is a race condition between
> SamzaContainerListener.onContainerFailure(t) and StreamProcessor.stop() that
> may mistakenly return a successful stopping state even there is an exception
> happened in the SamzaContainer.
> The sequence of events are:
> Thread-1: SamzaContainer.run() -> exception happened -> status set to FAILED
> -> start shutdown sequence
> Thread-2: User called LocalApplicationRunner.kill() -> StreamProcessor.stop()
> -> stopSamzaContainer(): since container status is failed, skipped waiting ->
> jobCoordinator.stop() -> since callback
> SamzaContainerListener.onContainerFailure() is only called after the shutdown
> sequence, containerException is not set -> normal shutdown of StreamProcessor
> -> appStatus = SuccessfulFinish
> The issue here is when StreamProcessor.stop() calls stopSamzaContainer(), it
> needs to wait till the callback of
> SamzaContainerListener.onContainerFailure(t) finishes before making the
> decision that the container is stopped successfully/failed.
> Reproduce steps:
> - Write an StreamApplication that throws exception in process()
> - Using StreamApplicationIntegrationTestHarness to start the application by
> runApplication()
> - In the test, immediately call runner.kill(); runner.waitForFinish();
> runner.status()
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)