[ https://issues.apache.org/jira/browse/TWILL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647154#comment-14647154 ]
ASF GitHub Bot commented on TWILL-145:
--------------------------------------
GitHub user hsaputra opened a pull request:
https://github.com/apache/incubator-twill/pull/58
[TWILL-145] Potential race condition when restart all is called for a TwillRunnable
If a restart of all instances is requested for a TwillRunnable, there could be
a race condition in the check of the provisioning and container request queues
that could cause the TwillApplication to exit.
This PR contains the following changes:
-) Change the container requests to a ConcurrentLinkedQueue, since they are
accessed by multiple threads.
-) Add a new volatile flag in RunnableContainerRequest to indicate whether it
is ready to be provisioned.
-) Add the container requests for the restart before removing the containers.
-) Move execution of the restart to a thread in the add-instances executor.
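The changes listed above can be pictured with a simplified, single-class model. This is a sketch under stated assumptions, not the code in the PR: `RestartFixSketch`, `ContainerRequest`, `pollReadyRequests`, `restartAll`, and `setReadyToBeProvisioned` are hypothetical names, containers are plain strings, and "stopping" containers is modeled as clearing a queue. It also assumes the heartbeat leaves not-yet-ready requests in the queue, so they keep the shutdown check from passing while the restart is in progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of the restart fix; names are illustrative, not Twill's API.
final class RestartFixSketch {

  /** Stand-in for RunnableContainerRequest, carrying the new readiness flag. */
  static final class ContainerRequest {
    final String runnableName;
    final int instances;
    // volatile: set by the restart thread, read by the heartbeat thread.
    private volatile boolean readyToBeProvisioned;

    ContainerRequest(String runnableName, int instances, boolean ready) {
      this.runnableName = runnableName;
      this.instances = instances;
      this.readyToBeProvisioned = ready;
    }

    boolean isReadyToBeProvisioned() {
      return readyToBeProvisioned;
    }

    void setReadyToBeProvisioned() {
      readyToBeProvisioned = true;
    }
  }

  // Touched by both the heartbeat thread and the restart thread.
  final Queue<ContainerRequest> containerRequests = new ConcurrentLinkedQueue<>();
  final Queue<ContainerRequest> provisioning = new ConcurrentLinkedQueue<>();
  final Queue<String> runningContainers = new ConcurrentLinkedQueue<>();

  /** Shutdown condition evaluated by the heartbeat loop. */
  boolean shouldShutdown() {
    return provisioning.isEmpty() && containerRequests.isEmpty() && runningContainers.isEmpty();
  }

  /** Heartbeat step: move only ready requests to provisioning; others stay queued. */
  List<ContainerRequest> pollReadyRequests() {
    List<ContainerRequest> polled = new ArrayList<>();
    // ConcurrentLinkedQueue iterators are weakly consistent, so removing
    // elements while iterating does not throw ConcurrentModificationException.
    for (ContainerRequest request : containerRequests) {
      if (request.isReadyToBeProvisioned() && containerRequests.remove(request)) {
        provisioning.add(request);
        polled.add(request);
      }
    }
    return polled;
  }

  /** Restart all instances: enqueue the replacement request before stopping anything. */
  void restartAll(String runnableName) {
    ContainerRequest request = new ContainerRequest(runnableName, runningContainers.size(), false);
    containerRequests.add(request);     // 1. queue is non-empty before containers go away
    runningContainers.clear();          // 2. stop and release the old containers
    request.setReadyToBeProvisioned();  // 3. only now may the heartbeat provision it
  }
}
```

In this model there is no point during `restartAll` at which all three queues are empty, and a heartbeat that runs before step 3 simply skips the request instead of provisioning it early.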
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hsaputra/incubator-twill TWILL-145_race_condition_all_restarts
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-twill/pull/58.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #58
----
commit e8fc957e2a60c6e42e851cd69a24bbaca736f465
Author: hsaputra <[email protected]>
Date: 2015-07-29T23:34:12Z
[TWILL-145] Potential race condition when restart all is called for a Twill runnable.
----
> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
>
> Key: TWILL-145
> URL: https://issues.apache.org/jira/browse/TWILL-145
> Project: Apache Twill
> Issue Type: Bug
> Components: yarn
> Affects Versions: 0.6.0-incubating
> Reporter: Henry Saputra
> Assignee: Henry Saputra
>
> Found this issue thanks to the careful eyes of [~chtyim].
> When a restart of all instances is sent for a particular TwillRunnable, there could
> be a race condition where the heartbeat thread runs right after all containers
> have been released, which causes the following check to succeed and shut down
> the application master:
> {code}
> // Looks for containers requests.
> if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() &&
>     runningContainers.isEmpty()) {
>   LOG.info("All containers completed. Shutting down application master.");
>   break;
> }
> {code}
> This could happen when runningContainers is empty and the new
> runnableContainerRequests have not yet been added.
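The window described above can be reproduced deterministically with a single-threaded model, assuming the heartbeat thread may run at any boundary between the steps of a restart. Instead of spawning threads, the sketch evaluates the quoted shutdown condition at each such boundary, so the result does not depend on scheduling. `RestartRaceSketch`, `allContainersCompleted`, and `restartAll` are hypothetical names; the three queues borrow the names from the quoted check but hold plain strings here.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical single-threaded model of the restart race; not Twill's actual code.
final class RestartRaceSketch {
  final Queue<String> provisioning = new ConcurrentLinkedQueue<>();
  final Queue<String> runnableContainerRequests = new ConcurrentLinkedQueue<>();
  final Queue<String> runningContainers = new ConcurrentLinkedQueue<>();

  /** Mirrors the heartbeat's "All containers completed" condition. */
  boolean allContainersCompleted() {
    return provisioning.isEmpty()
        && runnableContainerRequests.isEmpty()
        && runningContainers.isEmpty();
  }

  /**
   * Restarts all instances of a runnable and returns true if a heartbeat
   * running at any step boundary would have shut down the application master.
   */
  boolean restartAll(String runnableName, boolean addRequestsFirst) {
    String request = runnableName + ":" + runningContainers.size();
    boolean shutdownWindow = false;
    if (addRequestsFirst) {
      runnableContainerRequests.add(request);
      shutdownWindow |= allContainersCompleted(); // heartbeat could run here
      runningContainers.clear();
      shutdownWindow |= allContainersCompleted(); // ...or here
    } else {
      runningContainers.clear();
      shutdownWindow |= allContainersCompleted(); // heartbeat could run here
      runnableContainerRequests.add(request);
      shutdownWindow |= allContainersCompleted(); // ...or here
    }
    return shutdownWindow;
  }
}
```

With the release-first order, the state right after `runningContainers.clear()` matches the quoted check, which is exactly the race in this issue; adding the new request first keeps `runnableContainerRequests` non-empty at every boundary.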
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)