[ https://issues.apache.org/jira/browse/TWILL-145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647154#comment-14647154 ]

ASF GitHub Bot commented on TWILL-145:
--------------------------------------

GitHub user hsaputra opened a pull request:

    https://github.com/apache/incubator-twill/pull/58

    [TWILL-145] Potential race condition when restart all is called for a TwillRunnable

    If a restart of all instances is requested for a TwillRunnable, there could be a race condition
    in the check of provisioning and container requests that could cause the TwillApplication to exit.

    This PR contains the following changes:
    -) Change the container requests to be a ConcurrentLinkedQueue since it is accessed by multiple threads.
    -) Add a new volatile flag in RunnableContainerRequest to indicate whether it is ready to be provisioned.
    -) Move up adding container requests for restart before removing.
    -) Remove execution of restart to thread in the add instances executor.
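
A minimal sketch of the first three changes, under stated assumptions: `PendingRequest` is a hypothetical stand-in for `RunnableContainerRequest`, and the class, field, and method names below are illustrative only, not the code in this pull request. It shows a request being enqueued before the running containers are released, so an emptiness check never sees both collections empty, while the volatile flag keeps the request from being provisioned too early.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class RestartOrderingSketch {

  /** Hypothetical stand-in for RunnableContainerRequest; not the real Twill class. */
  static final class PendingRequest {
    final String runnableName;
    // volatile so that a write from the restart thread is visible to the heartbeat thread
    private volatile boolean readyToBeProvisioned;

    PendingRequest(String runnableName) {
      this.runnableName = runnableName;
    }

    boolean isReadyToBeProvisioned() {
      return readyToBeProvisioned;
    }

    void setReadyToBeProvisioned() {
      readyToBeProvisioned = true;
    }
  }

  // Both collections are thread-safe because two threads touch them.
  private final Queue<PendingRequest> requests = new ConcurrentLinkedQueue<>();
  private final Set<String> running = ConcurrentHashMap.newKeySet();

  void containerStarted(String containerId) {
    running.add(containerId);
  }

  /** Adds a request that is visible to the emptiness check but not yet provisionable. */
  PendingRequest enqueue(String runnableName) {
    PendingRequest request = new PendingRequest(runnableName);
    requests.add(request);
    return request;
  }

  void releaseAll() {
    running.clear();
  }

  /** Simplified shutdown condition: nothing requested and nothing running. */
  boolean allCompleted() {
    return requests.isEmpty() && running.isEmpty();
  }

  /** Restart with the request added before the running containers are removed. */
  void restartAll(String runnableName) {
    PendingRequest request = enqueue(runnableName);
    releaseAll();
    request.setReadyToBeProvisioned();
  }

  /** Returns the head request only once it is marked ready (assumes a single consumer). */
  PendingRequest pollReady() {
    PendingRequest head = requests.peek();
    if (head == null || !head.isReadyToBeProvisioned()) {
      return null;
    }
    return requests.poll();
  }
}
```

In this sketch the pending request alone keeps `allCompleted()` false between the release and the re-provisioning, which is the property the reordering is meant to provide.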

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hsaputra/incubator-twill TWILL-145_race_condition_all_restarts

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-twill/pull/58.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #58
    
----
commit e8fc957e2a60c6e42e851cd69a24bbaca736f465
Author: hsaputra <[email protected]>
Date:   2015-07-29T23:34:12Z

    [TWILL-145] Potential race condition when restart all is called for a Twill runnable.

    If a restart of all instances is requested for a TwillRunnable, there could be a race condition
    in the check of provisioning and container requests that could cause the TwillApplication to exit.

    This PR contains the following changes:
    -) Change the container requests to be a ConcurrentLinkedQueue since it is accessed by multiple threads.
    -) Add a new volatile flag in RunnableContainerRequest to indicate whether it is ready to be provisioned.
    -) Move up adding container requests for restart before removing.
    -) Remove execution of restart to thread in the add instances executor.

----


> Potential race condition when restart all is called for a Twill runnable
> ------------------------------------------------------------------------
>
>                 Key: TWILL-145
>                 URL: https://issues.apache.org/jira/browse/TWILL-145
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 0.6.0-incubating
>            Reporter: Henry Saputra
>            Assignee: Henry Saputra
>
> This issue was found by the careful eyes of [~chtyim].
> When a restart is sent to all instances of a particular TwillRunnable, there could
> be a race condition where the heartbeat thread runs right after all containers
> have been released, which makes the following check pass:
> {code}
>       // Looks for containers requests.
>       if (provisioning.isEmpty() && runnableContainerRequests.isEmpty() && runningContainers.isEmpty()) {
>         LOG.info("All containers completed. Shutting down application master.");
>         break;
>       }
> {code}
> This could happen when all running containers have been removed and the new
> runnableContainerRequests have not yet been added.
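
The interleaving described in the issue can be replayed on a single thread to see why the check passes. This is only an illustration under assumptions: plain Java collections stand in for the application master's internal state, and the class and method names are hypothetical. The three steps run in the problematic order: release all containers, run the heartbeat check, then add the restart request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class ShutdownCheckReplay {

  // Same condition as the quoted check, applied to plain collections.
  static boolean allContainersCompleted(Collection<?> provisioning,
                                        Collection<?> runnableContainerRequests,
                                        Collection<?> runningContainers) {
    return provisioning.isEmpty()
        && runnableContainerRequests.isEmpty()
        && runningContainers.isEmpty();
  }

  // Replays the problematic order and reports what the heartbeat check observed.
  static boolean heartbeatSeesEmptyStateDuringRestart() {
    List<String> provisioning = new ArrayList<>();
    Queue<String> runnableContainerRequests = new ArrayDeque<>();
    Set<String> runningContainers =
        new HashSet<>(Arrays.asList("instance-0", "instance-1"));

    // Step 1: the restart releases every running container.
    runningContainers.clear();

    // Step 2: the heartbeat runs before the new requests exist.
    boolean observed = allContainersCompleted(
        provisioning, runnableContainerRequests, runningContainers);

    // Step 3: the restart adds the new request, too late for the check above.
    runnableContainerRequests.add("restart-request");

    return observed;
  }

  public static void main(String[] args) {
    System.out.println("Heartbeat would shut down: "
        + heartbeatSeesEmptyStateDuringRestart());
  }
}
```

Because all three collections are empty at step 2, the check evaluates to true even though a restart is still in progress.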



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
