[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437595#comment-15437595 ]

ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/4#discussion_r76317898
  
    --- Diff: twill-core/src/main/java/org/apache/twill/internal/TwillContainerLauncher.java ---
    @@ -220,5 +250,31 @@ public ContainerLiveNodeData getLiveNodeData() {
         public void kill() {
           processController.cancel();
         }
    +
    +    private void killAndWait(int maxWaitSecs) {
    +      Stopwatch watch = new Stopwatch();
    +      watch.start();
    +      int tries = 0;
    +      while (watch.elapsedTime(TimeUnit.SECONDS) < maxWaitSecs) {
    +        // Kill the application
    +        try {
    +          ++tries;
    +          kill();
    +        } catch (Exception e) {
    +          LOG.error("Exception while killing runnable {}, instance {}", runnable, instanceId, e);
    +        }
    +
    +        // Wait on the shutdownLatch,
    +        // if the runnable has stopped then the latch will be count down by completed() method
    +        if (Uninterruptibles.awaitUninterruptibly(shutdownLatch, 10, TimeUnit.SECONDS)) {
    +          // Runnable has stopped now
    +          return;
    +        }
    +      }
    +
    +      // Timeout reached, runnable has not stopped
    +      LOG.error("Failed to kill runnable {}, instance {} after {} tries", runnable, instanceId, tries);
    --- End diff --
    
    Showing the number of tries is quite artificial, since the retry loop is 
bounded by time rather than by attempt count. I think it's better to just say 
that the kill failed after n seconds.
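
A minimal sketch of what the suggestion could look like, using only the JDK instead of the Guava `Stopwatch` and `Uninterruptibles` helpers from the diff. The class name `KillAndWaitSketch`, the `failureMessage` helper, the `pollMillis` parameter, and the `Runnable` kill action are all hypothetical and are not part of the pull request; the point is only that the failure message reports the time budget, since the loop is bounded by time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the logic in TwillContainerLauncher; not the actual Twill code.
final class KillAndWaitSketch {

  // Formats a failure message in terms of seconds waited, not number of tries.
  static String failureMessage(String runnable, int instanceId, long waitedSecs) {
    return String.format("Failed to kill runnable %s, instance %d after %d seconds",
                         runnable, instanceId, waitedSecs);
  }

  // Invokes the kill action repeatedly until the latch is released or the
  // time budget is used up. Returns true if the latch was released in time.
  static boolean killAndWait(Runnable kill, CountDownLatch shutdownLatch,
                             long maxWaitSecs, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(maxWaitSecs);
    boolean interrupted = false;
    try {
      long remaining;
      while ((remaining = deadline - System.nanoTime()) > 0) {
        try {
          kill.run();
        } catch (RuntimeException e) {
          System.err.println("Exception while killing runnable: " + e);
        }
        // Never wait past the overall deadline.
        long waitNanos = Math.min(remaining, TimeUnit.MILLISECONDS.toNanos(pollMillis));
        try {
          if (shutdownLatch.await(waitNanos, TimeUnit.NANOSECONDS)) {
            return true; // runnable has stopped
          }
        } catch (InterruptedException e) {
          interrupted = true; // remember the interrupt and keep waiting
        }
      }
      return false; // timeout reached, runnable has not stopped
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt(); // restore the interrupt status
      }
    }
  }
}
```

With this shape, a caller would log `failureMessage(runnable, instanceId, maxWaitSecs)` when `killAndWait` returns false, and the `tries` counter is no longer needed.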


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today, when a TwillRunnable is restarted, the call sends a stop message to the 
> TwillRunnable and then starts a new TwillRunnable without waiting for the 
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead 
> to issues such as two TwillRunnables with the same instance id running at the 
> same time.
> We should kill the containers that don't respond to the stop message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
