[ https://issues.apache.org/jira/browse/TWILL-190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437595#comment-15437595 ]
ASF GitHub Bot commented on TWILL-190:
--------------------------------------

Github user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/4#discussion_r76317898

    --- Diff: twill-core/src/main/java/org/apache/twill/internal/TwillContainerLauncher.java ---
    @@ -220,5 +250,31 @@ public ContainerLiveNodeData getLiveNodeData() {
         public void kill() {
           processController.cancel();
         }
    +
    +    private void killAndWait(int maxWaitSecs) {
    +      Stopwatch watch = new Stopwatch();
    +      watch.start();
    +      int tries = 0;
    +      while (watch.elapsedTime(TimeUnit.SECONDS) < maxWaitSecs) {
    +        // Kill the application
    +        try {
    +          ++tries;
    +          kill();
    +        } catch (Exception e) {
    +          LOG.error("Exception while killing runnable {}, instance {}", runnable, instanceId, e);
    +        }
    +
    +        // Wait on the shutdownLatch,
    +        // if the runnable has stopped then the latch will be count down by completed() method
    +        if (Uninterruptibles.awaitUninterruptibly(shutdownLatch, 10, TimeUnit.SECONDS)) {
    +          // Runnable has stopped now
    +          return;
    +        }
    +      }
    +
    +      // Timeout reached, runnable has not stopped
    +      LOG.error("Failed to kill runnable {}, instance {} after {} tries", runnable, instanceId, tries);
    --- End diff --

    Showing the number of tries is quite artificial since the retry is based on time. I think it's better to just say failed to kill after n seconds.


> Restart of a TwillRunnable does not wait for the runnable to stop
> -----------------------------------------------------------------
>
>                 Key: TWILL-190
>                 URL: https://issues.apache.org/jira/browse/TWILL-190
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core, yarn
>    Affects Versions: 0.6.0-incubating, 0.7.0-incubating
>            Reporter: Poorna Chandra
>            Assignee: Poorna Chandra
>             Fix For: 0.8.0
>
>
> Today when a TwillRunnable is restarted, the call sends a stop message to the
> TwillRunnable, and then starts new TwillRunnable without waiting for the
> stopping runnable to finish stopping.
> This can leave a non-responding TwillRunnable container running, and can lead
> to issues like two TwillRunnables with same instance id running at the same
> time.
> We should kill the containers that don't respond to stop message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
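
The review comment on the diff above asks for the failure message to report the time waited instead of a try count. A minimal sketch of that shape follows. It is not the code from the pull request: the class `KillAndWaitSketch`, the `Killer` interface, the `failureMessage` helper, and the millisecond parameters are all hypothetical, and plain `java.util.concurrent` calls stand in for the Guava `Stopwatch` and `Uninterruptibles` used in the diff, so interrupt handling differs from the original.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class KillAndWaitSketch {

  /** Hypothetical stand-in for the kill() call in the diff. */
  interface Killer {
    void kill() throws Exception;
  }

  /** Builds the timeout message in terms of seconds waited, not tries. */
  static String failureMessage(String runnable, int instanceId, long waitedSecs) {
    return String.format("Failed to kill runnable %s, instance %d after %d seconds",
                         runnable, instanceId, waitedSecs);
  }

  /**
   * Keeps issuing kill requests until the latch is released or maxWaitMillis elapses.
   * Returns true if the latch was released, false on timeout.
   */
  static boolean killAndWait(Killer killer, CountDownLatch shutdownLatch,
                             long maxWaitMillis, long pollMillis) {
    long start = System.nanoTime();
    long maxWaitNanos = TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    while (System.nanoTime() - start < maxWaitNanos) {
      try {
        killer.kill();
      } catch (Exception e) {
        System.err.println("Exception while killing runnable: " + e);
      }
      try {
        // The latch is assumed to be counted down when the runnable completes.
        if (shutdownLatch.await(pollMillis, TimeUnit.MILLISECONDS)) {
          return true;
        }
      } catch (InterruptedException e) {
        // Unlike awaitUninterruptibly in the diff, this sketch gives up on interrupt.
        Thread.currentThread().interrupt();
        return shutdownLatch.getCount() == 0;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    CountDownLatch stuck = new CountDownLatch(1); // never released in this demo
    long maxWaitMillis = 1000;
    if (!killAndWait(() -> { }, stuck, maxWaitMillis, 200)) {
      System.err.println(failureMessage("exampleRunnable", 0,
                                        TimeUnit.MILLISECONDS.toSeconds(maxWaitMillis)));
    }
  }
}
```

Because the loop is bounded by elapsed time, the configured wait is the only number the caller controls, which is why the message reports it rather than a loop counter.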