GitHub user chtyim commented on a diff in the pull request:

    https://github.com/apache/twill/pull/4#discussion_r76318343

    --- Diff: twill-yarn/src/main/java/org/apache/twill/internal/appmaster/ApplicationMasterService.java ---
    @@ -268,20 +270,33 @@ public void acquired(List<? extends ProcessLauncher<YarnContainerInfo>> launcher
           @Override
           public void completed(List<YarnContainerStatus> completed) {
             for (YarnContainerStatus status : completed) {
    +          handleCompleted(completed);
               ids.remove(status.getContainerId());
             }
           }
         };

    -    runningContainers.stopAll();
    -
    -    // Poll for 5 seconds to wait for containers to stop.
    -    int count = 0;
    -    while (!ids.isEmpty() && count++ < 5) {
    -      amClient.allocate(0.0f, handler);
    -      TimeUnit.SECONDS.sleep(1);
    -    }
    +    // Handle heartbeats during shutdown because runningContainers.stopAll() waits until
    +    // handleCompleted() is called for every stopped runnable
    +    ExecutorService stopPoller = Executors.newSingleThreadExecutor(Threads.createDaemonThreadFactory("stopPoller"));
    +    stopPoller.execute(new Runnable() {
    +      @Override
    +      public void run() {
    +        while (!ids.isEmpty()) {
    +          try {
    +            amClient.allocate(0.0f, handler);
    +            TimeUnit.SECONDS.sleep(1);
    --- End diff --

    Should check whether `ids` is already empty before sleeping: the call to `allocate` may have already emptied `ids` through the handler, in which case there is no need to sleep for an extra second.
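
The check suggested in the comment can be sketched as below. This is a minimal standalone illustration, not the code from the pull request: `StopPollerSketch`, `Heartbeat`, and `pollUntilEmpty` are hypothetical names, and the heartbeat is only a stand-in for the `amClient.allocate(0.0f, handler)` call, whose handler removes container ids as they complete.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class StopPollerSketch {

  /** Hypothetical stand-in for the amClient.allocate(0.0f, handler) heartbeat. */
  interface Heartbeat {
    void beat() throws Exception;
  }

  /**
   * Polls until ids is empty, re-checking ids right after each heartbeat so that
   * no sleep happens once the handler has removed the last container id.
   * Returns the number of times the loop actually slept.
   */
  static int pollUntilEmpty(Set<String> ids, Heartbeat heartbeat, long sleepMillis) throws Exception {
    int sleeps = 0;
    while (!ids.isEmpty()) {
      heartbeat.beat();
      // The heartbeat may have emptied ids already; skip the extra sleep in that case.
      if (ids.isEmpty()) {
        break;
      }
      TimeUnit.MILLISECONDS.sleep(sleepMillis);
      sleeps++;
    }
    return sleeps;
  }

  public static void main(String[] args) throws Exception {
    Set<String> ids = ConcurrentHashMap.newKeySet();
    ids.add("container_1");
    ids.add("container_2");
    // Each simulated heartbeat reports one completed container (an assumption for the demo).
    int sleeps = pollUntilEmpty(ids, () -> ids.remove(ids.iterator().next()), 10);
    System.out.println("sleeps = " + sleeps);
  }
}
```

With two container ids and one completion per heartbeat, the loop sleeps once rather than twice, because the second heartbeat empties `ids` and the loop exits before reaching the sleep.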