[ https://issues.apache.org/jira/browse/HADOOP-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273398#comment-13273398 ]
Hudson commented on HADOOP-8353:
--------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #2305 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2305/])
    HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. Contributed by Roman Shaposhnik. (Revision 1337251)

     Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
Files :
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh

> hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop
> -------------------------------------------------------------
>
>                 Key: HADOOP-8353
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8353
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.23.1
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 2.0.0
>
>         Attachments: HADOOP-8353-2.patch.txt, HADOOP-8353.patch.txt
>
>
> The stop action is implemented as a simple SIGTERM sent to the JVM.
> There's a time delay between when the action is called and when the process
> actually exits. This can be misleading to the callers of the *-daemon.sh
> scripts, since they expect the stop action to return only when the process
> has actually stopped.
> I suggest we augment the stop action with a time-delay check for the process
> status and a SIGKILL once the delay has expired.
> I understand that sending SIGKILL is a measure of last resort and is
> generally frowned upon among init.d script writers, but the excuse we have
> for Hadoop is that it is engineered to be a fault-tolerant system and thus
> there's no danger of putting the system into an inconsistent state by a
> violent SIGKILL.
> Of course, the time delay will be long enough to make a SIGKILL event
> a rare condition.
> Finally, there's always the option of an exponential back-off type of solution
> if we decide that the SIGKILL timeout is too short.
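
The stop behaviour proposed in the description (SIGTERM, a bounded wait on the process status, then SIGKILL once the delay has expired) can be sketched roughly as below. This is only an illustration and not the committed patch: the function name stop_with_timeout, its arguments, and the 5-second default are assumptions made for the example.

```shell
# Hypothetical helper, not taken from hadoop-daemon.sh.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
stop_with_timeout() {
  local pid=$1
  local timeout=${2:-5}   # assumed default; the issue does not name a value
  local waited=0

  # "kill -0" delivers no signal; it only reports whether the PID exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no process $pid to stop"
    return 1
  fi

  # Ask politely first: plain kill sends SIGTERM.
  kill "$pid" 2>/dev/null

  # Poll the process status once a second until it is gone or the delay expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid did not stop after $timeout seconds: sending SIGKILL"
      kill -9 "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A caller would typically read the PID from the daemon's pid file and pass it in, for example stop_with_timeout "$(cat "$pidfile")" 30, where pidfile is likewise a placeholder. Polling in one-second steps lets the function return as soon as the process is gone instead of always sleeping for the full delay.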