[ https://issues.apache.org/jira/browse/HADOOP-8353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273402#comment-13273402 ]

Hudson commented on HADOOP-8353:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #2231 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2231/])
    HADOOP-8353. hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop. 
Contributed by Roman Shaposhnik. (Revision 1337251)

     Result = SUCCESS
atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1337251
Files : 
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mr-jobhistory-daemon.sh
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/bin/yarn-daemon.sh

                
> hadoop-daemon.sh and yarn-daemon.sh can be misleading on stop
> -------------------------------------------------------------
>
>                 Key: HADOOP-8353
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8353
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.23.1
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 2.0.0
>
>         Attachments: HADOOP-8353-2.patch.txt, HADOOP-8353.patch.txt
>
>
> The stop action is currently implemented as a simple SIGTERM sent to the JVM. 
> There's a time delay between when the action is called and when the process 
> actually exits. This can be misleading to the callers of the *-daemon.sh 
> scripts, since they expect the stop action to return only once the process 
> has actually stopped.
> I suggest we augment the stop action with a time-delayed check of the process 
> status and a SIGKILL once the delay has expired.
> I understand that sending SIGKILL is a measure of last resort and is 
> generally frowned upon among init.d script writers, but the excuse we have 
> for Hadoop is that it is engineered to be a fault-tolerant system, and thus 
> there's no danger of putting the system into an inconsistent state with a 
> violent SIGKILL. Of course, the time delay will be long enough to make a 
> SIGKILL a rare event.
> Finally, there's always the option of an exponential back-off type of 
> solution if we decide that the SIGKILL timeout is too short.
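
For illustration only, the stop behaviour proposed above (SIGTERM, a bounded
wait on the process status, then SIGKILL) could be sketched roughly as follows.
This is not the committed patch: the function name stop_with_timeout, its
arguments, and the 5-second default are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the code committed for HADOOP-8353.
# Usage: stop_with_timeout PID [TIMEOUT_SECONDS]
stop_with_timeout() {
  local pid="$1"
  local timeout="${2:-5}"   # assumed default; a real script would make this configurable
  local waited=0

  # kill -0 sends no signal; it only checks that the process exists.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "no process $pid to stop"
    return 0
  fi

  # Ask politely first (kill sends SIGTERM by default).
  kill "$pid" 2>/dev/null

  # Poll once per second until the process is gone or the timeout expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "process $pid did not stop after $timeout seconds: sending SIGKILL"
      kill -9 "$pid" 2>/dev/null
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A daemon script would typically read the PID from its pid file and call
something like `stop_with_timeout "$(cat "$pidfile")" 5`, where `$pidfile` is
again a placeholder name. With this shape, the stop action returns only after
the process has exited or has been sent SIGKILL.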

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
