[ 
https://issues.apache.org/jira/browse/AMBARI-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Lysnichenko updated AMBARI-8185:
---------------------------------------
    Attachment: AMBARI-8185.patch

> Services fail to start when pid file is empty
> ---------------------------------------------
>
>                 Key: AMBARI-8185
>                 URL: https://issues.apache.org/jira/browse/AMBARI-8185
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.1
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-8185.patch
>
>
> Witnessed at a customer site:
> * Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
> * This file, while present, had no content
> * The stack file, {{service.py}}, detects a running process using this call:
> {noformat}
>   no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
> {noformat}
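> * When the file is empty, {{cat}} outputs nothing, so {{ps}} runs with no 
> arguments and returns 0 (success). The check therefore reports the service 
> as already running, and the startup command does not run.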
> * Changed the command to
> {noformat}
>   no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
> {noformat}
> which correctly reports that the process is not running, so startup can 
> continue.
> The customer reports having seen this behavior with other services, but 
> could not reproduce it on-site. This pattern is used frequently throughout 
> the code base and should be addressed for all services, including Storm. 
> Validating this change is the critical task here: the change itself is 
> small, but its effects are large in scope.
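For illustration only, the same liveness check can be sketched in Python without shelling out. This is not the code in the attached patch; the helper name `is_pid_file_live` is hypothetical. It treats a missing, empty, or non-numeric pid file as "not running", and probes the process with signal 0, which checks for existence without delivering a signal.

```python
import os


def is_pid_file_live(pid_file):
    """Hypothetical helper: True only if pid_file names a running process."""
    try:
        with open(pid_file) as f:
            text = f.read().strip()
    except OSError:
        return False  # missing or unreadable pid file

    # An empty file (the case reported in this issue) or garbage is "not running".
    if not (text.isascii() and text.isdigit()):
        return False

    pid = int(text)
    if pid <= 0:
        return False  # 0 would address the caller's process group, not one process

    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except (ProcessLookupError, OverflowError):
        return False  # no such process, or a value too large to be a PID
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A caller would skip the start command only when this returns True, which mirrors the intent of `no_op_test`. Like the shell check, it cannot tell whether a live PID still belongs to the original service.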
> Also, ambari/ambari-agent/conf/unix/ambari-agent has a few invocations 
> of similar code with a different bug:
> {code}
>           PID=`cat $PIDFILE`
>           echo "Found $AMBARI_AGENT PID: $PID"
>           if [ -z "`ps ax -o pid | grep $PID`" ]; then
> {code}
> Here {{grep}} matches substrings: if $PID is, for example, 2111 and there is 
> a running process with a pid like 22111, we get a false positive (the agent 
> refuses to start, saying it is already running).
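The false positive can be reproduced in a small Python sketch. The sample `ps` output and both function names below are made up for illustration; the first function mimics what `grep $PID` does, and the second compares whole fields instead.

```python
def grep_style_match(ps_output, pid):
    """Mimics `grep $PID`: true if the PID appears anywhere in any line."""
    return any(str(pid) in line for line in ps_output.splitlines())


def exact_match(ps_output, pid):
    """True only if some line, stripped of padding, equals the PID exactly."""
    return any(line.strip() == str(pid) for line in ps_output.splitlines())


# Hypothetical output of `ps ax -o pid`: process 22111 is running, 2111 is not.
sample = "  PID\n    1\n22111\n"

substring_hit = grep_style_match(sample, 2111)  # matches inside "22111"
exact_hit = exact_match(sample, 2111)           # no line equals "2111"
```

In the shell script itself, asking `ps -p "$PID"` directly or using `grep -w` are the analogous ways to avoid the substring match.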



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
