Dmitry Lysnichenko created AMBARI-8185:
------------------------------------------

             Summary: Services fail to start when pid file is empty
                 Key: AMBARI-8185
                 URL: https://issues.apache.org/jira/browse/AMBARI-8185
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 1.6.1
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
             Fix For: 2.0.0


Witnessed at a customer site:
* Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
* This file, while present, had no content
* The stack file, {{service.py}}, detects a running process using this call:
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
{noformat}
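* When the file is empty, {{cat}} outputs nothing, so {{ps}} runs without 
arguments, lists the caller's own processes, and returns 0 (success). The test 
therefore reports the process as running, and the startup command does not run.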
* Changed the command to
{noformat}
  no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
{noformat}
With an empty pid file, {{ps -p}} receives no process ID and exits with a 
non-zero status, so the test correctly reports that the process is not running 
and startup can continue.
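
For illustration only, a stricter variant of the same check could validate the pid file before calling {{ps}}. This is a minimal sketch in plain shell, not the code in {{service.py}}; the helper name {{pid_is_running}} is hypothetical, and it assumes a {{ps}} that supports {{-p}}.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ambari): succeeds only if the pid file
# exists, is non-empty, holds a purely numeric PID, and that PID is alive.
pid_is_running() {
    pid_file=$1
    # -s is true only for an existing, non-empty file, so the empty-file
    # case described above is rejected before ps is ever invoked.
    [ -s "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject content that is not made up of digits only.
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Assumes ps supports -p; exits non-zero when no such process exists.
    ps -p "$pid" >/dev/null 2>&1
}
```

With this sketch, an empty or malformed pid file is treated the same as a missing one, so the start command would run in both cases.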

The customer reports having seen this behavior with other services, but it 
could not be reproduced on-site.  This pattern is used frequently throughout 
the code base and should be addressed for all services, including Storm.  
Validating this change is the critical task here: the change itself is small, 
but its effects are large in scope.

Also, ambari/ambari-agent/conf/unix/ambari-agent contains a few invocations of 
similar code with a different bug:
{code}
          PID=`cat $PIDFILE`
          echo "Found $AMBARI_AGENT PID: $PID"
          if [ -z "`ps ax -o pid | grep $PID`" ]; then
{code}
Here, if $PID is, for example, 2111 and there is a running process with a pid 
such as 22111, {{grep}} matches 2111 as a substring and we get a false positive 
(the agent refuses to start, saying it is already running).
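
One way to avoid the substring match is to compare whole fields instead of grepping. The sketch below is an assumption about how that could look, not the fix applied in Ambari; the function name {{pid_in_list}} is hypothetical, and the sample list stands in for real {{ps ax -o pid}} output.

```shell
#!/bin/sh
# Hypothetical helper: reads a PID listing (same shape as `ps ax -o pid`)
# on stdin and succeeds only if the first field of some line equals $1.
pid_in_list() {
    awk -v pid="$1" '$1 == pid { found = 1 } END { exit !found }'
}

# Simulated listing in which 22111 is running and 2111 is not.
sample=$(printf '  PID\n    1\n22111\n')

# Plain grep matches 2111 inside 22111 (the false positive described above).
if printf '%s\n' "$sample" | grep 2111 >/dev/null; then
    echo "grep: 2111 reported as running"
fi

# The exact field comparison does not match.
if printf '%s\n' "$sample" | pid_in_list 2111; then
    echo "awk: 2111 reported as running"
else
    echo "awk: 2111 not running"
fi
```

In the agent script the same helper would be fed from {{ps ax -o pid}}; on systems where {{ps -p $PID}} is available, that call avoids scanning the process list altogether.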



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
