Dmitry Lysnichenko created AMBARI-8185:
------------------------------------------
Summary: Services fail to start when pid file is empty
Key: AMBARI-8185
URL: https://issues.apache.org/jira/browse/AMBARI-8185
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 1.6.1
Reporter: Dmitry Lysnichenko
Assignee: Dmitry Lysnichenko
Fix For: 2.0.0
Witnessed at a customer site:
* Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
* This file, while present, had no content
* The stack file, {{service.py}}, detects a running process using this call:
{noformat}
no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
{noformat}
* When the file is empty, the backticks expand to nothing, so a bare {{ps}}
runs and returns 0 (success), and the startup command does not run.
* Changed the command to
{noformat}
no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
{noformat}
which correctly reports that the process is not running (with an empty file, {{ps -p}} gets no PID argument and exits non-zero), so startup can continue.
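The failure mode above can also be guarded against explicitly instead of relying on how {{ps}} handles a missing argument. The following is only a minimal sketch, not the code in {{service.py}}: the function name {{pid_is_running}} is hypothetical, and it swaps in {{kill -0}} for {{ps -p}}, which assumes the caller is allowed to signal the target process (for example, it runs as root or as the same user).

```shell
# Hypothetical helper (not part of Ambari): succeeds only if the pid file
# exists, is non-empty, holds a plain number, and that process is alive.
pid_is_running() {
    pid_file="$1"
    # -s is true only when the file exists and its size is greater than zero,
    # so an empty pid file is treated as "not running".
    [ -s "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # Reject anything that is not a plain decimal number.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled.
    # Assumption: the caller has permission to signal the process.
    kill -0 "$pid" 2>/dev/null
}
```

A caller would use it as {{pid_is_running /var/run/storm/supervisor.pid || start_service}}, where {{start_service}} stands in for whatever startup command applies.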
The customer reports having seen this behavior with other services, but could
not reproduce it on-site. This pattern is used frequently throughout the code
base and should be addressed for all services, including Storm. Validating this
change is the critical task here: the change itself is small, but its effects
are large in scope.
Also, in ambari/ambari-agent/conf/unix/ambari-agent there are a few invocations
of similar code with a different bug:
{code}
PID=`cat $PIDFILE`
echo "Found $AMBARI_AGENT PID: $PID"
if [ -z "`ps ax -o pid | grep $PID`" ]; then
{code}
Here, if $PID is, for example, 2111 and a process with a PID such as 22111 is
running, {{grep}} matches the substring and we get a false positive (the agent
refuses to start, saying it is already running).
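One way to avoid the substring match is to compare whole lines rather than searching within them. This is a hedged sketch only, not the fix that was committed: the function name {{pid_in_list}} is hypothetical, and it assumes the listing holds one PID per line, possibly space-padded, as {{ps ax -o pid}} prints it.

```shell
# Hypothetical helper: reads a PID listing on stdin (one PID per line,
# possibly padded with spaces) and succeeds only on an exact match.
pid_in_list() {
    # An empty PID must never count as a match.
    [ -n "$1" ] || return 1
    # Strip the column padding, then require the whole line (-x) to equal
    # the PID as a fixed string (-F), so 2111 does not match 22111.
    tr -d ' ' | grep -qxF -- "$1"
}
```

In the agent script the check would then read something like {{if ! ps ax -o pid | pid_in_list "$PID"; then}} in place of the {{grep $PID}} test.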