[
https://issues.apache.org/jira/browse/AMBARI-8185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Lysnichenko updated AMBARI-8185:
---------------------------------------
Attachment: AMBARI-8185.patch
> Services fail to start when pid file is empty
> ---------------------------------------------
>
> Key: AMBARI-8185
> URL: https://issues.apache.org/jira/browse/AMBARI-8185
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 1.6.1
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Fix For: 2.0.0
>
> Attachments: AMBARI-8185.patch
>
>
> Witnessed at a customer site:
> * Storm Supervisor server had a pid file at {{/var/run/storm/supervisor.pid}}
> * This file, while present, had no content
> * The stack file, {{service.py}} detects a running process using this call:
> {noformat}
> no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps `cat {pid_file}` >/dev/null 2>&1")
> {noformat}
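> * When the file is empty, {{cat}} produces no output, so the command reduces
> to a bare {{ps}}, which returns 0 (success). The service is then treated as
> already running, and the startup command does not run.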
> * Changed the command to
> {noformat}
> no_op_test = format("ls {pid_file} >/dev/null 2>&1 && ps -p `cat {pid_file}` >/dev/null 2>&1")
> {noformat}
> which correctly reports that the process is not running (with an empty pid
> file, {{ps -p}} receives no PID and exits non-zero), so startup can continue.
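
The check described above could also be made independent of how `ps` handles a missing argument by validating the pid file content before probing the process. The following is a minimal sketch in Python, not the code from the attached patch; `read_pid` and `is_running_from_pid_file` are hypothetical helper names that do not exist in Ambari, and the sketch assumes a Unix-like system where `os.kill(pid, 0)` can be used as an existence probe.

```python
import errno
import os


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if unusable.

    Hypothetical helper: a missing, unreadable, empty, or non-numeric
    file all count as "no PID recorded".
    """
    try:
        with open(pid_file) as f:
            content = f.read().strip()
    except (IOError, OSError):
        return None
    try:
        pid = int(content)  # int("") raises ValueError, so empty files land here
    except ValueError:
        return None
    return pid if pid > 0 else None


def is_running_from_pid_file(pid_file):
    """Return True only if pid_file holds the PID of an existing process."""
    pid = read_pid(pid_file)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending a signal
    except OverflowError:
        return False  # value too large to be a real PID
    except OSError as e:
        # EPERM: the process exists but is owned by another user
        return e.errno == errno.EPERM
    return True
```

With this shape, an empty `/var/run/storm/supervisor.pid` is reported as "not running" instead of being passed on to `ps` as an empty argument.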
> The customer reports that they have seen this behavior with other services,
> but could not reproduce it on-site. This pattern is used frequently throughout
> the code base and should be addressed for all services, including Storm.
> Validation is the critical task here: the change itself is small, but its
> effects are large in scope.
> Also, in ambari/ambari-agent/conf/unix/ambari-agent there are a few
> invocations of similar code with a different bug:
> {code}
> PID=`cat $PIDFILE`
> echo "Found $AMBARI_AGENT PID: $PID"
> if [ -z "`ps ax -o pid | grep $PID`" ]; then
> {code}
> Here, if $PID is, for example, 2111 and there is a running process with a PID
> such as 22111, {{grep}} matches 2111 as a substring and we get a false
> positive (the agent refuses to start, saying it is already running).