On Wed, Dec 15, 2010 at 09:58:44AM +0100, Florian Haas wrote:
> # HG changeset patch
> # User Florian Haas <[email protected]>
> # Date 1292402996 -3600
> # Node ID 8459a4918ad86c93962b8b59dd14d380c1e48eed
> # Parent 2b64174a3b9c8404391d5de27b800edb25c1833a
> Medium: .ocf-shellfuncs: add ocf_test_pid convenience function
>
> Add an OCF-style function, ocf_test_pid(), to test for a running
> process by PID. This function employs the following logic:
>
> * Send the process a 0 signal.
> - If sending the signal succeeds:
> * Check whether /proc/$pid/status exists.
> - If it does:
> * Test whether the process status is Z (zombie, defunct).
> - If it is, return $OCF_ERR_GENERIC.
> - If it is not, return $OCF_SUCCESS.
> - If it does not (as on any non-Linux platform if I'm not
> mistaken), then just assume that the process is running, and
> return $OCF_SUCCESS.
> - If sending the signal succeeds:
DOES NOT ;-)
> * The process can be safely assumed to not exist. Return
> $OCF_NOT_RUNNING.
There could be an option to do this as a specific user ID, so that it would
also check that the PID belongs to the expected user (databases or other
daemons not running as root; it's basically just another plausibility
check).
You could even check that /proc/$pid/exe is what the caller expects
(Linux only as well, probably), but that may be too much.
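A minimal Linux-only sketch of those two plausibility checks, assuming /proc is mounted. The helper names pid_owned_by and pid_runs_exe are hypothetical and not part of .ocf-shellfuncs; the owner test reads the effective uid from the "Uid:" line of /proc/$pid/status, the same file the patch already consults.

```shell
#!/bin/sh
# Linux-only sketch; pid_owned_by and pid_runs_exe are hypothetical helpers,
# not part of .ocf-shellfuncs.

# Succeed if process $1 exists and its effective uid is that of user $2.
pid_owned_by() {
    _status=/proc/$1/status
    [ -r "$_status" ] || return 1
    # The "Uid:" line lists real, effective, saved and filesystem uids;
    # field 3 is the effective uid.
    _uid=$(awk '$1 == "Uid:" { print $3 }' "$_status") || return 1
    # id -u fails for unknown users, which also counts as "no match".
    _want=$(id -u "$2" 2>/dev/null) || return 1
    [ -n "$_uid" ] && [ "$_uid" = "$_want" ]
}

# Succeed if process $1 is running the binary $2.
# Reading /proc/$pid/exe of another user's process normally requires root.
pid_runs_exe() {
    _exe=$(readlink "/proc/$1/exe" 2>/dev/null) || return 1
    [ "$_exe" = "$2" ]
}
```

A caller might then use something like `pid_owned_by "$pid" mysql && pid_runs_exe "$pid" /usr/sbin/mysqld` (user and path are only examples). One reason the exe check may indeed be "too much": after a package upgrade replaces the binary, the link target carries a " (deleted)" suffix, so an exact match fails for a process that is still perfectly healthy.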
> The name is deliberately ocf_test_pid not ocf_check_pid, to not create
> the false impression that this has anything to do with
> OCF_CHECK_LEVEL.
ocf_is_pid_running ?
> diff -r 2b64174a3b9c -r 8459a4918ad8 heartbeat/.ocf-shellfuncs.in
> --- a/heartbeat/.ocf-shellfuncs.in Tue Dec 14 15:22:39 2010 +0100
> +++ b/heartbeat/.ocf-shellfuncs.in Wed Dec 15 09:49:56 2010 +0100
> @@ -117,6 +117,42 @@
> esac
> }
>
> +# ocf_test_pid: test for the status of a process identified by ID, and
> +# return an OCF compliant status code.
> +#
> +# Given a process ID, try to send it a 0 signal. If that returns an
> +# error, we know that that PID is not in the process table, and the
> +# process is certainly not running. If kill does return successfully,
> +# then the process may still be a zombie, so test for that too. That
> +# test, however, is Linux specific -- so on platforms where
> +# /proc/$pid/status does not exist, just be happy with what kill says.
> +ocf_test_pid() {
> + local rc
> + local pid
> +
> + rc=$OCF_SUCCESS
> + pid=$1
> +
> + if kill -s 0 $pid 2>/dev/null; then
For uid != 0, that could become su ...
> + # Process exists in process table, check its status
> + # (Linux only)
> + if [ -r /proc/$pid/status ]; then
> + if grep -E "State:[[:space:]]+Z \(zombie\)" /proc/$pid/status; then
Do you want >/dev/null 2>&1 here?
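A minimal sketch of a quiet zombie test, assuming Linux /proc; is_zombie is a hypothetical helper name. POSIX grep -q writes nothing to stdout, so only stderr needs silencing, which covers the race where the process disappears between the -r test and the read.

```shell
#!/bin/sh
# is_zombie is a hypothetical helper: return 0 if /proc/$1/status
# reports state Z, non-zero otherwise (including "no such process").
is_zombie() {
    [ -r "/proc/$1/status" ] || return 1
    # -q: no output, exit status only; 2>/dev/null hides the error if
    # the process vanished after the -r test above.
    grep -qE '^State:[[:space:]]+Z' "/proc/$1/status" 2>/dev/null
}
```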
> + ocf_log err "Process $pid is defunct"
> + rc=$OCF_ERR_GENERIC
else: check that readlink /proc/$pid/exe matches an argument the caller says it expects
> + fi
> + fi
> + else
> + ocf_log debug "Process $pid is dead"
Rather: "no such process". You cannot tell whether it ever ran;
it may not be dead, just not born yet ;-)
> + rc=$OCF_NOT_RUNNING
> + fi
> +
> + if [ $rc -eq $OCF_SUCCESS ]; then
I think $rc = $OCF_SUCCESS is faster (in theory, because it does not
need to convert to an integer representation), and it is just as correct.
But if you insist ...
... end of bikeshedding.
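A small sketch of where the two comparisons differ. The function names are hypothetical; OCF_SUCCESS=0 is the OCF success code. They agree for every value rc can take in the patch, since rc is only ever assigned from the canonical $OCF_* constants; they diverge only for non-canonical spellings such as "00", and for an empty rc, where -eq is an error and = simply fails.

```shell
#!/bin/sh
OCF_SUCCESS=0   # the OCF success code

# Integer comparison: test(1) parses both operands as decimal numbers.
rc_is_success_num() { [ "$1" -eq "$OCF_SUCCESS" ]; }

# String comparison: plain equality of the two strings, no conversion.
rc_is_success_str() { [ "$1" = "$OCF_SUCCESS" ]; }
```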
> + ocf_log debug "Process $pid is currently running"
> + fi
> + return $rc
> +}
> +
> __ocf_set_defaults() {
> __OCF_ACTION="$1"
I fail to see the usefulness.
The only thing this adds over an open-coded kill (which is one line)
is the check for zombie status. So what? It buys nothing.
We don't have a zombie problem. I haven't seen zombies in a long time,
unless I deliberately provoke them.
Daemons (defined here as anything that reparents to init) will never
linger as zombies (unless init is dead, but then this code would have
difficulty running as well).
Some daemons may have neglected children that go zombie...
but you won't detect those, since you don't know their PIDs.
Once the parent dies, they are reaped anyway.
Writing pid files (or remembering a PID in any other way) for non-daemons
is useless, and should never be done by a resource agent.
And to really check that the process runs and _works_ as expected, you
need to do a real check anyway ("select now()", "wget | grep" ...).
So what I am trying to say is: for an individual resource agent,
it makes no difference at all whether it says
if ocf_is_pid_running $pid ; then do_real_monitor; fi
or
if kill -0 $pid ; then do_real_monitor; fi
and I like the latter better.
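A minimal sketch of the open-coded variant inside a monitor action, assuming a pid file that holds a single PID. monitor_pidfile and do_real_monitor are hypothetical names; do_real_monitor stands in for the service-specific check ("select now()", "wget | grep", ...). OCF_SUCCESS=0 and OCF_NOT_RUNNING=7 are the OCF return codes.

```shell
#!/bin/sh
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical placeholder for the real, service-specific health check.
do_real_monitor() { return "$OCF_SUCCESS"; }

# Hypothetical monitor: read the PID from file $1, use kill -0 only as a
# cheap "is there such a process at all" pre-check, then do the real check.
monitor_pidfile() {
    pid=$(cat "$1" 2>/dev/null) || return "$OCF_NOT_RUNNING"
    [ -n "$pid" ] || return "$OCF_NOT_RUNNING"
    if kill -0 "$pid" 2>/dev/null; then
        do_real_monitor
    else
        return "$OCF_NOT_RUNNING"
    fi
}
```

Note that kill -0 also fails with "operation not permitted" when a non-root caller probes another user's process; resource agents normally run as root, so the sketch does not distinguish that case from "no such process".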
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/