On Thu, Nov 04, 2010 at 03:12:06PM +0900, [email protected] wrote:
> Hi All,
>
> We found that the pgsql monitor operation can fail because the fuser
> command it runs is too slow.
>
> When there is heavy disk output, the fuser command falls behind.
> * In our environment it happened while we were writing large amounts of
> data to an NFS mountpoint.
>
> The fuser command scans every entry in the proc directory.
> Because of this, it is delayed when we write large amounts of data.
>
> We made a patch that reads the proc directory directly instead of using
> the fuser command.
>
> Under heavy disk output, this patch is clearly lighter than the fuser
> command.
>
> Please review the patch,
> and please apply it to the developer version.
>
> Best Regards,
> Hideo Yamauchi.
> diff -r d76ec18cc1e7 heartbeat/pgsql
> --- a/heartbeat/pgsql Thu Nov 04 11:18:52 2010 +0900
> +++ b/heartbeat/pgsql Thu Nov 04 11:33:32 2010 +0900
> @@ -441,7 +441,7 @@
> if [ -f $PIDFILE ]
> then
> PID=`head -n 1 $PIDFILE`
> - kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
> + kill -s 0 $PID >/dev/null 2>&1 && head -n 1 /proc/${PID}/cmdline 2>&1 | grep postgres >/dev/null 2>&1
> return $?
> fi
NACK.
But, good point.
NACK, because cmdline will contain "postgres" only by accident:
the binary may be called postmaster (possibly symlinked), or anything else,
and the path may or may not contain postgres.
Good point, because the first check was broken anyway (it did not grep
for word boundaries around $PID), and scanning all of /proc, and possibly NFS,
just to plausibility-check whether an already known $PID has not yet been
recycled and is still what we suppose it to be, well, that's overkill.
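To illustrate the word boundary problem, here is a minimal sketch. The sample
text is made up to resemble "fuser DIR 2>&1" output, and pid_listed is a
hypothetical helper, not something in the pgsql RA. Note that plain "grep -w"
would likely not be enough here, since fuser appends access letters such as
"c" to each PID, so the sketch anchors on digits instead.

```shell
#!/bin/sh
# pid_listed is a hypothetical helper, not part of the pgsql RA.
# It succeeds only if PID $1 appears in the text $2 as a whole number,
# i.e. not directly preceded or followed by another digit.
pid_listed() {
    printf '%s\n' "$2" | grep -E "(^|[^0-9])$1([^0-9]|\$)" >/dev/null 2>&1
}

# Made-up sample resembling the combined stdout/stderr of fuser.
sample="/var/lib/pgsql/data:  11234c  2345c"

# Unanchored grep, as in the old check: 123 matches inside 11234.
printf '%s\n' "$sample" | grep 123 >/dev/null 2>&1 \
    && echo "plain grep: 123 matches (false positive)"

# Digit-anchored match: 123 is rejected, 2345 is accepted.
pid_listed 123 "$sample" || echo "anchored: 123 not listed"
pid_listed 2345 "$sample" && echo "anchored: 2345 listed"
```

This only shows why the old grep could match the wrong process. A directory
name containing the same digits could still fool it, which is one more reason
to avoid parsing fuser output at all.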
How about
  test "$(readlink /proc/$PID/cwd)" = "$OCF_RESKEY_pgdata"
(no need for the kill -s 0 before that, either).
I'm not sure how "portable" readlink of /proc/$PID/cwd is,
but I'm pretty sure that this is what fuser triggered on.
So, would that work for you as well?
diff -r d76ec18cc1e7 heartbeat/pgsql
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -441,7 +441,7 @@
if [ -f $PIDFILE ]
then
PID=`head -n 1 $PIDFILE`
- kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
+ test "$(readlink /proc/$PID/cwd)" = "$OCF_RESKEY_pgdata"
return $?
fi
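The readlink check can be tried in isolation before touching the RA. The
following is a rough sketch only: pid_cwd_is is a hypothetical helper name,
and it assumes a Linux-style /proc and a readlink(1) binary. It tests the
check against the running shell itself rather than against a real postgres.

```shell
#!/bin/sh
# pid_cwd_is is a hypothetical helper, not part of the pgsql RA.
# It succeeds if process $1 has working directory $2.
# Assumes a Linux-style /proc and readlink(1).
pid_cwd_is() {
    test "$(readlink "/proc/$1/cwd" 2>/dev/null)" = "$2"
}

# Demo against this shell: $$ is our own PID, and "pwd -P" prints the
# physical (symlink-free) working directory, which is what /proc reports.
if pid_cwd_is $$ "$(pwd -P)"; then
    echo "cwd check matches for PID $$"
else
    echo "cwd check does not match for PID $$"
fi
```

The comparison is a plain string match against the resolved path that /proc
reports, so a pgdata value with a trailing slash, or one that goes through a
symlink, would presumably not compare equal. If the PID no longer exists,
readlink prints nothing and the test fails, which is why the separate
kill -s 0 is not needed.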
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/