On Mon, May 31, 2010 at 10:48:02PM +0200, Lars Marowsky-Bree wrote:
...
> You're embedding policy into code here. Who said anything about
> restarting the daemon being the right response? Maybe the whole
> point is to initiate full PE-level recovery?
> 
> > Of course you can get fancy, background the daemon, then wait,
> > have a trap on sigterm, put in some ulimits, ...
> > 
> > Write the "respawn_everything" RA.
> 
> BUT THAT ISN'T THE FUCKING POINT.
> 
> The point is to make _monitor_ better. Not replace the whole rest of the
> solution. Argh.

We are talking about some process dying unexpectedly,
and how to notice that as soon as possible?
Well, the parent of that thing will notice "immediately".
(Process accounting will broadcast it as well,
 so if you want lrmd to subscribe to that...)

Once we notice, what are we supposed to do?
Not take any action ourselves,
but tell pacemaker the resource has failed,
because that is where $policy lives?

Am I still missing the point?

I hope one (shell) process wrapper won't be too much,
if all it does is a waitpid?

        $your_non_backgrounding_process_here
        # unless you decide based on "something"
        # that this was an expected exit:
        crm_resource -F

still too simple?

Yes, it is not a watchdog character special device.
Sorry about that ;-)
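For illustration, a slightly fuller version of that wrapper, as a bash sketch. It backgrounds the daemon, forwards SIGTERM to it (so a regular stop still works), waits, and treats any exit not preceded by a stop request as a failure. `report_failure` and `supervise` are hypothetical names; where `report_failure` only logs, you would put whatever tells pacemaker (e.g. the `crm_resource -F` call above, with whatever resource and node options your version requires).

```shell
#!/usr/bin/env bash
# Sketch of a supervising wrapper. The wrapper is the parent,
# so `wait` tells it about the child's exit immediately, no polling.

# Hypothetical hook: replace the echo with whatever should tell the
# cluster that the resource failed. Here it only logs to stderr.
report_failure() {
    echo "resource failed: child exited with status $1" >&2
}

# supervise CMD [ARGS...]: run CMD in the background and wait for it.
supervise() {
    local stopping=0 child rc
    "$@" &
    child=$!
    # A stop request is forwarded to the child and marks the exit as expected.
    trap 'stopping=1; kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"; rc=$?
    # bash's wait returns early (status > 128) when a trapped signal
    # arrives; keep waiting while the child is still around.
    while [ "$rc" -gt 128 ] && kill -0 "$child" 2>/dev/null; do
        wait "$child"; rc=$?
    done
    trap - TERM
    # Any exit we did not ask for is reported; what to do about it
    # is policy, and lives elsewhere.
    if [ "$stopping" -eq 0 ]; then
        report_failure "$rc"
    fi
    return "$rc"
}
```

Used as, say, `supervise /usr/sbin/mydaemon --foreground` (a made-up daemon that stays in the foreground). Treating every unrequested exit as a failure corresponds to the "unless you decide based on something" comment; a real wrapper might accept certain exit codes as expected.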

You'd rather write a library to send XML IPC from the subscribing
$process (which first has to be modified to actually use that IPC)
to lrmd, which then again has to "poll" (because it's not the parent)?
Why?
If you can have it in two lines of wrapper code?


Maybe you also need to trigger on "no longer reacts",
instead of just "is no longer alive",
necessarily qualified with "... in a timely fashion",
and further with "... to certain actions",
where "timely" and "certain actions" are policy, again.
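Checking "still reacts in a timely fashion" then boils down to a probe with a deadline, where both the probe and the deadline are the policy knobs. A bash sketch using coreutils `timeout`, which exits with status 124 when the limit is hit; `check_responsive` is a hypothetical name, and the probe command (the "certain action") is left to the caller.

```shell
#!/usr/bin/env bash
# check_responsive LIMIT PROBE [ARGS...]
# Succeeds only if PROBE succeeds within LIMIT seconds.
# Returns 1 (and logs) if the probe did not finish in time,
# otherwise passes through the probe's own exit status.
# Caveat: a probe that itself exits 124 is misread as a timeout.
check_responsive() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "no reaction within ${limit}s" >&2
        return 1
    fi
    return "$rc"
}
```

For example `check_responsive 10 mydaemon-ctl ping`, where `mydaemon-ctl ping` stands in for a hypothetical status request.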

Now you are back to calling monitor on $process periodically.
Or you need the cooperation of $process, which means
you have to modify $process to emit "signals"
and you can trigger on "no signal received in x seconds".
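The pipe flavour of "no signal received in x seconds" fits in a few lines of bash as well. A minimal sketch, assuming the (modified) process writes one line per heartbeat to stdout: `read -t` rearms the deadline on every line, a timeout shows up as a status above 128, and a closed pipe (the writer died) as plain EOF. The name `watch_heartbeat` and its return codes are made up for illustration.

```shell
#!/usr/bin/env bash
# watch_heartbeat LIMIT: consume heartbeat lines from stdin.
# Returns 2 if no line arrives within LIMIT seconds,
# 1 if the writer closes the pipe (exited or crashed).
watch_heartbeat() {
    local limit=$1 line rc
    while true; do
        IFS= read -r -t "$limit" line
        rc=$?
        if [ "$rc" -eq 0 ]; then
            continue            # heartbeat seen, restart the deadline
        fi
        if [ "$rc" -gt 128 ]; then
            echo "no heartbeat within ${limit}s" >&2
            return 2
        fi
        echo "heartbeat pipe closed" >&2
        return 1
    done
}
```

Used as `mydaemon --heartbeat | watch_heartbeat 15`, with that daemon and flag hypothetical. This checks the gap between beats, not an average rate; "ten per 15 seconds on average" would need a sliding window on top.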

You can have the "signal" be an actual signal:
have a trap on SIGUSR1 and an alarm,
and have the child process periodically send SIGUSR1 to its ppid.
Or use a pipe, and expect ten 'A's per 15 seconds on average.
Or whatever.
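The actual-signal variant, again as a bash sketch under the same caveats. bash has no alarm builtin, so the "alarm" here is a once-a-second comparison of `$SECONDS` against the time of the last SIGUSR1; the trap only records when a heartbeat arrived. The child is assumed to run `kill -USR1 $PPID` periodically; `watch_signals` and its return codes are hypothetical.

```shell
#!/usr/bin/env bash
# watch_signals LIMIT CMD [ARGS...]: start CMD in the background and
# expect a SIGUSR1 from it at least every LIMIT seconds.
# Returns 2 if the heartbeats stop (CMD is left running; what to do
# about it is policy), 1 once CMD has exited.
watch_signals() {
    local limit=$1 last=$SECONDS child
    shift
    trap 'last=$SECONDS' USR1   # each heartbeat just records the time
    "$@" &
    child=$!
    while kill -0 "$child" 2>/dev/null; do
        if (( SECONDS - last > limit )); then
            # The USR1 trap stays installed, so a late heartbeat from
            # the still-running child cannot kill this shell.
            echo "no SIGUSR1 from $child within ${limit}s" >&2
            return 2
        fi
        sleep 1                 # a pending trap runs when sleep returns
    done
    trap - USR1
    wait "$child"
    echo "child $child exited with status $?" >&2
    return 1
}
```

The coarse one-second loop keeps it simple; detection of a stopped heartbeat is therefore only accurate to about a second.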

I'm still missing the point?
Explain yourself better ;)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
