Hi,

On Wed, Jun 02, 2010 at 12:08:51PM +0200, Lars Marowsky-Bree wrote:
> On 2010-06-01T00:37:24, Lars Ellenberg <[email protected]> wrote:
> 
> > Once we notice, what are we supposed to do?
> > Not do any action ourselves,
> > but tell pacemaker the resource has failed,
> > because that is where $policy lives?
> 
> Yes, that's it, I think.
> 
> Sorry for the outburst, I'm really overly ripe for a vacation.
> 
> >     $your_non_backgrounding_process_here
> >     # unless you decide based on "something"
> >     # that this was an expected exit:
> >     crm_resource -F
> > 
> > still too simple?
> 
> No, this would be ok, but I'd still like to see it integrated with lrmd.
> 
> Several resource agents may want functionality like that, and my trust
> in RA authors to implement it reliably - i.e., forking a background task
> to monitor the processes and terminating it as needed - is not
> particularly high, to be honest ;-)

The helper functions you wrote below do look easy to use, though.

> See also the detail that some of the processes may actually fork and
> live in the background.
> 
> > Yes, it is not a watchdog character special device.
> > Sorry about that ;-)
> 
> I think I said I didn't like the in-kernel idea either ;-)
> 
> > You'd rather write a library to send xml ipc from the subscribing
> > $process (which first has to be modified to actually use that ipc)
> > to lrmd, which then again has to "poll" (because it's not the parent)?
> 
> No, I didn't say the first part; I said I'd like the RA to be able to
> inform lrmd and then have lrmd do the monitoring.
> 
> Additional IPC to lrmd is one option, but so is to print "monitor_pids:
> 1 3 4 5" to stderr/stdout from the RA to indicate that these should be
> monitored.
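
If lrmd went the stdout route, it would have to recognize such a
line in the RA's output. Roughly like this (untested sketch in C;
the function name and the details of the convention are made up):

        /* Sketch: pick pids out of a "monitor_pids: 1 3 4 5" line.
         * Returns the number of pids found, or -1 if the line does
         * not start with the (hypothetical) tag. */
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>

        int
        parse_monitor_pids(const char *line, pid_t *pids, int max)
        {
                static const char tag[] = "monitor_pids:";
                const char *p;
                char *end;
                int n = 0;

                if (strncmp(line, tag, sizeof(tag) - 1) != 0)
                        return -1;
                p = line + sizeof(tag) - 1;
                while (n < max) {
                        long v = strtol(p, &end, 10); /* skips blanks */
                        if (end == p)
                                break;
                        pids[n++] = (pid_t)v;
                        p = end;
                }
                return n;
        }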
> 
> > Why?
> > If you can have it in two lines of wrapper code?
> 
> Because lrmd is already monitoring processes, it's already running, and
> it doesn't need a new process - also, lrmd would be the one to actually
> understand that it needs to cancel the running monitor and send the

The crmd should cancel the recurring monitor on receiving the
async fail message.

> "fail" notification to crmd etc. So it seemed to be a good fit, as
> opposed to every RA forking a new monitor thread (and then mismanaging
> that).

Agreed in principle. There are some implementation details that
need to be clarified, though:

- how does the process let the lrmd know that it wants to be
  monitored (short of modifying all the other code out there); here
  one could extend the lrmd API to pass the list of pids for the
  given resource id

- how does the lrmd asynchronously monitor the processes (waitpid
  won't do, because those processes are not children of lrmd); see
  the sketch below
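
A rough sketch of what I have in mind for both points (completely
untested; all the names here are made up for illustration, this is
not an existing lrmd interface):

        /* Hypothetical lrmd extension: the client registers a list of
         * pids for a resource id, and lrmd polls them from its existing
         * timer with kill(pid, 0), which also works for non-children. */
        #include <errno.h>
        #include <signal.h>
        #include <sys/types.h>

        #define MAX_MON_PIDS 16

        typedef struct mon_rsc {
                const char *rsc_id;             /* resource instance id */
                pid_t pids[MAX_MON_PIDS];
                int npids;
        } mon_rsc_t;

        /* would be called when the new API message comes in */
        void
        mon_rsc_set_pids(mon_rsc_t *m, const pid_t *pids, int npids)
        {
                int i;

                m->npids = npids < MAX_MON_PIDS ? npids : MAX_MON_PIDS;
                for (i = 0; i < m->npids; i++)
                        m->pids[i] = pids[i];
        }

        /* periodic timer callback: returns 1 if any pid has gone away */
        int
        mon_rsc_check(const mon_rsc_t *m)
        {
                int i;

                for (i = 0; i < m->npids; i++) {
                        /* signal 0: existence check, nothing delivered */
                        if (kill(m->pids[i], 0) < 0 && errno == ESRCH)
                                return 1;
                }
                return 0;
        }

On a failed check, lrmd would then report the operation as failed
to its client, and the crmd would take it from there (see above).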

> It'd also make the list of processes to be generally queryable through
> lrmadmin, instead of needing new interfaces for each RA.
> 
> It's extremely cheap to do kill(pid,0) in C; a shell wrapper is more
> costly. (Probably there are also truly async interfaces available to C
> daemons that I'm not aware of

Me neither. AFAIK, only the kernel and the parent process learn
about a process exiting. Whether the Linux kernel exposes those
events in some way, I don't know. There is other stuff like dbus,
but that requires cooperation from the monitored process.

> waitpid() seems to require that the
> process be a child, which we cannot guarantee.)
> 
> So it actually seems to be the _cheapest_ and _easiest_ place to add
> this.
> 
> Yes, I could easily conceive a shell wrapper -
> 
> ocf_monitor_pid_child() {
>       local latency=$1 ; shift
>       local pids="$*"
>       local pid
>       local rc=0
> 
>       # Could also setup a SIGUSR1 trap here to exit the loop
>       # "cleanly" or to reload the list of pids?
>       while [ $rc = 0 ]; do
>               sleep $latency
>               for pid in $pids ; do
>                       if ! kill -s 0 $pid ; then
>                               rc=1
>                               break
>                       fi
>               done
>       done
>       if [ ! -e $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.clean_stop ]; then
>               crm_resource -F
>       fi
> }
> 
> ocf_monitor_pid() {
>       # remove a stale marker from a previous clean stop
>       rm -f $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.clean_stop
>       ocf_monitor_pid_child $* >/dev/null 2>&1 </dev/null &
>       # save pid of monitoring child for ocf_monitor_pid_stop
>       echo $! > $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.monitor_pid
> }
> 
> ocf_monitor_pid_stop() {
>       # needs to be called in stop path of RA:
>       # create the marker first so the child does not report a failure,
>       # then re-use the saved pid to abort the monitoring child
>       touch $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.clean_stop
>       kill $(cat $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.monitor_pid) 2>/dev/null
>       rm -f $HA_RSCTMP/${OCF_RESOURCE_INSTANCE}.monitor_pid
> }
> 
> (The code is obviously untested.)
> 
> but this is not as trivial as it appears; we then have additional
> processes for every resource instance - unless we extend one "simple"
> wrapper to be able to monitor several instances, and to also add tools
> to query it etc. 
> 
> In the end, we'd have almost the complexity needed to make lrmd handle
> it properly, but with more load overhead on the system.

The complexity is probably similar, but the code size would be
far bigger in lrmd (API extension, new timer, other processing).
I wouldn't underestimate that.

Finally, I'm not sure whether all this is worth the effort. In my
experience, a process hanging is more probable than a process
crashing, or a process misbehaving in other ways.

Cheers,

Dejan

> So I appreciate the simplicity of implementing it in shell, but still
> deem lrmd to be the better place.
> 
> > Maybe you need to also trigger on "does no longer react",
> > instead of just "is no longer alive",
> > necessarily qualified with "... in a timely fashion",
> > further with "... to certain actions"
> > where "timely" and "certain actions" is policy, again.
> > 
> > Now you are back to calling monitor on $process periodically.
> 
> Sure, this is what the periodic monitor ops are for - to clarify errors
> that are not as clear-cut. "Essential processes dying" is separate.
> 
> > You can have "signal" be actual signal,
> > have a trap on SIGUSR1 and an alarm,
> > and have the child process kill USR1 to ppid periodically.
> > Or use a pipe, and expect ten 'A' per 15 seconds on average.
> > Or whatever.
> 
> That would be an application-level heartbeat, and way more complex than
> what is proposed here. The proposal is to improve a simple case.
> 
> > Explain yourself better ;)
> 
> Trying to ;-)
> 
> 
> Regards,
>     Lars
> 
> -- 
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
