On Mon, May 22, 2017 at 12:26:36PM -0500, Ken Gaillot wrote:
> Resurrecting an old thread, because I stumbled on something relevant ...

/me too :-)

> There had been some discussion about having the ability to run a more
> useful monitor operation on an otherwise systemd-based resource. We had
> talked about a couple approaches with advantages and disadvantages.
>
> I had completely forgotten about an older capability of pacemaker that
> could be repurposed here: the (undocumented) "container" meta-attribute.

Which is nice to know.

The wrapper approach is appealing as well, though.

I have just implemented a PoC ocf:pacemaker:systemd "wrapper" RA,
to give my brain something different to do for a change.

It takes two parameters,
unit=(systemd unit), and
monitor_hook=(some executable).

The monitor_hook has access to the environment, obviously,
in case it needs that. For monitor, it will only be called
if "systemctl is-active" thinks the thing is active.

It is expected to return 0 (OCF_SUCCESS) for "running",
and 7 (OCF_NOT_RUNNING) for "not running". It can return anything else;
all exit codes are propagated directly for the "monitor" action.
"Unexpected" exit codes will be logged with ocf_exit_reason
(does that make sense?).

systemctl start and stop commands apparently are "synchronous"
(have always been? only nowadays? is that relevant?)
but to be so, they need properly written unit files.
If there is an ExecStop command defined which will only trigger
stopping, but not wait for it, systemd cannot wait, either
(it has no way to know what it should wait for in that case),
and no-one should blame systemd for that.

That's why you would need to fix such systemd units,
but that's also why I added the additional _monitor loops
after systemctl start / stop.

Maybe it should not be named systemd, but systemd-wrapper.

Other comments?

	Lars

So here is my RFC,
tested only "manually" via

for x in monitor stop monitor start monitor ; do
	for try in 1 2; do
		OCF_ROOT=/usr/lib/ocf \
		OCF_RESKEY_monitor_hook=/usr/local/bin/my-monitoring-hook \
		OCF_RESKEY_unit=postfix@- ./systemd $x ; echo $try. $x $?
	done
done

------ /usr/local/bin/my-monitoring-hook ----------------------------
#!/bin/sh
echo quit | nc 127.0.0.1 25 2>/dev/null | grep -q ^220 || exit 7

----- /usr/lib/ocf/resource.d/pacemaker/systemd ---------------------
#!/bin/bash

: ${OCF_FUNCTIONS=${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs}
. ${OCF_FUNCTIONS}
: ${__OCF_ACTION=$1}

meta_data() {
	cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="systemd" version="1.0">
<version>1.0</version>

<longdesc lang="en">
This Resource Agent delegates start and stop to systemctl start and stop,
but monitor will in addition to systemctl is-active also run the monitor_hook
you specify.
</longdesc>
<shortdesc lang="en">systemd service with monitor hook</shortdesc>

<parameters>

<parameter name="unit" unique="0">
<longdesc lang="en">
What systemd unit to manage.
</longdesc>
<shortdesc lang="en">systemd unit</shortdesc>
<content type="string" />
</parameter>

<parameter name="monitor_hook" unique="0">
<longdesc lang="en">
What executable to run in addition to systemctl is-active.
</longdesc>
<shortdesc lang="en">monitor hook</shortdesc>
<content type="string" />
</parameter>

</parameters>

<actions>
<action name="start" timeout="20" />
<action name="stop" timeout="20" />
<action name="monitor" timeout="20" interval="10" depth="0"/>
<!-- <action name="reload" timeout="20" /> -->
<action name="validate-all" timeout="20" />
<action name="meta-data" timeout="5" />
</actions>
</resource-agent>
END
}

_monitor()
{
	local ex check
	if [[ -n "$OCF_RESKEY_monitor_hook" ]] &&
	   [[ -x "$OCF_RESKEY_monitor_hook" ]]; then
		"$OCF_RESKEY_monitor_hook"
		ex=$?
		# no-op, only visible when tracing with "set -x"
		: ==== $__OCF_ACTION/$ex ====
		case $__OCF_ACTION/$ex in
		stop/7)	: "not running after stop: expected" ;;
		stop/*)	ocf_exit_reason "returned exit code $ex after stop: $OCF_RESKEY_monitor_hook" ;;
		start/0) : "running after start: expected";;
		start/*) ocf_exit_reason "returned exit code $ex after start: $OCF_RESKEY_monitor_hook" ;;
		monitor/0|monitor/7) : "expected running (0) or not running (7)" ;;
		monitor/*) ocf_exit_reason "returned exit code $ex during monitor: $OCF_RESKEY_monitor_hook" ;;
		esac
		return $ex
	else
		ocf_exit_reason "missing or not executable: $OCF_RESKEY_monitor_hook"
	fi
	return $OCF_ERR_GENERIC
}

case $__OCF_ACTION in
meta-data) meta_data ;;
validate-all) : "Tbd. Maybe." ;;
stop)	systemctl stop "$OCF_RESKEY_unit" || exit $OCF_ERR_GENERIC
	# TODO make time/retries of monitor after stop configurable
	# loops until the hook reports anything other than "running" (0)
	while _monitor; do sleep 1; done
	exit $OCF_SUCCESS
	;;
start)	systemctl start "$OCF_RESKEY_unit" || exit $OCF_ERR_GENERIC
	# TODO make time/retries of monitor after start configurable
	# loops until the hook reports "running" (0)
	while ! _monitor; do sleep 1; done
	exit $OCF_SUCCESS
	;;
monitor)
	systemctl is-active --quiet "$OCF_RESKEY_unit" || exit $OCF_NOT_RUNNING
	_monitor
	;;
*)	ocf_exit_reason "not implemented: $__OCF_ACTION"
	exit $OCF_ERR_UNIMPLEMENTED
	;;
esac

exit $?
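
One way the two TODO comments in the start and stop branches might be
addressed is a bounded polling helper instead of the open-ended while loops.
What follows is only a sketch under assumptions: wait_for_rc and fake_check
are hypothetical names that are not part of the posted agent, and the fake
check merely stands in for _monitor so the snippet can be run on its own.

```shell
#!/bin/bash
# Hypothetical helper (not part of the posted agent): poll a check command
# until it returns the wanted exit code, giving up after a fixed number of tries.
#
# usage: wait_for_rc WANTED_RC MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
wait_for_rc() {
	local wanted="$1" tries="$2" delay="$3" rc i=1
	shift 3
	while [ "$i" -le "$tries" ]; do
		"$@"
		rc=$?
		# Reached the wanted state: report success to the caller.
		[ "$rc" -eq "$wanted" ] && return 0
		# Sleep between attempts, but not after the last one.
		[ "$i" -lt "$tries" ] && sleep "$delay"
		i=$(( i + 1 ))
	done
	# Never reached the wanted state; 1 corresponds to OCF_ERR_GENERIC.
	return 1
}

# Stand-in for _monitor: reports "running" (0) twice, then "not running" (7).
calls=0
fake_check() {
	calls=$(( calls + 1 ))
	[ "$calls" -ge 3 ] && return 7
	return 0
}

# Wait for "not running" (7), at most 5 tries, no delay between tries.
wait_for_rc 7 5 0 fake_check
echo "rc=$? after $calls calls"   # prints: rc=0 after 3 calls
```

In the agent, the stop branch could then call something like
wait_for_rc "$OCF_NOT_RUNNING" 20 1 _monitor || exit $OCF_ERR_GENERIC,
with the try count taken from a new (so far hypothetical) parameter.
Note that this waits for exactly 7, whereas the posted stop loop ends on any
non-zero exit code from the hook.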