On Thu, Sep 1, 2011 at 6:28 AM, Dejan Muhamedagic <[email protected]> wrote:

> Hi Serge,
>
> On Tue, Jul 12, 2011 at 03:50:36PM -0600, Serge Dubrouski wrote:
> > Hello -
> >
> > I've created an OCF RA for the named (BIND) server. There is an existing one in
> > the redhat directory, but I don't like how it does monitoring and I doubt that
> > it can work with Pacemaker. So please review the attached RA and see if it can
> > be included in the project.
>
> Sorry for the delay. The RA looks quite good; some comments
> below.
>
> Cheers,
>
> Dejan
>
> > #!/bin/sh
> > #
> > # Description:  Manages a named (Bind) server as an OCF High-Availability
> > #               resource
> > #
> > # Authors:      Serge Dubrouski ([email protected])
> > #
> > # Copyright:    2011 Serge Dubrouski <[email protected]>
> > #
> > # License:      GNU General Public License (GPL)
> > #
> >
> > ###############################################################################
> > # Initialization:
> >
> > : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> > . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
> >
> > # Used binaries
> > RNDC="/usr/sbin/rndc"
> > HOST="/usr/bin/host"
> > PIDOF="/sbin/pidof"
>
> How about relying on PATH? The RA should have a sane environment
> and these are all standard locations. Packagers may have
> different ideas which may lead to problems.
>

Done.
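A minimal sketch of what relying on PATH could look like, assuming the tools are called by bare name and checked up front. `require_tool` is a hypothetical helper used only for illustration; inside the RA, `check_binary` from ocf-shellfuncs already covers this, and the numeric code 5 mirrors OCF_ERR_INSTALLED.

```shell
# Call the tools by name and let PATH resolve them, instead of
# hard-coding /usr/sbin/rndc and /usr/bin/host.
RNDC=rndc
HOST=host

# require_tool: hypothetical helper (check_binary in ocf-shellfuncs plays
# this role in a real RA). Returns 5 (OCF_ERR_INSTALLED) when $1 cannot be
# found in PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "required tool '$1' not found in PATH ($PATH)" >&2
        return 5
    fi
    return 0
}

# Example: validate both tools before doing anything else.
#   require_tool "$RNDC" && require_tool "$HOST" || exit $?
```

The point of the PATH approach is that a packager who installs rndc or host elsewhere only has to get PATH right, rather than patch the RA.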


>
> > #Defaults
> > OCF_RESKEY_named_default="/usr/sbin/named"
> > OCF_RESKEY_named_user_default=named
> > OCF_RESKEY_named_config_default="/etc/named.conf"
> > OCF_RESKEY_named_pidfile_default="/var/run/named/named.pid"
> > OCF_RESKEY_named_rootdir_default=""
> > OCF_RESKEY_named_options_default=""
> > OCF_RESKEY_named_keytab_file_default=""
> > OCF_RESKEY_named_stop_timeout_default=25
> > OCF_RESKEY_monitor_request_default="localhost"
> > OCF_RESKEY_monitor_response_default="127.0.0.1"
> > OCF_RESKEY_monitor_ip_default="127.0.0.1"
> >
> > : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
> > : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
> > : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
> > : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
> > : ${OCF_RESKEY_named_rootdir=${OCF_RESKEY_named_rootdir_default}}
> > : ${OCF_RESKEY_named_options=${OCF_RESKEY_named_options_default}}
> > : ${OCF_RESKEY_named_keytab_file=${OCF_RESKEY_named_keytab_file_default}}
> > : ${OCF_RESKEY_named_stop_timeout=${OCF_RESKEY_named_stop_timeout_default}}
> > : ${OCF_RESKEY_monitor_request=${OCF_RESKEY_monitor_request_default}}
> > : ${OCF_RESKEY_monitor_response=${OCF_RESKEY_monitor_response_default}}
> > : ${OCF_RESKEY_monitor_ip=${OCF_RESKEY_monitor_ip_default}}
> >
> > usage() {
> >     cat <<EOF
> >         usage: $0 start|stop|reload|status|monitor|meta-data|validate-all|methods
> >
> >         $0 manages named (Bind) server as an HA resource.
> >
> >         The 'start' operation starts named server.
> >         The 'stop' operation stops  named server.
> >         The 'reload' operation reload named configuration.
> >         The 'status' operation reports whether named is up.
> >         The 'monitor' operation reports whether named is running.
> >         The 'validate-all' operation reports whether parameters are valid.
> >         The 'methods' operation reports on the methods $0 supports.
> > EOF
> >   return $OCF_ERR_ARGS
> > }
> >
> > named_meta_data() {
> >         cat <<EOF
> > <?xml version="1.0"?>
> > <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> > <resource-agent name="named">
> > <version>1.0</version>
> >
> > <longdesc lang="en">
> > Resource script for named (Bind) server. It manages named as an HA resource.
> > </longdesc>
> > <shortdesc lang="en">Manages a named server</shortdesc>
> >
> > <parameters>
> > <parameter name="named" unique="0" required="0">
> > <longdesc lang="en">
> > Path to the named command.
> > </longdesc>
> > <shortdesc lang="en">named</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_default}" />
> > </parameter>
> >
> > <parameter name="named_user" unique="0" required="0">
> > <longdesc lang="en">
> > User that should own named process.
> > </longdesc>
> > <shortdesc lang="en">named_user</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_user_default}" />
> > </parameter>
> >
> > <parameter name="named_config" unique="0" required="0">
> > <longdesc lang="en">
> > Configuration file for named.
> > </longdesc>
> > <shortdesc lang="en">named_config</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_config_default}" />
> > </parameter>
>
> This one should be unique.
>

Done.


>
> > <parameter name="named_pidfile" unique="0" required="0">
> > <longdesc lang="en">
> > PIDFILE file for named.
> > </longdesc>
> > <shortdesc lang="en">named_pidfile</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_pidfile_default}" />
> > </parameter>
>
> This one too.
>

Done.


>
> > <parameter name="named_rootdir" unique="0" required="0">
> > <longdesc lang="en">
> > Directory that named should use for chroot if any.
> > </longdesc>
> > <shortdesc lang="en">named_rootdir</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_rootdir_default}" />
> > </parameter>
>
> This one also? Or do different instances share chroot?
>

Made it unique too.
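For reference, a sketch of how the three parameters might read in the meta-data once they are marked unique, emitted from a shell heredoc the same way named_meta_data() emits the rest of the XML. In OCF meta-data, unique="1" signals that two instances of the resource should not share the same value. The function name `named_unique_params_xml` is hypothetical; the descriptions and defaults are copied from the original RA.

```shell
# Defaults as in the RA, so the sketch runs stand-alone.
: "${OCF_RESKEY_named_config_default:=/etc/named.conf}"
: "${OCF_RESKEY_named_pidfile_default:=/var/run/named/named.pid}"
: "${OCF_RESKEY_named_rootdir_default:=}"

# named_unique_params_xml: hypothetical name; prints the three parameter
# definitions with unique="1".
named_unique_params_xml() {
    cat <<EOF
<parameter name="named_config" unique="1" required="0">
<longdesc lang="en">
Configuration file for named.
</longdesc>
<shortdesc lang="en">named_config</shortdesc>
<content type="string" default="${OCF_RESKEY_named_config_default}" />
</parameter>

<parameter name="named_pidfile" unique="1" required="0">
<longdesc lang="en">
PIDFILE file for named.
</longdesc>
<shortdesc lang="en">named_pidfile</shortdesc>
<content type="string" default="${OCF_RESKEY_named_pidfile_default}" />
</parameter>

<parameter name="named_rootdir" unique="1" required="0">
<longdesc lang="en">
Directory that named should use for chroot if any.
</longdesc>
<shortdesc lang="en">named_rootdir</shortdesc>
<content type="string" default="${OCF_RESKEY_named_rootdir_default}" />
</parameter>
EOF
}
```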

>
> > <parameter name="named_options" unique="0" required="0">
> > <longdesc lang="en">
> > Options for named process if any.
> > </longdesc>
> > <shortdesc lang="en">named_options</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_options_default}" />
> > </parameter>
> >
> > <parameter name="named_keytab_file" unique="0" required="0">
> > <longdesc lang="en">
> > named service keytab file (for GSS-TSIG).
> > </longdesc>
> > <shortdesc lang="en">named_keytab_file</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_keytab_file_default}" />
> > </parameter>
> >
> > <parameter name="named_stop_timeout" unique="0" required="0">
> > <longdesc lang="en">
> > Stop timeout. Named process will be killed if it doesn't stop in a given time.
> > </longdesc>
> > <shortdesc lang="en">named_stop_timeout</shortdesc>
> > <content type="string" default="${OCF_RESKEY_named_stop_timeout_default}" />
> > </parameter>
>
> Is this overkill? Some RAs use 2/3 (or so) of the meta timeout
> attribute and then try to kill -9.
>

Got rid of the parameter and implemented that 2/3 of meta timeout idea.
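A sketch of the 2/3-of-meta-timeout calculation, assuming Pacemaker passes the operation timeout in milliseconds as OCF_RESKEY_CRM_meta_timeout. The helper name and the 60 s fallback (matching the stop action's advertised timeout="60") are assumptions.

```shell
# named_stop_grace: hypothetical helper. Prints how many seconds stop should
# wait for a graceful shutdown before falling back to kill -9: two thirds of
# the operation timeout, but never less than 1 second.
named_stop_grace() {
    local timeout_ms secs
    timeout_ms=${OCF_RESKEY_CRM_meta_timeout:-60000}  # assume 60 s if unset
    secs=$(( timeout_ms * 2 / 3 / 1000 ))            # integer arithmetic
    [ "$secs" -lt 1 ] && secs=1
    echo "$secs"
}
```

Leaving the remaining third of the timeout for the kill -9 path means the forced kill still happens inside the window Pacemaker allows for the stop operation.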

>
> > <parameter name="monitor_request" unique="0" required="0">
> > <longdesc lang="en">
> > Request that shall be sent to named for monitoring. Usually an A record in DNS.
> > </longdesc>
> > <shortdesc lang="en">monitor_request</shortdesc>
> > <content type="string" default="${OCF_RESKEY_monitor_request_default}" />
> > </parameter>
> >
> > <parameter name="monitor_response" unique="0" required="0">
> > <longdesc lang="en">
> > Expected response from named server.
> > </longdesc>
> > <shortdesc lang="en">monitor_response</shortdesc>
> > <content type="string" default="${OCF_RESKEY_monitor_response_default}" />
> > </parameter>
> >
> > <parameter name="monitor_ip" unique="0" required="0">
> > <longdesc lang="en">
> > IP Address where named listens.
> > </longdesc>
> > <shortdesc lang="en">monitor_ip</shortdesc>
> > <content type="string" default="${OCF_RESKEY_monitor_ip_default}" />
> > </parameter>
> > </parameters>
>
> Why not just use localhost? Could there be an instance which
> doesn't listen on the lo interface?
>

Disagree. I usually prefer to monitor clustered resources through the VIPs
they are assigned to. Also, localhost wouldn't work in the case of several
instances listening on different interfaces.



>
> > <actions>
> > <action name="start" timeout="60" />
> > <action name="stop" timeout="60" />
> > <action name="reload" timeout="60" />
> > <action name="status" timeout="10" />
> > <action name="monitor" depth="0" timeout="30" interval="30"/>
> > <action name="meta-data" timeout="5" />
> > <action name="validate-all" timeout="5" />
> > <action name="methods" timeout="5" />
> > </actions>
> > </resource-agent>
> >
> > EOF
> > }
> >
> > #
> > # methods: What methods/operations do we support?
> > #
> >
> > named_methods() {
> >   cat <<EOF
> >         start
> >         stop
> >         status
> >         monitor
> >         methods
> >         meta-data
> >         validate-all
> > EOF
> > }
> >
> > # Validate most critical parameters
> > named_validate_all() {
> >     check_binary $OCF_RESKEY_named
> >     check_binary $RNDC
> >     check_binary $HOST
> >
> >     if [ ! -r "$OCF_RESKEY_named_config" ]; then
> >         if ocf_is_probe; then
> >            ocf_log info "Configuration file $OCF_RESKEY_named_config not readable during probe."
> >         else
> >            ocf_log err "Configuration file $OCF_RESKEY_named_config doesn't exist"
> >            return $OCF_ERR_INSTALLED
> >         fi
> >     fi
> >
> >     getent passwd $OCF_RESKEY_named_user >/dev/null 2>&1
> >     if [ ! $? -eq 0 ]; then
> >         ocf_log err "User $OCF_RESKEY_named_user doesn't exist";
> >         return $OCF_ERR_INSTALLED;
> >     fi
> >
> >     if [ -z "$OCF_RESKEY_monitor_request" -o \
> >          -z "$OCF_RESKEY_monitor_response" -o \
> >          -z "$OCF_RESKEY_monitor_ip" ]; then
> >         ocf_log err "Neither monitor_request, monitor_response or monitor_ip can be empty"
>
> ocf_log err "Neither monitor_request, monitor_response, nor monitor_ip can be empty"
>
> (I guess, not a native speaker.)
>

Even after 10 years of living in the US. Next time I'll check with my
daughter ;-)

>
> >         return $OCF_ERR_CONFIGURED
> >     fi
> >
> >     return $OCF_SUCCESS
> > }
> >
> > #
> > # named_status. Simple check of the status of named process by pidfile.
> > #
> >
> > named_status () {
> >     ocf_pidfile_status ${OCF_RESKEY_named_pidfile} >/dev/null 2>&1
> >     return $?
>
> return is superfluous here.
>

Deleted.


>
> > }
> >
> > #
> > # named_monitor. Send a request to named and check response.
> > #
> >
> > named_monitor() {
> >     if ! named_status
> >     then
> >         ocf_log info "named is down"
> >         return $OCF_NOT_RUNNING
> >     fi
> >
> >     if ! $HOST $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip | \
> >        grep -q '.* has address '"$OCF_RESKEY_monitor_response" ; then
> >        ocf_log err "named didn't answer properly for $OCF_RESKEY_monitor_request."
> >        ocf_log err "Expected: $OCF_RESKEY_monitor_response."
> >        ocf_log err "Got: `$HOST $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`"
>
> I think that you really need to save the output of the above
> command:
>
>     local output
>     output=`$HOST $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`
>     if [ $? -ne 0 ] || ! echo "$output" | grep -q ...; then
>         ...
>

Done.
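One way the reviewed check might look, as a sketch: the query runs once, its output is kept, and that same output is logged on failure. Two assumptions go beyond the review. First, the answer is matched with awk on the last fields of a "NAME has address IP" line, so that 127.0.0.1 does not also match 127.0.0.10. Second, plain echo to stderr stands in for ocf_log so the snippet runs outside the OCF environment. `named_check_answer` is a hypothetical name.

```shell
# named_check_answer: hypothetical helper; returns 0 (OCF_SUCCESS) or
# 1 (OCF_ERR_GENERIC). $HOST is the host(1) command, as in the RA.
named_check_answer() {
    local output
    # Assign separately from `local` so $? is the exit status of $HOST.
    output=$($HOST "$OCF_RESKEY_monitor_request" "$OCF_RESKEY_monitor_ip" 2>&1)
    if [ $? -ne 0 ] || ! printf '%s\n' "$output" | awk -v ip="$OCF_RESKEY_monitor_response" \
            'NF >= 3 && $(NF-1) == "address" && $NF == ip { found = 1 } END { exit !found }'
    then
        echo "named didn't answer properly for $OCF_RESKEY_monitor_request." >&2
        echo "Expected: $OCF_RESKEY_monitor_response." >&2
        echo "Got: $output" >&2
        return 1
    fi
    return 0
}
```

The `NF >= 3` guard skips blank and short header lines in host's output before the last fields are inspected.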


>
> >        return $OCF_ERR_GENERIC
> >     fi
> >
> >     return $OCF_SUCCESS
> > }
> >
> > #
> > # Reload
> > #
> >
> > named_reload() {
> >     $RNDC reload >/dev/null 2>&1 || return $OCF_ERR_GENERIC
>
> Perhaps let at least stderr through.
>

Done.
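As a sketch of that change, only stdout is discarded, so anything rndc writes to stderr still reaches lrmd's log. RNDC is assumed to be plain `rndc` resolved through PATH; the numeric returns stand for OCF_SUCCESS and OCF_ERR_GENERIC.

```shell
RNDC=${RNDC:-rndc}

named_reload() {
    # Discard stdout only; stderr is left alone so lrmd can log it.
    $RNDC reload >/dev/null || return 1   # OCF_ERR_GENERIC
    return 0                              # OCF_SUCCESS
}
```

The same redirection (`>/dev/null` without `2>&1`) applies to `$RNDC stop` in named_stop.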


>
> >
> >     return $OCF_SUCCESS
> > }
> >
> > #
> > # Start
> > #
> >
> > named_start() {
> >     local ROOT_DIR_OPT
> >     local PID
>
> Usually local variables are lower case, so that globals, which are
> typically upper case, stand out.
>

Done.


>
> >     ROOT_DIR_OPT=""
> >     named_status && return $OCF_SUCCESS
> >
> >     # Remove pidfile if exists
> >     rm -f ${OCF_RESKEY_named_pidfile}
> >
> >     if [ -n "${OCF_RESKEY_named_rootdir}" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]
>
> Why does '/' at the end make a difference?
>
> >     then
> >         ROOT_DIR_OPT="-t ${OCF_RESKEY_named_rootdir}"
> >         [ -s /etc/localtime ] && cp -fp /etc/localtime ${OCF_RESKEY_named_rootdir}/etc/localtime
> >     fi
> >
> >     if ! ${OCF_RESKEY_named} -u ${OCF_RESKEY_named_user} $ROOT_DIR_OPT ${OCF_RESKEY_named_options}
> >     then
> >         ocf_log err "named failed to start."
> >         return $OCF_ERR_GENERIC
> >     fi
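Related to the '/' question: a sketch of building the command line so that an empty rootdir and '/' are both treated as "no chroot". It also passes the config file with `-c`, which the start command above does not do (unless it is put in named_options), even though named_config is now unique per instance; that addition is a suggestion, not something the RA currently does. Per named(8), `-t` chroots before the configuration file is read, so with a rootdir the `-c` path is resolved inside the chroot. `named_build_cmd` is a hypothetical helper.

```shell
# named_build_cmd: hypothetical helper; prints the named command line for
# this instance.
named_build_cmd() {
    local cmd="$OCF_RESKEY_named -u $OCF_RESKEY_named_user -c $OCF_RESKEY_named_config"
    case "$OCF_RESKEY_named_rootdir" in
        ""|/) ;;                                     # empty or "/": no chroot
        *)    cmd="$cmd -t $OCF_RESKEY_named_rootdir" ;;
    esac
    [ -n "$OCF_RESKEY_named_options" ] && cmd="$cmd $OCF_RESKEY_named_options"
    echo "$cmd"
}
```

The result is meant to be word-split when run (`if ! $(named_build_cmd); then ...`), so paths containing spaces are not supported.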
> >
> >
> >     PID=`$PIDOF -o %PPID ${OCF_RESKEY_named}`
>
> Hmm, pidof is part of sysvinit-utils (or similar). Is that necessarily
> always installed? (I can see that there are two more RAs using it, but
> still.)
>

Got rid of pidof and added named_getpid. Actually it's even better, since now
it can recognize different instances on the same server.
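The actual named_getpid lives in the attached RA and is not shown here. Purely as an illustration of how instances could be told apart, here is a hypothetical helper that filters `ps -e -o pid= -o args=` output for command lines that start with the named binary and contain an instance-specific string, for example a `-t` chroot or `-c` config path. It is a plain substring match, so a tag like "-t /srv/a" would also match "-t /srv/ab".

```shell
# named_getpid_from_ps: hypothetical helper, not the named_getpid from the
# attached RA. Reads "PID ARGS" lines on stdin, as printed by
# `ps -e -o pid= -o args=`, and prints each PID whose command line starts
# with $1 (the named binary) and contains $2 (an instance-specific tag).
named_getpid_from_ps() {
    local bin="$1" tag="$2" pid args
    while read -r pid args; do      # read strips the leading spaces of pid
        case "$args" in
            "$bin"*"$tag"*) echo "$pid" ;;
        esac
    done
}

# Usage:
#   ps -e -o pid= -o args= | named_getpid_from_ps /usr/sbin/named "-c /etc/named-a.conf"
```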


>
> >     if [ -n "$PID" ]; then
> >         if [ ! -e ${OCF_RESKEY_named_pidfile} ]; then
> >             echo $PID > ${OCF_RESKEY_named_pidfile}
> >         fi
> >     else
> >         ocf_log err "named failed to start. Probably error in configuration."
> >         return $OCF_ERR_GENERIC
> >     fi
> >
> >
> >     while :
> >     do
> >         named_monitor && break
> >         sleep 1
> >         ocf_log debug "named hasn't started yet."
> >     done
> >     ocf_log debug "named has started."
>
> I guess that you can use the info severity here. Starts don't
> happen that often.
>

Done.


>
> >     return $OCF_SUCCESS
> > }
> >
> > #
> > # Stop
> > #
> >
> > named_stop () {
> >     local timeout
> >
> >     named_status || return $OCF_SUCCESS
> >
> >     if ! $RNDC stop >/dev/null 2>&1; then
>
> Again, let lrmd log stderr.
>

Done.


>
> >         kill `cat ${OCF_RESKEY_named_pidfile}`
> >     fi
> >
> >     timeout=0
> >     while named_status ; do
> >         if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
> >             break
> >         else
> >             sleep 1
> >             timeout=$((timeout++))
> >         fi
> >     done
> >
> >     #If still up
> >     named_status 2>&1 && (ocf_log err "named is still up! Killing"; \
> >                     kill -9 `cat ${OCF_RESKEY_named_pidfile}`)
> >
> >     rm -f ${OCF_RESKEY_named_pidfile}
> >     return $OCF_SUCCESS
> > }
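Separately from the review comments: in the wait loop above, `timeout=$((timeout++))` never advances the counter. In bash the post-increment expression evaluates to the old value, which is then assigned back, so timeout stays 0; if named ignores `rndc stop`, the loop spins until lrmd's operation timeout and the kill -9 fallback is never reached. POSIX also does not require `++` in arithmetic expansion, so a strict /bin/sh may reject it outright. A sketch of the loop with a plain addition, as a hypothetical helper that takes the limit in seconds (for example, from the 2/3-of-meta-timeout calculation):

```shell
# named_wait_for_stop: hypothetical helper. Polls named_status once a second
# for at most $1 seconds; returns 0 once named is down, 1 on timeout.
named_wait_for_stop() {
    local limit="$1" waited=0
    while named_status; do
        if [ "$waited" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))    # plain addition works in any POSIX sh
    done
    return 0
}

# In named_stop this could be used as (grace and pid are hypothetical):
#   named_wait_for_stop "$grace" || { ocf_log err "named is still up! Killing"; kill -9 "$pid"; }
```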
> >
> >
> > # Main part
> >
> > [ $# -ne 1 ] && (usage; exit $OCF_ERR_GENERIC)
>
> In bash, () creates a subshell, so exit (I think so) won't work
> here. Better to use {}.
>

You are right. Replaced it with standard if/then for better readability.
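A small sketch illustrating the point: `exit` inside `( ... )` only ends the subshell, so the original guard printed the usage text and then carried on. The if/then form below is one way to write it. `check_args` is a hypothetical wrapper, and returning OCF_ERR_ARGS (2) is an assumption; the original guard used OCF_ERR_GENERIC.

```shell
# The problem: `exit` in ( ... ) leaves only the subshell.
demo=$(sh -c '[ 0 -ne 1 ] && (echo "usage: ..." >&2; exit 1); echo "main part still runs"' 2>/dev/null)
echo "$demo"    # prints: main part still runs

: "${OCF_ERR_ARGS:=2}"
usage() {
    echo "usage: $0 start|stop|reload|status|monitor|meta-data|validate-all|methods" >&2
}

# check_args: hypothetical wrapper around the if/then guard. In the RA the
# body would sit directly in the main part, with `exit` instead of `return`.
check_args() {
    if [ $# -ne 1 ]; then
        usage
        return $OCF_ERR_ARGS
    fi
    return 0
}
```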


>
> >
> > case "$1" in
> >     methods)    named_methods
> >                 exit $?;;
> >
> >     meta-data)  named_meta_data
> >                 exit $OCF_SUCCESS;;
> > esac
> >
> > named_validate_all
> > rc=$?
> >
> > [ "$1" == "validate-all" ] && exit $rc
> >
> > if [ $rc -ne 0 ]
> > then
> >     case "$1" in
> >         stop)    exit $OCF_SUCCESS;;
> >         monitor) exit $OCF_NOT_RUNNING;;
> >         status)  exit $OCF_NOT_RUNNING;;
> >         *)       exit $rc;;
> >     esac
> > fi
> >
> > [ "$EUID" != "0" ] && (ocf_log err "$0 must be run as root"; \
> >                        exit $OCF_ERR_GENERIC)
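The root check just above has the same subshell problem, so the error is logged but the script continues. It also relies on $EUID, which bash sets but a POSIX /bin/sh such as dash typically does not; in that case the test compares an empty string with 0 and always complains, even when run as root. Likewise, `==` in the validate-all test is a bash extension; POSIX test only defines `=`. A portable sketch using `id -u`; returning OCF_ERR_PERM (4) rather than OCF_ERR_GENERIC is a suggestion, and `require_root` is a hypothetical name.

```shell
: "${OCF_ERR_PERM:=4}"

# require_root: hypothetical helper; `id -u` prints the effective user ID.
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "$0 must be run as root" >&2
        return $OCF_ERR_PERM
    fi
    return 0
}

# Main part (sketch):
#   require_root || exit $?
#   [ "$1" = "validate-all" ] && exit $rc    # '=' rather than '=='
```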
> >
> > case "$1" in
> >     status)     if named_status
> >                 then
> >                     ocf_log info "named is up"
> >                     exit $OCF_SUCCESS
> >                 else
> >                     ocf_log info "named is down"
> >                     exit $OCF_NOT_RUNNING
> >                 fi;;
> >
> >     monitor)    named_monitor
> >                 exit $?;;
> >
> >     start)      named_start
> >                 exit $?;;
> >
> >     stop)       named_stop
> >                 exit $?;;
> >     reload)     named_reload
> >                 exit $?;;
> >     *)
> >                 exit $OCF_ERR_UNIMPLEMENTED;;
> > esac
> >
> > --
> > Serge Dubrouski.
>
>
Thanks for reviewing.


> > _______________________________________________________
> > Linux-HA-Dev: [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
>
>



-- 
Serge Dubrouski.

Attachment: named
Description: Binary data
