On 2008-10-21T10:56:41, Xinwei Hu <[EMAIL PROTECTED]> wrote:
Hi Xinwei,
thanks for posting this RA. Looks pretty good, a few comments:
- I assume this "only" does NFS fail-over of the whole server, not of
single exports, and does not support exporting the same fs from several
nodes concurrently?
(If it can't, it should probably return ERR_CONFIGURED when run as a
clone.)
- How is it stacked with IPaddr? It seems to require the IPaddr to be
online beforehand - but if IPaddr is up without the server running yet,
there's a window during which clients would get connection-refused errors.
> #!/bin/sh
Really sh, or bash? (bash is fine, but if so, it should state that.) I
am hazy on what is bash-specific and what isn't, so I thought it safer
to ask ;-)
> # nfsserver
> #
> # Description: Manages nfs server as OCF resource
> # by [EMAIL PROTECTED]
> # License: GNU General Public License 2 (GPL2)
Why GPLv2 only, and not GPLv2 and later?
> if [ -n "$OCF_DEBUG_LIBRARY" ]; then
> . $OCF_DEBUG_LIBRARY
> else
> . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs
> fi
I wonder where that snippet comes from; do we actually have a debug
library anywhere? I've seen it in one or two RAs, but just curious.
> <parameter name="nfs_ip" unique="0" required="1">
> <longdesc lang="en">
> The floating IP address used to access the the nfs service
> </longdesc>
> <shortdesc lang="en">
> IP address.
> </shortdesc>
> <content type="string" default="*" />
If this were called "ip" instead of "nfs_ip", it could reference the
IPaddr parameter in the CIB (possible with pacemaker-1.0) and would only
need to be specified once.
(We probably want to eventually have hints like that so the GUI can
support this automatically ...)
> <actions>
> <action name="start" timeout="40" />
> <action name="stop" timeout="10" />
> <action name="status" timeout="90" />
status doesn't need to be specified.
> <action name="monitor" depth="0" timeout="20" interval="10"
> start-delay="1m"/>
Why a start-delay?
> nfsserver_monitor ()
> {
> ${OCF_RESKEY_nfs_init_script} status > /dev/null 2>&1
Should the output be logged?
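If so, maybe capture it and only log on failure, roughly (untested):

    # keep the probe quiet, but log the init script's output when it fails
    output=`${OCF_RESKEY_nfs_init_script} status 2>&1`
    rc=$?
    if [ $rc -ne 0 ]; then
        ocf_log info "$output"
    fi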
> rc=$?
>
> #Adapte LSB status code to OCF return code
> if [ $rc -eq 0 ]; then
> return $OCF_SUCCESS
> elif [ $rc -eq 3 ]; then
> return $OCF_NOT_RUNNING
> else
> return $OCF_ERR_GENERIC
> fi
> }
Should this handle "unused" as a possible return code translated to
ERR_INSTALLED or ERR_CONFIGURED?
> prepare_directory ()
> {
> fp="$OCF_RESKEY_nfs_shared_infodir"
fp is assigned in many places; this should be checked & set as part of
the general init sequence of the script (see below).
>
> [ -d "$fp" ] || mkdir -p $fp
> [ -d "$fp/rpc_pipefs" ] || mkdir -p $fp/rpc_pipefs
> [ -d "$fp/sm" ] || mkdir -p $fp/sm
> [ -d "$fp/sm.ha" ] || mkdir -p $fp/sm.ha
> [ -d "$fp/sm.bak" ] || mkdir -p $fp/sm.bak
> [ -d "$fp/v4recovery" ] || mkdir -p $fp/v4recovery
> }
>
> is_bound ()
> {
> mount | grep -q "$1 on $2 type none (.*bind)"
> return $?
> }
I have no hard failure case, but this strikes me as fragile, e.g. if
"bind" is not the last option in the mount output, etc.
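One possible alternative that does not depend on the option ordering
would be to compare device:inode of the two directories (GNU stat
assumed; untested):

    is_bound ()
    {
        # with the bind mount in place, $2 resolves to the same dev:inode as $1
        [ "`stat -c '%d:%i' $1`" = "`stat -c '%d:%i' $2`" ]
    }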
> prepare_tree ()
> {
> fp="$OCF_RESKEY_nfs_shared_infodir"
> if is_bound $fp /var/lib/nfs; then
> ocf_log debug "$fp is already bound to /var/lib/nfs"
> return 0
> fi
> mount --bind $fp /var/lib/nfs
> }
Why a bind mount instead of, say, symlinking /var/lib/nfs to the shared
space, or mounting the shared fs there directly?
(No criticism; just trying to understand the reasons.)
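For comparison, the symlink variant would boil down to a one-time setup
like this, assuming /var/lib/nfs can be replaced wholesale:

    # replace /var/lib/nfs with a symlink into the shared storage
    mv /var/lib/nfs /var/lib/nfs.orig
    ln -s "$OCF_RESKEY_nfs_shared_infodir" /var/lib/nfs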
> unprepare_tree ()
> {
A better name might be "bind/unbind" or "take/release", as "unprepare"
isn't really a word.
> fp="$OCF_RESKEY_nfs_shared_infodir"
> if is_bound $fp /var/lib/nfs; then
> umount /var/lib/nfs
> fi
> }
>
> nfsserver_start ()
> {
> prepare_directory
> prepare_tree
>
> cp -f /var/lib/nfs/sm/* /var/lib/nfs/sm.ha 2>&1 > /dev/null
What about left-over files in sm.ha? Should that directory be cleaned
out first?
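Something like

    # drop leftovers from a previous run before copying the current
    # statd monitor list
    rm -f /var/lib/nfs/sm.ha/* > /dev/null 2>&1

right before the cp, if that is safe to do here.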
> ocf_log info "Starting NFS server ..."
> ${OCF_RESKEY_nfs_init_script} start 2>&1 > /dev/null
Should the output here be logged?
> rc=$?
> if [ $rc -ne 0 ]; then
> ocf_log err "Failed to start NFS server"
> return $rc
> fi
>
> #Notify the nfs server has been moved or rebooted
> #The init script do that already, but with the hostname, which may be ignored by client
> #we have to do it again with the nfs_ip
> ${OCF_RESKEY_nfs_notify_cmd} -v $OCF_RESKEY_nfs_ip -P /var/lib/nfs/sm.ha
> rm -f /var/lib/nfs/sm.ha/* 2>&1 > /dev/null
>
> ocf_log info "NFS server started"
> return $OCF_SUCCESS
> }
What happens when this is called twice? Does it pass ocf-tester?
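The usual pattern is to make start idempotent, e.g. at the top of
nfsserver_start (untested):

    # nothing to do if the server is already up and healthy
    if nfsserver_monitor; then
        ocf_log info "NFS server is already running"
        return $OCF_SUCCESS
    fi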
>
> nfsserver_stop ()
> {
> ocf_log info "Stopping NFS server ..."
> ${OCF_RESKEY_nfs_init_script} stop 2>&1 > /dev/null
Should this output be logged?
> rc=$?
> if [ $rc -eq 0 ]; then
> unprepare_tree
> ocf_log info "NFS server stopped"
> return $OCF_SUCCESS
> fi
> ocf_log err "Failed to stop NFS server"
> return $rc
> }
>
> nfsserver_validate ()
> {
> if [ -z ${OCF_RESKEY_nfs_ip} ]; then
> return $OCF_ERR_ARGS
> fi
>
> if [ -x ${OCF_RESKEY_nfs_init_script} && -x ${OCF_RESKEY_nfs_notify_cmd} ]; then
> if [ -d $OCF_RESKEY_nfs_shared_infodir ]; then
> return $OCF_SUCCESS
> else
> return $OCF_ERR_ARGS
> else
> return $OCF_ERR_INSTALLED
> fi
> }
I think you always need to call nfsserver_validate() in your generic
init sequence for this script, before attempting to start/stop/monitor.
Otherwise, the initial probe will fail and throw the cluster into a
reboot cycle if the binary is incorrectly specified.
For checking for the binaries, "check_binary" is likely your friend.
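Roughly, before the action dispatch (just a sketch; adjust the action
names to whatever the script actually accepts):

    # general init: set fp once, make sure the tools exist, and validate
    # before start/stop/monitor - but not for meta-data/usage requests
    case $__OCF_ACTION in
        meta-data|usage|help) ;;
        *)  fp="$OCF_RESKEY_nfs_shared_infodir"
            check_binary ${OCF_RESKEY_nfs_init_script}
            check_binary ${OCF_RESKEY_nfs_notify_cmd}
            nfsserver_validate || exit $?
            ;;
    esac

That would also take care of the fp duplication mentioned above.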
> case $__OCF_ACTION in
> start) nfsserver_start
> ;;
> stop) nfsserver_stop
> ;;
> monitor) nfsserver_monitor
> ;;
> validate-all) nfsserver_validate
> ;;
> *) nfsserver_usage
> exit $OCF_ERR_UNIMPLEMENTED
> ;;
> esac
Regards,
Lars
--
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/