Hi Satomi-san,

On Tue, Oct 14, 2008 at 03:06:27PM +0900, Satomi TANIGUCHI wrote:
> Hi Dejan,
>
> Thank you so much for your comments!
> I modified and tested the patch.
>
>
> Dejan Muhamedagic wrote:
>> Hi Satomi-san,
>>
>> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote:
>>> Hi lists,
>>>
>>> I'm posting a STONITH plugin which checks whether the target node is
>>> kdumping or not.
>>> There are some steps to use this, but I believe this plugin is helpful for
>>> failure analysis.
>>> See attached README for details about how to use this.
>>>
>>> There are 2 patches.
>>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8).
>>> And the patch named "mkdumprd_for_kdumpcheck.patch" is
>>> for mkdumprd version 5.0.39.
>>>
>>> If you're interested, please give me your comments.
>>> Any comments and suggestions are really appreciated.
>>
>> The script (kdumpcheck) looks fine to me. Just a few points.
>>
>> The use of upper case variable names: Typically, those denote
>> global (or exported) environment variables. Vars which should
>> live only within a function (though that's not possible with
>> Bourne shell) should be lower case and, probably, have shorter
>> names. Excessive use of upper case strains eyes more than the
>> lower case. That is unless you're a VMS user ;-)
> I changed all non-global variables' names to lower and shorter strings.
> Thanks!
>
>>
>> Leave "function" and "local" keywords out, unless you want to use
>> /bin/bash for the script, but I don't see why that would be
>> necessary.
> I deleted "function" and "local".
> And now check_identity_file() and check_user_existence() require no argument.
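
For reference, a minimal sketch of the style discussed above: a function
defined without "function" or "local", using short lower-case names.
find_identity is a hypothetical helper written for illustration, not code
from the patch, and its variables are still global because plain sh has no
function-local scope.

```sh
# Hypothetical helper (not part of the patch): print the first ssh
# identity file found in the directory given as $1.
find_identity() {
    dir="$1"    # lower-case name; still global in plain sh
    for f in "${dir}/id_rsa" "${dir}/id_dsa" "${dir}/identity"; do
        if [ -f "${f}" ]; then
            echo "${f}"
            return 0
        fi
    done
    # No identity file was found.
    return 1
}
```

Since nothing is local, short names such as "f" and "dir" can clash between
functions, so each function should assign them before use.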
>
>>
>> I wonder if the status function should depend on ping-ing the
>> target node.
> The ping-ing is just to confirm that the node on which the kdumpcheck
> plugin is running knows the hostnames in hostlist.
> Because if the target node is not listed in hostlist,
> kdumpcheck will fail to STONITH the node.
> Is it redundant?
> I referred to the ssh STONITH plugin when I wrote this process...
> I think it is necessary for the case in which a user writes a wrong hostname
> to hostlist or /etc/hosts.
>
>>
>> Document that this works only on Linux.
> I added NOTE in README's introduction.

Applied the patch.

Cheers,

Dejan

>
>
> Best Regards,
> Satomi TANIGUCHI
>
>
>
>>
>> Cheers,
>>
>> Dejan
>>
>>> Best Regards,
>>> Satomi TANIGUCHI
>> _______________________________________________________
>> Linux-HA-Dev: [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>

> diff -urN org/configure.in mod/configure.in
> --- org/configure.in  2008-10-14 10:24:16.000000000 +0900
> +++ mod/configure.in  2008-10-14 10:25:17.000000000 +0900
> @@ -2665,6 +2665,7 @@
>               lib/plugins/stonith/external/riloe              \
>               lib/plugins/stonith/external/ssh                \
>               lib/plugins/stonith/external/hmchttp            \
> +             lib/plugins/stonith/external/kdumpcheck         \
>               lib/plugins/stonith/external/xen0-ha            \
>               lib/plugins/stonith/external/drac5              \
>               lib/plugins/HBcompress/Makefile                 \
> diff -urN org/lib/plugins/stonith/external/Makefile.am mod/lib/plugins/stonith/external/Makefile.am
> --- org/lib/plugins/stonith/external/Makefile.am      2008-10-14 10:24:17.000000000 +0900
> +++ mod/lib/plugins/stonith/external/Makefile.am      2008-10-14 10:25:17.000000000 +0900
> @@ -20,13 +20,13 @@
>  MAINTAINERCLEANFILES = Makefile.in
>  
>  EXTRA_DIST           = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \
> -                     xen0-ha-dom0-stonith-helper sbd
> +                     xen0-ha-dom0-stonith-helper sbd kdumpcheck
>  
>  extdir                    = $(stonith_ext_plugindir)
>  
>  helperdir         = $(stonith_plugindir)
>  
>  ext_SCRIPTS       = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu xen0 hmchttp \
> -                     xen0-ha sbd
> +                     xen0-ha sbd kdumpcheck
>  
>  helper_SCRIPTS            = xen0-ha-dom0-stonith-helper
> diff -urN org/lib/plugins/stonith/external/kdumpcheck.in mod/lib/plugins/stonith/external/kdumpcheck.in
> --- org/lib/plugins/stonith/external/kdumpcheck.in    1970-01-01 09:00:00.000000000 +0900
> +++ mod/lib/plugins/stonith/external/kdumpcheck.in    2008-10-14 10:02:03.000000000 +0900
> @@ -0,0 +1,288 @@
> +#!/bin/sh
> +#
> +# External STONITH module to check kdump.
> +#
> +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of version 2 of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> +#
> +# Further, this software is distributed without any warranty that it is
> +# free of the rightful claim of any third person regarding infringement
> +# or the like.  Any license provided herein, whether implied or
> +# otherwise, applies only to this software file.  Patent licenses, if
> +# any, provided herein do not apply to combinations of this program with
> +# other software, or any other product whatsoever.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
> +#
> +
> +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n"
> +#Set default user name.
> +USERNAME="kdumpchecker"
> +#Initialize identity file-path options for ssh command
> +IDENTITY_OPTS=""
> +
> +#For debug print.
> +DEBUG=1
> +if [ -n "${DEBUG}" ]; then
> +    DEBUG_FILE=/var/log/ha-kdumpcheck.log
> +    touch ${DEBUG_FILE}
> +    chmod 600 ${DEBUG_FILE}
> +
> +    exec 2>> ${DEBUG_FILE}
> +    OUTPUT='>&2'
> +fi
> +
> +print_debug() {
> +    if [ -n "${DEBUG}" ]; then
> +        cat >&2
> +    else
> +        cat > /dev/null 2>&1
> +    fi
> +}
> +
> +#Rewrite the hostlist to accept "," as a delimiter for hostnames too.
> +hostlist=`echo ${hostlist} | tr ',' ' '`
> +
> +##
> +# Check the parameter hostlist is set or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_hostlist() {
> +    if [ -z "${hostlist}" ]; then
> +        echo "`date`::ERROR: hostlist is empty." | print_debug
> +        exit 6 #ERR_CONFIGURED
> +    fi
> +}
> +
> +##
> +# Set kdump check user name to USERNAME.
> +#   always return 0.
> +##
> +get_username() {
> +    kdump_conf="/etc/kdump.conf"
> +    config_name="kdump_check_user"
> +
> +    if [ ! -f "${kdump_conf}" ]; then
> +        echo "`date`::DEBUG: ${kdump_conf} doesn't exist." | print_debug
> +        return 0
> +    fi
> +
> +    tmp=`grep "^\s*${config_name}\>" ${kdump_conf} | awk '{print $2}'`
> +    if [ -n "${tmp}" ]; then
> +        USERNAME="${tmp}"
> +    fi
> +
> +    echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug
> +}
> +
> +##
> +# Check the specified or default identity file exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_identity_file() {
> +    IDENTITY_OPTS=""
> +    if [ -n "${identity_file}" ]; then
> +        if [ ! -f "${identity_file}" ]; then
> +            echo "`date`::ERROR: ${identity_file} doesn't exist." | print_debug
> +            exit 6 #ERR_CONFIGURED
> +        fi
> +        IDENTITY_OPTS="-i ${identity_file}"
> +    else
> +        flg_file_exists=0
> +        homedir=`eval echo "~${USERNAME}"`
> +        for filename in "${homedir}/.ssh/id_rsa" \
> +                        "${homedir}/.ssh/id_dsa" \
> +                        "${homedir}/.ssh/identity"
> +        do
> +            if [ -f "${filename}" ]; then
> +                flg_file_exists=1
> +                IDENTITY_OPTS="${IDENTITY_OPTS} -i ${filename}"
> +            fi
> +        done
> +        if [ ${flg_file_exists} -eq 0 ]; then
> +            echo "`date`::ERROR: ${USERNAME}'s identity file for ssh command" \
> +                " doesn't exist." | print_debug
> +            exit 6 #ERR_CONFIGURED
> +        fi
> +    fi
> +}
> +
> +##
> +# Check the user to check doing kdump exists or not.
> +# If not, exit with 6 (ERR_CONFIGURED).
> +##
> +check_user_existence() {
> +
> +    # Get kdump check user name and check whether he exists or not.
> +    grep -q "^${USERNAME}:" /etc/passwd > /dev/null 2>&1
> +    ret=$?
> +    if [ ${ret} != 0 ]; then
> +        echo "`date`::ERROR: user ${USERNAME} doesn't exist." \
> +            "please confirm \"kdump_check_user\" setting in /etc/kdump.conf." \
> +            "(default user name is \"kdumpchecker\")" | print_debug
> +        exit 6 #ERR_CONFIGURED
> +    fi
> +}
> +
> +##
> +# Check the target node is kdumping or not.
> +#   arg1 : target node name.
> +#   ret  : 0 -> the target is kdumping.
> +#        : 1 -> the target is _not_ kdumping.
> +#        : else -> failed to check.
> +##
> +check_kdump() {
> +    target_node="$1"
> +
> +    # Get kdump check user name.
> +    get_username
> +    check_user_existence
> +    exec_cmd="${SSH_COMMAND} -l ${USERNAME}"
> +
> +    # Specify kdump check user's identity file for ssh command.
> +    check_identity_file
> +    exec_cmd="${exec_cmd} ${IDENTITY_OPTS}"
> +
> +    # Now, check the target!
> +    # In advance, Write the following setting at the head of
> +    # kdump_check_user's public key in authorized_keys file on target node.
> +    #    command="test -s /proc/vmcore", \
> +    #    no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty
> +    echo "`date`::DEBUG: execute the command" \
> +            "[${exec_cmd} ${target_node}]." | print_debug
> +    ${exec_cmd} ${target_node} > /dev/null 2>&1
> +    ret=$?
> +    echo "`date`::DEBUG: the command's result is ${ret}." | print_debug
> +
> +    #ret ->   0 : vmcore file's size is not zero. the node is kdumping.
> +    #ret ->   1 : the node is _not_ kdumping (vmcore didn't exist or
> +    #             its size is zero). It still needs to be STONITH'ed.
> +    #ret -> 255 : ssh command is failed.
> +    #      else : Maybe command strings in authorized_keys is wrong...
> +    return ${ret}
> +}
> +
> +###
> +#
> +#  Main function.
> +#
> +###
> +case $1 in
> +gethosts)
> +    check_hostlist
> +    for hostname in ${hostlist} ; do
> +        echo "${hostname}"
> +    done
> +    exit 0
> +    ;;
> +on)
> +    # This plugin does only check whether a target node is kdumping or not.
> +    exit 1
> +    ;;
> +reset|off)
> +    check_hostlist
> +    ret=1
> +    for hostname in ${hostlist}
> +    do
> +        if [ "${hostname}" != "$2" ]; then
> +            continue
> +        fi
> +        while [ 1 ]
> +        do
> +            check_kdump "$2"
> +            ret=$?
> +            if [ ${ret} -ne 255 ]; then
> +                exit ${ret}
> +            fi
> +            #255 means ssh command itself is failed.
> +            #For example, connection failure as if network doesn't start yet
> +            #in 2nd kernel on the target node.
> +            #So, retry to check after a little while.
> +            sleep 1
> +        done
> +    done
> +    exit ${ret}
> +    ;;
> +status)
> +    check_hostlist
> +    for hostname in ${hostlist}
> +    do
> +        if ping -w1 -c1 "${hostname}" 2>&1 | grep "unknown host"
> +        then
> +            exit 1
> +        fi
> +    done
> +    get_username
> +    check_user_existence
> +    check_identity_file
> +    exit 0
> +    ;;
> +getconfignames)
> +    echo "hostlist identity_file"
> +    exit 0
> +    ;;
> +getinfo-devid)
> +    echo "kdump check STONITH device"
> +    exit 0
> +    ;;
> +getinfo-devname)
> +    echo "kdump check STONITH external device"
> +    exit 0
> +    ;;
> +getinfo-devdescr)
> +    echo "ssh-based kdump checker"
> +    echo "To check whether a target node is dumping or not."
> +    exit 0
> +    ;;
> +getinfo-devurl)
> +    echo "kdump -> http://lse.sourceforge.net/kdump/";
> +    echo "ssh   -> http://openssh.org";
> +    exit 0
> +    ;;
> +getinfo-xml)
> +    cat << SSHXML
> +<parameters>
> +<parameter name="hostlist" unique="1" required="1">
> +<content type="string" />
> +<shortdesc lang="en">
> +Hostlist
> +</shortdesc>
> +<longdesc lang="en">
> +The list of hosts that the STONITH device controls
> +</longdesc>
> +</parameter>
> +
> +<parameter name="identity_file" unique="1" required="0">
> +<content type="string" />
> +<shortdesc lang="en">
> +Identity file's full path for kdump check user
> +</shortdesc>
> +<longdesc lang="en">
> +The full path of kdump check user's identity file for ssh command.
> +The identity in the specified file have to be restricted to execute
> +only the following command.
> +"test -s /proc/vmcore"
> +Default: kdump check user's default identity file path.
> +NOTE: You can specify kdump check user name in /etc/kdump.conf.
> +      The parameter name is "kdump_check_user".
> +      Default user is "kdumpchecker".
> +</longdesc>
> +</parameter>
> +
> +</parameters>
> +SSHXML
> +    exit 0
> +    ;;
> +*)
> +    exit 1
> +    ;;
> +esac
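
A side note on the reset|off branch above: the loop retries for as long as
the check returns 255. If a bounded retry were wanted, it could look like
the sketch below. retry_on_255 and RETRY_DELAY are hypothetical names, not
part of the patch, and whether a bound is needed at all depends on how the
caller enforces its own timeout, which this thread does not cover.

```sh
# Hypothetical wrapper (not part of the patch): run the command given
# after the first argument, retrying while it exits with 255, at most
# $1 times. Returns the command's exit status, or 255 once the tries
# are used up.
retry_on_255() {
    max_tries="$1"
    shift
    tries=0
    while [ "${tries}" -lt "${max_tries}" ]; do
        rc=0
        "$@" || rc=$?
        if [ "${rc}" -ne 255 ]; then
            return "${rc}"
        fi
        tries=$((tries + 1))
        # Wait a little before the next attempt (1 second by default).
        sleep "${RETRY_DELAY:-1}"
    done
    return 255
}
```

In the plugin this would be called as something like
retry_on_255 30 check_kdump "$2", where 30 is an arbitrary example value.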

>                      Kdump check STONITH plugin "kdumpcheck"
> 1. Introduction
>     This plugin's purpose is to avoid STONITH for a node which is doing kdump.
>     It confirms whether the node is doing kdump or not when STONITH reset or
>     off operation is executed.
>     If the target node is doing kdump, this plugin considers that STONITH
>     succeeded. If not, it considers that STONITH failed.
> 
>     NOTE: This plugin cannot shut down or start up a node.
>           So it has to be used together with another STONITH plugin.
>           Then, when this plugin fails, the next plugin, which can kill the
>           node, is executed.
>     NOTE: This plugin works only on Linux.
> 
> 2. The way to check
>    When STONITH reset or off is executed, kdumpcheck connects to the target
>    node, and checks the size of /proc/vmcore.
>    It judges that the target node is _not_ doing kdump when the size of
>    /proc/vmcore on the node is zero, or the file doesn't exist.
>    Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin
>    is executed.
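
The check described in this section comes down to the exit status of
"test -s /proc/vmcore" run over ssh. A minimal sketch of how those statuses
map to a verdict, following the comments in check_kdump(); kdump_verdict is
a hypothetical name used only for illustration:

```sh
# Hypothetical helper (not part of the patch): translate the exit
# status of the ssh check into the verdict described in the README.
kdump_verdict() {
    case "$1" in
    0)   echo "kdumping" ;;       # /proc/vmcore exists and is not empty
    1)   echo "not kdumping" ;;   # /proc/vmcore is missing or empty
    255) echo "ssh failed" ;;     # ssh itself could not run the check
    *)   echo "unexpected" ;;     # e.g. a wrong command= in authorized_keys
    esac
}
```

Only status 0 counts as "STONITH succeeded". Status 1 is reported as a
failure so that the next plugin gets its turn.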
> 
> 3. Expanding mkdumprd
>     This plugin requires non-root user and ssh connection even on 2nd kernel.
>     So, you need to apply mkdumprd_for_kdumpcheck.patch to /sbin/mkdumprd.
>     This patch is tested with mkdumprd version 5.0.39.
>     The patch adds the following functions:
>       i) Start udevd with specified .rules files.
>      ii) Bring the specified network interface up.
>     iii) Start sshd.
>      iv) Add the specified user to the 2nd kernel.
>          The user is to check whether the node is doing kdump or not.
>       v) Execute sync command after dumping.
> 
>      NOTE: The extensions i) to iv) apply only when a filesystem partition
>            is specified as the location where the vmcore should be dumped.
> 
> 4. Parameters
>     kdumpcheck's parameters are the following.
>       hostlist     : The list of hosts that the STONITH device controls.
>                      delimiter is "," or " ".
>                      indispensable setting. (default:none)
>       identity_file: a full-path of the private key file for the user
>                      who checks doing kdump.
>                      (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and
>                                $HOME/.ssh/identity)
> 
>     NOTE: To execute this plugin first, set the highest priority to this plugin
>           in all STONITH resources.
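
As an illustration of the hostlist delimiter rule, the sketch below
normalizes the list with the same tr call the plugin uses and then looks up
one host. The function names are hypothetical and only serve the example.

```sh
# Same normalization as in the plugin: turn "," into " " so that the
# list can be walked with a plain for loop.
normalize_hostlist() {
    echo "$1" | tr ',' ' '
}

# Hypothetical helper (not part of the patch): succeed if host $1
# appears in the hostlist $2, whichever delimiter was used.
host_in_list() {
    for h in `normalize_hostlist "$2"`; do
        if [ "${h}" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

With this, "node1,node2" and "node1 node2" are treated the same way.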
> 
> 5. How to Use
>     To use this tool, do the following steps at all nodes in the cluster.
>       1) Add a user to check doing kdump.
>          ex.)
>            # useradd kdumpchecker
>            # passwd kdumpchecker
>       2) Allow passwordless login from the node which will do STONITH to all
>          target nodes for the user added at step 1).
>          ex.)
>            $ cd
>            $ mkdir .ssh
>            $ chmod 700 .ssh
>            $ cd .ssh
>            $ ssh-keygen (generate authentication  keys with empty passphrase)
>            $ scp id_rsa.pub [EMAIL PROTECTED]:"~/.ssh/."
>            $ ssh [EMAIL PROTECTED]
>            $ cd ~/.ssh
>            $ cat id_rsa.pub >> authorized_keys
>            $ chmod 600 authorized_keys
>            $ rm id_rsa.pub
>       3) Limit the command that the user can execute.
>          Describe the following commands in a line at the head of the user's
>          public key in target node's authorized_keys file.
>          [command="test -s /proc/vmcore"]
>          And describe some options (like no-pty, no-port-forwarding and so on)
>          according to your security policy.
>          ex.)
>            $ vi ~/.ssh/authorized_keys
>            command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,
>            no-agent-forwarding,no-pty ssh-rsa AAA..snip..== [EMAIL PROTECTED]
>       4) Add settings in /etc/kdump.conf.
>            network_device   : network interface name to check doing kdump.
>                               indispensable setting. (default: none)
>            kdump_check_user : user name to check doing kdump.
>                               specify non-root user.
>                               (default: "kdumpchecker")
>            udev_rules       : .rules files' names.
>                               specify if you use udev for mapping devices.
>                               specified files have to be in /etc/udev/rules.d/.
>                               you can specify two or more files.
>                               delimiter is "," or " ". (default: none)
>          ex.)
>            # vi /etc/kdump.conf
>            ext3 /dev/sda1
>            network_device eth0
>            kdump_check_user kdumpchecker
>            udev_rules 10-if.rules
>       5) Apply the patch to /sbin/mkdumprd.
>            # cd /sbin
>            # patch -p 1 < mkdumprd_for_kdumpcheck.patch
>       6) Restart kdump service.
>            # service kdump restart
>       7) Describe cib.xml to set STONITH plugin.
>          (See "4. Parameters" and "6. Appendix")
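
Before restarting kdump in step 6, the settings from step 4 can be checked
with something like the sketch below. It assumes the "name value" layout
shown in the kdump.conf example. conf_value and check_kdump_conf are
hypothetical helpers, not part of the patch or of kdump itself.

```sh
# Hypothetical helper: print the value of setting $1 from the
# kdump.conf-style file $2 (first field is the name, second the value).
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical check: network_device has no default, so fail if it is
# missing. On success, print the kdump check user, falling back to the
# documented default "kdumpchecker".
check_kdump_conf() {
    conf="$1"
    dev=`conf_value network_device "${conf}"`
    if [ -z "${dev}" ]; then
        echo "network_device is not set in ${conf}" >&2
        return 1
    fi
    user=`conf_value kdump_check_user "${conf}"`
    echo "${user:-kdumpchecker}"
}
```

For example, check_kdump_conf /etc/kdump.conf prints the user name the
plugin would log in as, or an error if network_device is missing.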
> 
> 6. Appendix
>     A sample cib.xml.
>     <clone id="clnStonith">
>       <instance_attributes id="instance_attributes.id238245a">
>         <nvpair id="clone0_clone_max" name="clone_max" value="2"/>
>         <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/>
>       </instance_attributes>
>       <group id="grpStonith">
>         <instance_attributes id="instance_attributes.id2382455"/>
>         <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck">
>           <instance_attributes id="instance_attributes.id238240a">
>             <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/>
>             <nvpair id="nvpair.id238240c" name="priority" value="1"/>
>             <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/>
>           </instance_attributes>
>           <operations>
>             <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/>
>             <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
>             <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
>           </operations>
>           <meta_attributes id="primitive-grpStonith-kdump-check.meta"/>
>         </primitive>
>         <primitive id="grpStonith-ssh" class="stonith" type="external/ssh">
>           <instance_attributes id="instance_attributes.id2382402a">
>             <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/>
>             <nvpair id="nvpair.id238066b" name="priority" value="2"/>
>             <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/>
>           </instance_attributes>
>           <operations>
>             <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/>
>             <op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/>
>             <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/>
>           </operations>
>           <meta_attributes id="primitive-grpStonith-ssh.meta"/>
>         </primitive>
>       </group>
>     </clone>
> 
