Hi Satomi-san, On Tue, Oct 14, 2008 at 03:06:27PM +0900, Satomi TANIGUCHI wrote: > Hi Dejan, > > Thank you so much for your comments! > I modified and tested the patch. > > > Dejan Muhamedagic wrote: >> Hi Satomi-san, >> >> On Wed, Oct 08, 2008 at 02:55:57PM +0900, Satomi TANIGUCHI wrote: >>> Hi lists, >>> >>> I'm posting a STONITH plugin which checks whether the target node is >>> kdumping >>> or not. >>> There are some steps to use this, but I believe this plugin is helpful for >>> failure analysis. >>> See attached README for details about how to use this. >>> >>> There are 2 patches. >>> The patch named "kdumpcheck.patch" is for Linux-HA-dev(1eae6aaf1af8). >>> And the patch named "mkdumprd_for_kdumpcheck.patch" is >>> for mkdumprd version 5.0.39. >>> >>> If you're interested in, please give me your comments. >>> Any comments and suggestions are really appreciated. >> >> The script (kdumpcheck) looks fine to me. Just a few points. >> >> The use of upper case variable names: Typically, those denote >> global (or exported) environment variables. Vars which should >> live only within a function (though that's not possible with >> Bourne shell) should be lower case and, probably, have shorter >> names. Excessive use of upper case strains eyes more than the >> lower case. That is unless you're a VMS user ;-) > I changed all non-global variables' names to lower and shorter strings. > Thanks! > >> >> Leave "function" and "local" keywords out, unless you want to use >> /bin/bash for the script, but I don't see why would that be >> necessary. > I deleted "function" and "local". > And now check_identity_file() and check_user_existence() require no argument. > >> >> I wonder if the status function should depend on ping-ing the >> target node. > The ping-ing is just to confirm that > the node which kdumpcheck plugin is working on knows the hostnames in > hostlist. > Because if the target node is not listed in hostlist, > kdumpcheck will fail to STONITH the node. 
> Is that redundant? > I referred to the ssh STONITH plugin when I wrote this part. > I think it is necessary for the case where a user writes a wrong hostname > in hostlist or /etc/hosts. > >> >> Document that this works only on Linux. > I added a NOTE to the README's introduction.
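On the ping question: if the only goal is to catch a hostname that is misspelled in hostlist or missing from /etc/hosts, a plain resolver lookup might be enough and would not depend on ICMP being allowed between the nodes. A rough sketch, not part of the patch: it assumes getent(1) is available (glibc systems), and the function names resolves and check_hostlist_resolves are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the posted kdumpcheck patch.

# Return 0 if the given name resolves through the system resolver.
# Assumption: getent(1) exists on the node (glibc-based Linux).
resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Return 0 only if every name in the list resolves.
# arg1: host names separated by "," or " ", like the hostlist parameter.
check_hostlist_resolves() {
    for h in `echo "$1" | tr ',' ' '`; do
        if ! resolves "$h"; then
            echo "cannot resolve: $h" >&2
            return 1
        fi
    done
    return 0
}
```

The status case could then call check_hostlist_resolves "${hostlist}" and exit 1 on failure, leaving the user and identity file checks as they are.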
Applied the patch. Cheers, Dejan > > > Best Regards, > Satomi TANIGUCHI > > > >> >> Cheers, >> >> Dejan >> >>> Best Regards, >>> Satomi TANIGUCHI >> _______________________________________________________ >> Linux-HA-Dev: [email protected] >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >> Home Page: http://linux-ha.org/ > > diff -urN org/configure.in mod/configure.in > --- org/configure.in 2008-10-14 10:24:16.000000000 +0900 > +++ mod/configure.in 2008-10-14 10:25:17.000000000 +0900 > @@ -2665,6 +2665,7 @@ > lib/plugins/stonith/external/riloe \ > lib/plugins/stonith/external/ssh \ > lib/plugins/stonith/external/hmchttp \ > + lib/plugins/stonith/external/kdumpcheck \ > lib/plugins/stonith/external/xen0-ha \ > lib/plugins/stonith/external/drac5 \ > lib/plugins/HBcompress/Makefile \ > diff -urN org/lib/plugins/stonith/external/Makefile.am > mod/lib/plugins/stonith/external/Makefile.am > --- org/lib/plugins/stonith/external/Makefile.am 2008-10-14 > 10:24:17.000000000 +0900 > +++ mod/lib/plugins/stonith/external/Makefile.am 2008-10-14 > 10:25:17.000000000 +0900 > @@ -20,13 +20,13 @@ > MAINTAINERCLEANFILES = Makefile.in > > EXTRA_DIST = drac5 ibmrsa-telnet ipmi rackpdu vmware xen0 \ > - xen0-ha-dom0-stonith-helper sbd > + xen0-ha-dom0-stonith-helper sbd kdumpcheck > > extdir = $(stonith_ext_plugindir) > > helperdir = $(stonith_plugindir) > > ext_SCRIPTS = drac5 ibmrsa ibmrsa-telnet ipmi riloe ssh vmware rackpdu > xen0 hmchttp \ > - xen0-ha sbd > + xen0-ha sbd kdumpcheck > > helper_SCRIPTS = xen0-ha-dom0-stonith-helper > diff -urN org/lib/plugins/stonith/external/kdumpcheck.in > mod/lib/plugins/stonith/external/kdumpcheck.in > --- org/lib/plugins/stonith/external/kdumpcheck.in 1970-01-01 > 09:00:00.000000000 +0900 > +++ mod/lib/plugins/stonith/external/kdumpcheck.in 2008-10-14 > 10:02:03.000000000 +0900 > @@ -0,0 +1,288 @@ > +#!/bin/sh > +# > +# External STONITH module to check kdump. 
> +# > +# Copyright (c) 2008 NIPPON TELEGRAPH AND TELEPHONE CORPORATION > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of version 2 of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, but > +# WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > +# > +# Further, this software is distributed without any warranty that it is > +# free of the rightful claim of any third person regarding infringement > +# or the like. Any license provided herein, whether implied or > +# otherwise, applies only to this software file. Patent licenses, if > +# any, provided herein do not apply to combinations of this program with > +# other software, or any other product whatsoever. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. > +# > + > +SSH_COMMAND="@SSH@ -q -x -o PasswordAuthentication=no -o > StrictHostKeyChecking=no -n" > +#Set default user name. > +USERNAME="kdumpchecker" > +#Initialize identity file-path options for ssh command > +IDENTITY_OPTS="" > + > +#For debug print. > +DEBUG=1 > +if [ -n "${DEBUG}" ]; then > + DEBUG_FILE=/var/log/ha-kdumpcheck.log > + touch ${DEBUG_FILE} > + chmod 600 ${DEBUG_FILE} > + > + exec 2>> ${DEBUG_FILE} > + OUTPUT='>&2' > +fi > + > +print_debug() { > + if [ -n "${DEBUG}" ]; then > + cat >&2 > + else > + cat > /dev/null 2>&1 > + fi > +} > + > +#Rewrite the hostlist to accept "," as a delimeter for hostnames too. > +hostlist=`echo ${hostlist} | tr ',' ' '` > + > +## > +# Check the parameter hostlist is set or not. > +# If not, exit with 6 (ERR_CONFIGURED). 
> +## > +check_hostlist() { > + if [ -z "${hostlist}" ]; then > + echo "`date`::ERROR: hostlist is empty." | print_debug > + exit 6 #ERR_CONFIGURED > + fi > +} > + > +## > +# Set kdump check user name to USERNAME. > +# always return 0. > +## > +get_username() { > + kdump_conf="/etc/kdump.conf" > + config_name="kdump_check_user" > + > + if [ ! -f "${kdump_conf}" ]; then > + echo "`date`::DEBUG: ${kdump_conf} doesn't exist." | print_debug > + return 0 > + fi > + > + tmp=`grep "^\s*${config_name}\>" ${kdump_conf} | awk '{print $2}'` > + if [ -n "${tmp}" ]; then > + USERNAME="${tmp}" > + fi > + > + echo "`date`::DEBUG: kdump check user name is ${USERNAME}." | print_debug > +} > + > +## > +# Check the specified or default identity file exists or not. > +# If not, exit with 6 (ERR_CONFIGURED). > +## > +check_identity_file() { > + IDENTITY_OPTS="" > + if [ -n "${identity_file}" ]; then > + if [ ! -f "${identity_file}" ]; then > + echo "`date`::ERROR: ${identity_file} doesn't exist." | > print_debug > + exit 6 #ERR_CONFIGURED > + fi > + IDENTITY_OPTS="-i ${identity_file}" > + else > + flg_file_exists=0 > + homedir=`eval echo "~${USERNAME}"` > + for filename in "${homedir}/.ssh/id_rsa" \ > + "${homedir}/.ssh/id_dsa" \ > + "${homedir}/.ssh/identity" > + do > + if [ -f "${filename}" ]; then > + flg_file_exists=1 > + IDENTITY_OPTS="${IDENTITY_OPTS} -i ${filename}" > + fi > + done > + if [ ${flg_file_exists} -eq 0 ]; then > + echo "`date`::ERROR: ${USERNAME}'s identity file for ssh > command" \ > + " doesn't exist." | print_debug > + exit 6 #ERR_CONFIGURED > + fi > + fi > +} > + > +## > +# Check the user to check doing kdump exists or not. > +# If not, exit with 6 (ERR_CONFIGURED). > +## > +check_user_existence() { > + > + # Get kdump check user name and check whether he exists or not. > + grep -q "^${USERNAME}\>" /etc/passwd > /dev/null 2>&1 > + ret=$? > + if [ ${ret} != 0 ]; then > + echo "`date`::ERROR: user ${USERNAME} doesn't exist." 
\ > + "please confirm \"kdump_check_user\" setting in > /etc/kdump.conf." \ > + "(default user name is \"kdumpchecker\")" | print_debug > + exit 6 #ERR_CONFIGURED > + fi > +} > + > +## > +# Check the target node is kdumping or not. > +# arg1 : target node name. > +# ret : 0 -> the target is kdumping. > +# : 1 -> the target is _not_ kdumping. > +# : else -> failed to check. > +## > +check_kdump() { > + target_node="$1" > + > + # Get kdump check user name. > + get_username > + check_user_existence > + exec_cmd="${SSH_COMMAND} -l ${USERNAME}" > + > + # Specify kdump check user's identity file for ssh command. > + check_identity_file > + exec_cmd="${exec_cmd} ${IDENTITY_OPTS}" > + > + # Now, check the target! > + # In advance, Write the following setting at the head of > + # kdump_check_user's public key in authorized_keys file on target node. > + # command="test -s /proc/vmcore", \ > + # no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty > + echo "`date`::DEBUG: execute the command" \ > + "[${exec_cmd} ${target_node}]." | print_debug > + ${exec_cmd} ${target_node} > /dev/null 2>&1 > + ret=$? > + echo "`date`::DEBUG: the command's result is ${ret}." | print_debug > + > + #ret -> 0 : vmcore file's size is not zero. the node is kdumping. > + #ret -> 1 : the node is _not_ kdumping (vmcore didn't exist or > + # its size is zero). It still needs to be STONITH'ed. > + #ret -> 255 : ssh command is failed. > + # else : Maybe command strings in authorized_keys is wrong... > + return ${ret} > +} > + > +### > +# > +# Main function. > +# > +### > +case $1 in > +gethosts) > + check_hostlist > + for hostname in ${hostlist} ; do > + echo "${hostname}" > + done > + exit 0 > + ;; > +on) > + # This plugin does only check whether a target node is kdumping or not. > + exit 1 > + ;; > +reset|off) > + check_hostlist > + ret=1 > + for hostname in ${hostlist} > + do > + if [ "${hostname}" != "$2" ]; then > + continue > + fi > + while [ 1 ] > + do > + check_kdump "$2" > + ret=$? 
> + if [ ${ret} -ne 255 ]; then > + exit ${ret} > + fi > + #255 means ssh command itself is failed. > + #For example, connection failure as if network doesn't start yet > + #in 2nd kernel on the target node. > + #So, retry to check after a little while. > + sleep 1 > + done > + done > + exit ${ret} > + ;; > +status) > + check_hostlist > + for hostname in ${hostlist} > + do > + if ping -w1 -c1 "${hostname}" 2>&1 | grep "unknown host" > + then > + exit 1 > + fi > + done > + get_username > + check_user_existence > + check_identity_file > + exit 0 > + ;; > +getconfignames) > + echo "hostlist identity_file" > + exit 0 > + ;; > +getinfo-devid) > + echo "kdump check STONITH device" > + exit 0 > + ;; > +getinfo-devname) > + echo "kdump check STONITH external device" > + exit 0 > + ;; > +getinfo-devdescr) > + echo "ssh-based kdump checker" > + echo "To check whether a target node is dumping or not." > + exit 0 > + ;; > +getinfo-devurl) > + echo "kdump -> http://lse.sourceforge.net/kdump/" > + echo "ssh -> http://openssh.org" > + exit 0 > + ;; > +getinfo-xml) > + cat << SSHXML > +<parameters> > +<parameter name="hostlist" unique="1" required="1"> > +<content type="string" /> > +<shortdesc lang="en"> > +Hostlist > +</shortdesc> > +<longdesc lang="en"> > +The list of hosts that the STONITH device controls > +</longdesc> > +</parameter> > + > +<parameter name="identity_file" unique="1" required="0"> > +<content type="string" /> > +<shortdesc lang="en"> > +Identity file's full path for kdump check user > +</shortdesc> > +<longdesc lang="en"> > +The full path of kdump check user's identity file for ssh command. > +The identity in the specified file have to be restricted to execute > +only the following command. > +"test -s /proc/vmcore" > +Default: kdump check user's default identity file path. > +NOTE: You can specify kdump check user name in /etc/kdump.conf. > + The parameter name is "kdump_check_user". > + Default user is "kdumpchecker". 
> +</longdesc> > +</parameter> > + > +</parameters> > +SSHXML > + exit 0 > + ;; > +*) > + exit 1 > + ;; > +esac > Kdump check STONITH plugin "kdumpcheck" > 1. Introduction > This plugin's purpose is to avoid STONITH for a node which is doing kdump. > It checks whether the node is doing kdump when a STONITH reset or > off operation is executed. > If the target node is doing kdump, this plugin considers that STONITH > succeeded. If not, it considers that STONITH failed. > > NOTE: This plugin cannot shut down or start up a node. > So it has to be used together with another STONITH plugin. > Then, when this plugin fails, the next plugin, which can kill a node, > is executed. > NOTE: This plugin works only on Linux. > > 2. The way to check > When STONITH reset or off is executed, kdumpcheck connects to the target > node, and checks the size of /proc/vmcore. > It judges that the target node is _not_ doing kdump when the size of > /proc/vmcore on the node is zero, or the file doesn't exist. > Then kdumpcheck returns "STONITH failed" to stonithd, and the next plugin > is executed. > > 3. Extending mkdumprd > This plugin requires a non-root user and an ssh connection even on the 2nd kernel. > So, you need to apply mkdumprd_for_kdumpcheck.patch to /sbin/mkdumprd. > This patch is tested with mkdumprd version 5.0.39. > The patch adds the following functions: > i) Start udevd with specified .rules files. > ii) Bring the specified network interface up. > iii) Start sshd. > iv) Add the specified user to the 2nd kernel. > The user is to check whether the node is doing kdump or not. > v) Execute sync command after dumping. > > NOTE: Extensions i) to iv) apply only when a filesystem partition > is specified as the location where the vmcore should be dumped. > > 4. Parameters > kdumpcheck's parameters are the following. > hostlist : The list of hosts that the STONITH device controls. > delimiter is "," or " ". > required setting.
(default: none) > identity_file: the full path of the private key file for the user > who checks for kdump. > (default: $HOME/.ssh/id_rsa, $HOME/.ssh/id_dsa and > $HOME/.ssh/identity) > > NOTE: To execute this plugin first, give this plugin the highest priority > in all STONITH resources. > > 5. How to Use > To use this tool, perform the following steps on all nodes in the cluster. > 1) Add a user to check for kdump. > ex.) > # useradd kdumpchecker > # passwd kdumpchecker > 2) Allow passwordless login from the node which will do STONITH to all > target nodes for the user added at step 1). > ex.) > $ cd > $ mkdir .ssh > $ chmod 700 .ssh > $ cd .ssh > $ ssh-keygen (generate authentication keys with empty passphrase) > $ scp id_rsa.pub [EMAIL PROTECTED]:"~/.ssh/." > $ ssh [EMAIL PROTECTED] > $ cd ~/.ssh > $ cat id_rsa.pub >> authorized_keys > $ chmod 600 authorized_keys > $ rm id_rsa.pub > 3) Limit the command that the user can execute. > Put the following option on one line, in front of the user's > public key in the target node's authorized_keys file. > [command="test -s /proc/vmcore"] > Add further options (like no-pty, no-port-forwarding and so on) > according to your security policy. > ex.) > $ vi ~/.ssh/authorized_keys > command="test -s /proc/vmcore",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAA..snip..== [EMAIL PROTECTED] > 4) Add settings in /etc/kdump.conf. > network_device : network interface name used to check for kdump. > required setting. (default: none) > kdump_check_user : user name used to check for kdump. > specify a non-root user. > (default: "kdumpchecker") > udev_rules : .rules files' names. > specify if you use udev for mapping devices. > specified files have to be in > /etc/udev/rules.d/. > you can specify two or more files. > delimiter is "," or " ". (default: none) > ex.)
> # vi /etc/kdump.conf > ext3 /dev/sda1 > network_device eth0 > kdump_check_user kdumpchecker > udev_rules 10-if.rules > 5) Apply the patch to /sbin/mkdumprd. > # cd /sbin > # patch -p 1 < mkdumprd_for_kdumpcheck.patch > 6) Restart kdump service. > # service kdump restart > 7) Configure the STONITH plugin in cib.xml. > (See "4. Parameters" and "6. Appendix") > > 6. Appendix > A sample cib.xml. > <clone id="clnStonith"> > <instance_attributes id="instance_attributes.id238245a"> > <nvpair id="clone0_clone_max" name="clone_max" value="2"/> > <nvpair id="clone0_clone_node_max" name="clone_node_max" value="1"/> > </instance_attributes> > <group id="grpStonith"> > <instance_attributes id="instance_attributes.id2382455"/> > <primitive id="grpStonith-kdumpcheck" class="stonith" type="external/kdumpcheck"> > <instance_attributes id="instance_attributes.id238240a"> > <nvpair id="nvpair.id238240b" name="hostlist" value="node1,node2"/> > <nvpair id="nvpair.id238240c" name="priority" value="1"/> > <nvpair id="nvpair.id2382408b" name="stonith-timeout" value="30s"/> > </instance_attributes> > <operations> > <op id="grpStonith-kdumpcheck-start" name="start" interval="0" timeout="300" on-fail="restart"/> > <op id="grpStonith-kdumpcheck-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/> > <op id="grpStonith-kdumpcheck-stop" name="stop" interval="0" timeout="300" on-fail="block"/> > </operations> > <meta_attributes id="primitive-grpStonith-kdump-check.meta"/> > </primitive> > <primitive id="grpStonith-ssh" class="stonith" type="external/ssh"> > <instance_attributes id="instance_attributes.id2382402a"> > <nvpair id="nvpair.id2382408a" name="hostlist" value="node1,node2"/> > <nvpair id="nvpair.id238066b" name="priority" value="2"/> > <nvpair id="nvpair.id2382408c" name="stonith-timeout" value="60s"/> > </instance_attributes> > <operations> > <op id="grpStonith-ssh-start" name="start" interval="0" timeout="300" on-fail="restart"/> >
<op id="grpStonith-ssh-monitor" name="monitor" interval="10" timeout="60" on-fail="restart"/> > <op id="grpStonith-ssh-stop" name="stop" interval="0" timeout="300" on-fail="block"/> > </operations> > <meta_attributes id="primitive-grpStonith-ssh.meta"/> > </primitive> > </group> > </clone>
