Hi, I've created a short script for our MegaRAID controller:
===================
#!/bin/bash
# Flush and load controller configuration

PATH="/usr/sbin:/usr/bin:/usr/sfw/bin/:/opt/SUNWcluster/bin/:/usr/cluster/bin:/opt/MegaRAID/CLI:/opt/csw/bin:/usr/sfw/bin/:/opt/SUNWcluster/bin/:/usr/cluster/bin:/opt/csw/sbin/:/opt/csw/bin/"
MEGA="/opt/MegaRAID/CLI/MegaCli"
CONFIG="/etc/megaraid/cfg/megaraid-config.conf"
CNTRL="-a0"

case "$1" in
import)
    # restore the config; the disks then show up in format
    $MEGA -CfgForeign -Scan $CNTRL
    $MEGA -CfgForeign -Clear $CNTRL
    $MEGA -CfgClr $CNTRL
    sleep 3
    $MEGA -CfgRestore -f $CONFIG $CNTRL
    # wait time
    sleep 10
    ;;
clear)
    # flush the config; all disks disappear from format
    $MEGA -CfgClr $CNTRL
    sleep 2
    ;;
*)
    echo "Usage: $0 import|clear"
    ;;
esac
exit 0
==================

Create the RG:

# clrg create -n iscsihead-m,iscsihead-s megaraid-switch-rg

Create the RS:

# clrs create -g megaraid-switch-rg -t SUNW.gds -p \
  Start_command="/root/bin/megaraid-config import" -p \
  Stop_command="/root/bin/megaraid-config clear" -p \
  Probe_command=/bin/true -p Network_aware=false -p Log_level=ERR \
  megaraid-switch-rs
# clrg online -M megaraid-switch-rg

Adding the 48 disks takes up to about 30 seconds, but for some unknown reason the cluster stops and restarts the resource:

iscsihead-m -> master node
iscsihead-s -> failover node

=========================================
[...]
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_start> completed successfully for resource <megaraid-switch-rs>, resource group <megaraid-switch-rg>, node <iscsihead-s>, time used: 3% of timeout <300 seconds>
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource megaraid-switch-rs state on node iscsihead-s change to R_ONLINE_UNMON
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource megaraid-switch-rs status on node iscsihead-s change to R_FM_ONLINE
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource megaraid-switch-rs status msg on node iscsihead-s change to <>
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <gds_monitor_start> for resource <megaraid-switch-rs>, resource group <megaraid-switch-rg>, node <iscsihead-s>, timeout <300> seconds
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_monitor_start> completed successfully for resource <megaraid-switch-rs>, resource group <megaraid-switch-rg>, node <iscsihead-s>, time used: 0% of timeout <300 seconds>
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 443746 daemon.notice] resource megaraid-switch-rs state on node iscsihead-s change to R_ONLINE
Jan 7 13:33:03 iscsihead-s Cluster.RGM.global.rgmd: [ID 529407 daemon.notice] resource group megaraid-switch-rg state on node iscsihead-s change to RG_ONLINE
Jan 7 13:33:07 iscsihead-s Cluster.PMF.pmfd: [ID 887656 daemon.notice] Process: tag="megaraid-switch-rg,megaraid-switch-rs,0.svc", cmd="/bin/ksh -c /root/bin/megaraid-config import", Failed to stay up.
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource megaraid-switch-rs status on node iscsihead-s change to R_FM_FAULTED
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource megaraid-switch-rs status msg on node iscsihead-s change to <Service daemon not running.>
Jan 7 13:33:07 iscsihead-s SC[,SUNW.gds:6,megaraid-switch-rg,megaraid-switch-rs,gds_probe]: [ID 423137 daemon.error] A resource restart attempt on resource megaraid-switch-rs in resource group megaraid-switch-rg has been blocked because the number of restarts within the past Retry_interval (370 seconds) would exceed Retry_count (2)
Jan 7 13:33:07 iscsihead-s SC[,SUNW.gds:6,megaraid-switch-rg,megaraid-switch-rs,gds_probe]: [ID 874133 daemon.notice] Issuing a failover request because the application exited.
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 494478 daemon.notice] resource megaraid-switch-rs in resource group megaraid-switch-rg has requested failover of the resource group on iscsihead-s.
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 423291 daemon.error] RGM isn't failing resource group <megaraid-switch-rg> off of node <iscsihead-s>, because there are no other current or potential masters
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 702911 daemon.error] Resource <megaraid-switch-rs> of Resource Group <megaraid-switch-rg> failed pingpong check on node <iscsihead-m>. The resource group will not be mastered by that node.
Jan 7 13:33:07 iscsihead-s SC[,SUNW.gds:6,megaraid-switch-rg,megaraid-switch-rs,gds_probe]: [ID 969827 daemon.error] Failover attempt has failed.
Jan 7 13:33:07 iscsihead-s SC[,SUNW.gds:6,megaraid-switch-rg,megaraid-switch-rs,gds_probe]: [ID 670283 daemon.notice] Issuing a resource restart request because the application exited.
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 494478 daemon.notice] resource megaraid-switch-rs in resource group megaraid-switch-rg has requested restart of the resource on iscsihead-s.
Jan 7 13:33:07 iscsihead-s Cluster.RGM.global.rgmd: [ID 471587 daemon.notice] Resource <megaraid-switch-rs> is restarting too often on <iscsihead-s>. Sleeping for <15> seconds.
====================================

I also thought that reading $1 in my case statement might be the problem, so I split the script into two files so that I don't need $1. The result is the same: the script starts, some disks are added, and while the MegaRAID CLI is still running (and adding more disks), the cluster calls the stop command.

... method <gds_svc_start> for resource <megaraid-switch-rs>, resource group <megaraid-switch-rg>, node <iscsihead-s>, timeout <300> seconds
Jan 7 14:33:32 iscsihead-s Cluster.RGM.global.rgmd: [ID 784560 daemon.notice] resource megaraid-switch-rs status on node iscsihead-s change to R_FM_UNKNOWN
Jan 7 14:33:32 iscsihead-s Cluster.RGM.global.rgmd: [ID 922363 daemon.notice] resource megaraid-switch-rs status msg on node iscsihead-s change to <Starting>

Executing the script manually:

# clrg offline megaraid-switch-rg
root(iscsihead-s):~# /bin/ksh -c /root/bin/megaraid-import
~ 20 seconds later ...
root(iscsihead-s):~# echo $?
0

Where is my error?

cu
denny
_______________________________________________
ha-clusters-discuss mailing list
ha-clusters-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss