This looks just like a quorum problem: the panic on xCid shows it failing to read the reservation keys from the quorum device Auron (error 2), so once xCloud went down xCid was left with only its own vote out of the two needed and had to abort. Can you describe the network path from each node to the quorum server? Is it only reachable via node xCloud?
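As a quick check, something roughly like the following (untested; port 9000 and the /etc/scqsd/scqsd.conf path are only the defaults I'd expect, so adjust to whatever Auron is actually configured with) should show whether each node can talk to the quorum server on its own:

  # On each cluster node (xCid and xCloud): is Auron reachable over the
  # public network, and is its quorum-server port open?
  ping auron
  telnet auron 9000        # default quorum-server port; see /etc/scqsd/scqsd.conf on Auron

  # How the cluster itself currently sees the quorum configuration:
  /usr/cluster/bin/clquorum status
  /usr/cluster/bin/clquorum show Auron

  # On the quorum server host (Auron): what it has registered for this cluster
  /usr/cluster/bin/clquorumserver show +

If xCid can only reach Auron while xCloud is up (e.g. routing or IPMP oddities), this is exactly the behaviour you'd see.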
Ceri

On Fri, Sep 04, 2009 at 01:00:31PM -0700, Janey Le wrote:
> After setting up SunCluster on OpenSolaris, when I reboot the second node
> of the cluster, my first node panics. Can you please let me know if there is
> anyone I can contact to find out whether this is a setup issue or a cluster
> bug?
>
> Below is the setup that I had:
>
> - 2x1 (2 OpenSolaris 2009.06 x86 hosts named xCid and xCloud connected
>   to one FC array)
> - Created 32 volumes and mapped them to the host group; under the host group
>   are the 2 cluster nodes
> - Formatted the volumes
> - Set up the cluster with a quorum server named Auron (both nodes joined the
>   cluster; all of the resource groups and resources are online on the 1st
>   node, xCid)
>
> Below is the status of the cluster before rebooting the nodes.
>
> root@xCid:~# scstat -p
> ------------------------------------------------------------------
>
> -- Cluster Nodes --
>
>                  Node name      Status
>                  ---------      ------
>   Cluster node:  xCid           Online
>   Cluster node:  xCloud         Online
>
> ------------------------------------------------------------------
>
> -- Cluster Transport Paths --
>
>                    Endpoint         Endpoint         Status
>                    --------         --------         ------
>   Transport path:  xCid:e1000g3     xCloud:e1000g3   Path online
>   Transport path:  xCid:e1000g2     xCloud:e1000g2   Path online
>
> ------------------------------------------------------------------
>
> -- Quorum Summary from latest node reconfiguration --
>
>   Quorum votes possible:  3
>   Quorum votes needed:    2
>   Quorum votes present:   3
>
>
> -- Quorum Votes by Node (current status) --
>
>                Node Name    Present   Possible   Status
>                ---------    -------   --------   ------
>   Node votes:  xCid         1         1          Online
>   Node votes:  xCloud       1         1          Online
>
>
> -- Quorum Votes by Device (current status) --
>
>                  Device Name   Present   Possible   Status
>                  -----------   -------   --------   ------
>   Device votes:  Auron         1         1          Online
>
> ------------------------------------------------------------------
>
> -- Device Group Servers --
>
>   Device Group   Primary   Secondary
>   ------------   -------   ---------
>
>
> -- Device Group Status --
>
>   Device Group   Status
>   ------------   ------
>
>
> -- Multi-owner Device Groups --
>
>   Device Group   Online Status
>   ------------   -------------
>
> ------------------------------------------------------------------
>
> -- Resource Groups and Resources --
>
>               Group Name   Resources
>               ----------   ---------
>   Resources:  xCloud-rg    xCloud-nfsres r-nfs
>   Resources:  nfs-rg       nfs-lh-rs nfs-hastp-rs nfs-rs
>
>
> -- Resource Groups --
>
>           Group Name   Node Name   State     Suspended
>           ----------   ---------   -----     ---------
>   Group:  xCloud-rg    xCid        Online    No
>   Group:  xCloud-rg    xCloud      Offline   No
>
>   Group:  nfs-rg       xCid        Online    No
>   Group:  nfs-rg       xCloud      Offline   No
>
>
> -- Resources --
>
>              Resource Name   Node Name   State     Status Message
>              -------------   ---------   -----     --------------
>   Resource:  xCloud-nfsres   xCid        Online    Online - LogicalHostname online.
>   Resource:  xCloud-nfsres   xCloud      Offline   Offline
>
>   Resource:  r-nfs           xCid        Online    Online - Service is online.
>   Resource:  r-nfs           xCloud      Offline   Offline
>
>   Resource:  nfs-lh-rs       xCid        Online    Online - LogicalHostname online.
>   Resource:  nfs-lh-rs       xCloud      Offline   Offline
>
>   Resource:  nfs-hastp-rs    xCid        Online    Online
>   Resource:  nfs-hastp-rs    xCloud      Offline   Offline
>
>   Resource:  nfs-rs          xCid        Online    Online - Service is online.
>   Resource:  nfs-rs          xCloud      Offline   Offline
>
> ------------------------------------------------------------------
>
> -- IPMP Groups --
>
>                Node Name   Group      Status   Adapter   Status
>                ---------   -----      ------   -------   ------
>   IPMP Group:  xCid        sc_ipmp0   Online   e1000g1   Online
>
>   IPMP Group:  xCloud      sc_ipmp0   Online   e1000g0   Online
>
>
> -- IPMP Groups in Zones --
>
>                Zone Name   Group   Status   Adapter   Status
>                ---------   -----   ------   -------   ------
> ------------------------------------------------------------------
> root@xCid:~#
>
>
> root@xCid:~# clnode show
>
> === Cluster Nodes ===
>
> Node Name:                  xCid
>   Node ID:                  1
>   Enabled:                  yes
>   privatehostname:          clusternode1-priv
>   reboot_on_path_failure:   disabled
>   globalzoneshares:         1
>   defaultpsetmin:           1
>   quorum_vote:              1
>   quorum_defaultvote:       1
>   quorum_resv_key:          0x4A9B35C600000001
>   Transport Adapter List:   e1000g2, e1000g3
>
> Node Name:                  xCloud
>   Node ID:                  2
>   Enabled:                  yes
>   privatehostname:          clusternode2-priv
>   reboot_on_path_failure:   disabled
>   globalzoneshares:         1
>   defaultpsetmin:           1
>   quorum_vote:              1
>   quorum_defaultvote:       1
>   quorum_resv_key:          0x4A9B35C600000002
>   Transport Adapter List:   e1000g2, e1000g3
>
> root@xCid:~#
>
>
> ****** Reboot 1st node xCid; all of the resources transfer to the 2nd node
> xCloud and come online on xCloud ******
>
> root@xCloud:~# scstat -p
> ------------------------------------------------------------------
>
> -- Cluster Nodes --
>
>                  Node name      Status
>                  ---------      ------
>   Cluster node:  xCid           Online
>   Cluster node:  xCloud         Online
>
> ------------------------------------------------------------------
>
> -- Cluster Transport Paths --
>
>                    Endpoint         Endpoint         Status
>                    --------         --------         ------
>   Transport path:  xCid:e1000g3     xCloud:e1000g3   Path online
>   Transport path:  xCid:e1000g2     xCloud:e1000g2   Path online
>
> ------------------------------------------------------------------
>
> -- Quorum Summary from latest node reconfiguration --
>
>   Quorum votes possible:  3
>   Quorum votes needed:    2
>   Quorum votes present:   3
>
>
> -- Quorum Votes by Node (current status) --
>
>                Node Name    Present   Possible   Status
>                ---------    -------   --------   ------
>   Node votes:  xCid         1         1          Online
>   Node votes:  xCloud       1         1          Online
>
>
> -- Quorum Votes by Device (current status) --
>
>                  Device Name   Present   Possible   Status
>                  -----------   -------   --------   ------
>   Device votes:  Auron         1         1          Online
>
> ------------------------------------------------------------------
>
> -- Device Group Servers --
>
>   Device Group   Primary   Secondary
>   ------------   -------   ---------
>
>
> -- Device Group Status --
>
>   Device Group   Status
>   ------------   ------
>
>
> -- Multi-owner Device Groups --
>
>   Device Group   Online Status
>   ------------   -------------
>
> ------------------------------------------------------------------
>
> -- Resource Groups and Resources --
>
>               Group Name   Resources
>               ----------   ---------
>   Resources:  xCloud-rg    xCloud-nfsres r-nfs
>   Resources:  nfs-rg       nfs-lh-rs nfs-hastp-rs nfs-rs
>
>
> -- Resource Groups --
>
>           Group Name   Node Name   State     Suspended
>           ----------   ---------   -----     ---------
>   Group:  xCloud-rg    xCid        Offline   No
>   Group:  xCloud-rg    xCloud      Online    No
>
>   Group:  nfs-rg       xCid        Offline   No
>   Group:  nfs-rg       xCloud      Online    No
>
>
> -- Resources --
>
>              Resource Name   Node Name   State     Status Message
>              -------------   ---------   -----     --------------
>   Resource:  xCloud-nfsres   xCid        Offline   Offline
>   Resource:  xCloud-nfsres   xCloud      Online    Online - LogicalHostname online.
>
>   Resource:  r-nfs           xCid        Offline   Offline
>   Resource:  r-nfs           xCloud      Online    Online - Service is online.
>
>   Resource:  nfs-lh-rs       xCid        Offline   Offline
>   Resource:  nfs-lh-rs       xCloud      Online    Online - LogicalHostname online.
>
>   Resource:  nfs-hastp-rs    xCid        Offline   Offline
>   Resource:  nfs-hastp-rs    xCloud      Online    Online
>
>   Resource:  nfs-rs          xCid        Offline   Offline
>   Resource:  nfs-rs          xCloud      Online    Online - Service is online.
>
> ------------------------------------------------------------------
>
> -- IPMP Groups --
>
>                Node Name   Group      Status   Adapter   Status
>                ---------   -----      ------   -------   ------
>   IPMP Group:  xCid        sc_ipmp0   Online   e1000g1   Online
>
>   IPMP Group:  xCloud      sc_ipmp0   Online   e1000g0   Online
>
>
> -- IPMP Groups in Zones --
>
>                Zone Name   Group   Status   Adapter   Status
>                ---------   -----   ------   -------   ------
> ------------------------------------------------------------------
> root@xCloud:~#
>
>
> *********** Wait for about 5 minutes, then reboot 2nd node xCloud; node xCid
> panics with the error below ***********
>
> root@xCid:~# Notifying cluster that this node is panicking
> WARNING: CMM: Reading reservation keys from quorum device Auron failed with
> error 2.
>
> panic[cpu0]/thread=ffffff02d0a623c0: CMM: Cluster lost operational quorum;
> aborting.
>
> ffffff0011976b50 genunix:vcmn_err+2c ()
> ffffff0011976b60 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+1f ()
> ffffff0011976c40 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+8c ()
> ffffff0011976e30 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+57f ()
> ffffff0011976e70 cl_haci:__1cIcmm_implStransitions_thread6M_v_+b7 ()
> ffffff0011976e80 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+9 ()
> ffffff0011976ed0 cl_orb:cllwpwrapper+d7 ()
> ffffff0011976ee0 unix:thread_start+8 ()
>
> syncing file systems... done
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> 51% done[2mMIdoOe
>
> The host log is attached.
>
> I have gone through the SunCluster doc on how to set up SunCluster for
> OpenSolaris multiple times, but I don't see any steps that I missed. Can you
> please help to see whether this is a setup issue or a bug?
>
> Thanks,
>
> Janey

--
That must be wonderful! I don't understand it at all.
	-- Moliere