Hi Sambit,
I looked into the quorum server using the command "clquorumserver show +"; it 
showed me the old services that the quorum server had served before this setup, 
so I reinstalled my host and set up the quorum server again.  From the document 
that I got from 
http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/OHACdocs/ , 
and from its restrictions, Veritas Volume Manager is not supported in 
OpenSolaris HA Cluster.  So I wonder: do we need to use metaset to set up a 
diskset to manage the disks, or is what I have from the doc enough?

Thanks,

Janey
-----Original Message-----
From: Sambit.Nayak at Sun.COM [mailto:sambit.na...@sun.com]
Sent: Wednesday, September 09, 2009 1:53 AM
To: Le, Janey
Cc: ha-clusters-discuss at opensolaris.org
Subject: Re: [ha-clusters-discuss] Host panic - OpenSolaris SunCluster

Hi Janey,

The error message:
 > WARNING: CMM: Reading reservation keys from quorum device Auron
failed with error 2.
means that the node xCid failed to read the quorum keys from the quorum
server host Auron.

It is possible that the node xCid could not contact the quorum server
host Auron at that specific time, while the reconfiguration was in
progress due to the reboot of xCloud.
Also, please look in the syslog messages on xCid for any failure
messages related to the quorum server.

Is the problem happening every time you reboot xCloud, after both nodes
are successfully online in the cluster?


Things that can be done to debug
--------------------------------------------
More information about this failure will be available in the cluster
kernel trace buffers.

If you can obtain the kernel dumps of the cluster nodes at the time of
this problem, then we can examine them to debug the problem further.
If you are not able to provide the dumps, then please run:
 > *cmm_dbg_buf/s
 > *(cmm_dbg_buf+8)+1/s
at the kmdb prompt resulting from the panic (or on the saved crash
dump), and provide that output.
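
If it helps, here is a rough sketch of running those same two commands
against a saved crash dump with mdb instead of at the live kmdb prompt.
The crash-dump paths below are assumptions: savecore typically writes
unix.N/vmcore.N under /var/crash/<hostname>, so adjust to your system.

```shell
# Sketch only: CRASHDIR is a hypothetical savecore target directory.
CRASHDIR=/var/crash/xCid

# Write the two CMM debug-buffer commands to a script mdb can read
# from stdin.
cat <<'EOF' > /tmp/cmm_dbg.mdb
*cmm_dbg_buf/s
*(cmm_dbg_buf+8)+1/s
EOF

# Uncomment on a node that actually has the saved dump:
# mdb "$CRASHDIR"/unix.0 "$CRASHDIR"/vmcore.0 < /tmp/cmm_dbg.mdb
```

The heredoc form just keeps the two commands in a file, so the same
capture can be repeated after each panic.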

Please also save the /var/adm/messages of both nodes.

Each quorum server daemon (on a quorum server host) has an associated
directory where it stores some files.
By default, /var/scqsd is the directory used by a quorum server daemon.
If you have changed the default directory while configuring the quorum
server, then please look there instead.
There will be files named ".scqsd_dbg_buf*" in that directory.
Please provide those files as well; they will tell us what's happening
on the quorum server host Auron.
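
A small sketch for collecting those files in one step (assumes the
default /var/scqsd; set QSDIR if you configured a different directory):

```shell
# Sketch: gather the quorum server debug buffers into one tarball.
# QSDIR is the daemon's working directory; /var/scqsd is only the default.
QSDIR=${QSDIR:-/var/scqsd}
# The files are hidden, so match .scqsd_dbg_buf* explicitly.
if ls "$QSDIR"/.scqsd_dbg_buf* >/dev/null 2>&1; then
  tar cf /tmp/scqsd_dbg.tar "$QSDIR"/.scqsd_dbg_buf*
  echo "collected to /tmp/scqsd_dbg.tar"
else
  echo "no .scqsd_dbg_buf* files found in $QSDIR"
fi
```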

If you execute the "clquorum status" command on a cluster node, it will
tell whether the local node can access the quorum server at the time the
command is executed. If access is possible and the node's keys are
present, the quorum server is marked online; otherwise it is marked
offline. So if you execute this command on both cluster nodes before
doing the experiment of rebooting xCloud, the output will tell whether
either node is having problems accessing the quorum server.
Please run that command on both nodes, and capture the output, before
rebooting xCloud.

Similarly, "clquorumserver show +" on the quorum server host will tell
what cluster it is serving, what keys are present on the quorum server,
which cluster node is the owner of the quorum server, and so on.
Please capture its output before rebooting xCloud, and again after xCid
panics as a result of rebooting xCloud.
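
A timestamped file name makes the before/after snapshots easy to diff
later; as a sketch (run on the quorum server host Auron):

```shell
# Sketch: snapshot "clquorumserver show +" with a timestamp in the
# file name; run once before the reboot test and once after the panic.
OUT=/tmp/clquorumserver-show-$(date +%Y%m%d-%H%M%S).txt
/usr/cluster/bin/clquorumserver show + > "$OUT" 2>&1 || true
echo "captured to $OUT"
```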

************

Just as a confirmation, the cluster is running Open HA Cluster 2009.06,
and you are using the quorum server packages available with Open HA
Cluster 2009.06, right?

Thanks & Regards,
Sambit


Janey Le wrote:
> After setting up SunCluster on OpenSolaris, when I reboot the second node 
> of the cluster, my first node panics.  Can you please let me know whom I can 
> contact to find out whether this is a setup issue or a cluster bug?
>
> Below is the setup that I had:
>
> -     2x1 (2 OpenSolaris 2009.06 x86 hosts named xCid and xCloud connected 
> to one FC array)
> -     Created 32 volumes and mapped them to the host group; under the host 
> group are the 2 cluster nodes
> -     Formatted the volumes
> -     Set up the cluster with quorum server named Auron (both nodes joined the 
> cluster; all of the resource groups and resources are online on 1st node xCid)
>
> Below is the status of the cluster before rebooting the nodes.
> root at xCid:~# scstat -p
> ------------------------------------------------------------------
>
> -- Cluster Nodes --
>
>                     Node name           Status
>                     ---------           ------
>   Cluster node:     xCid                Online
>   Cluster node:     xCloud              Online
>
> ------------------------------------------------------------------
>
> -- Cluster Transport Paths --
>
>                     Endpoint               Endpoint               Status
>                     --------               --------               ------
>   Transport path:   xCid:e1000g3           xCloud:e1000g3         Path online
>   Transport path:   xCid:e1000g2           xCloud:e1000g2         Path online
>
> ------------------------------------------------------------------
>
> -- Quorum Summary from latest node reconfiguration --
>
>   Quorum votes possible:      3
>   Quorum votes needed:        2
>   Quorum votes present:       3
>
>
> -- Quorum Votes by Node (current status) --
>
>                     Node Name           Present Possible Status
>                     ---------           ------- -------- ------
>   Node votes:       xCid                1        1       Online
>   Node votes:       xCloud              1        1       Online
>
>
> -- Quorum Votes by Device (current status) --
>
>                     Device Name         Present Possible Status
>                     -----------         ------- -------- ------
>   Device votes:     Auron               1        1       Online
>
> ------------------------------------------------------------------
>
> -- Device Group Servers --
>
>                          Device Group        Primary             Secondary
>                          ------------        -------             ---------
>
>
> -- Device Group Status --
>
>                               Device Group        Status
>                               ------------        ------
>
>
> -- Multi-owner Device Groups --
>
>                               Device Group        Online Status
>                               ------------        -------------
>
> ------------------------------------------------------------------
>
> -- Resource Groups and Resources --
>
>             Group Name     Resources
>             ----------     ---------
>  Resources: xCloud-rg      xCloud-nfsres r-nfs
>  Resources: nfs-rg         nfs-lh-rs nfs-hastp-rs nfs-rs
>
>
> -- Resource Groups --
>
>             Group Name     Node Name                State          Suspended
>             ----------     ---------                -----          ---------
>      Group: xCloud-rg      xCid                     Online         No
>      Group: xCloud-rg      xCloud                   Offline        No
>
>      Group: nfs-rg         xCid                     Online         No
>      Group: nfs-rg         xCloud                   Offline        No
>
>
> -- Resources --
>
>             Resource Name  Node Name                State          Status 
> Message
>             -------------  ---------                -----          
> --------------
>   Resource: xCloud-nfsres  xCid                     Online         Online - 
> LogicalHostname online.
>   Resource: xCloud-nfsres  xCloud                   Offline        Offline
>
>   Resource: r-nfs          xCid                     Online         Online - 
> Service is online.
>   Resource: r-nfs          xCloud                   Offline        Offline
>
>   Resource: nfs-lh-rs      xCid                     Online         Online - 
> LogicalHostname online.
>   Resource: nfs-lh-rs      xCloud                   Offline        Offline
>
>   Resource: nfs-hastp-rs   xCid                     Online         Online
>   Resource: nfs-hastp-rs   xCloud                   Offline        Offline
>
>   Resource: nfs-rs         xCid                     Online         Online - 
> Service is online.
>   Resource: nfs-rs         xCloud                   Offline        Offline
>
> ------------------------------------------------------------------
>
> -- IPMP Groups --
>
>               Node Name           Group   Status         Adapter   Status
>               ---------           -----   ------         -------   ------
>   IPMP Group: xCid                sc_ipmp0 Online         e1000g1   Online
>
>   IPMP Group: xCloud              sc_ipmp0 Online         e1000g0   Online
>
>
> -- IPMP Groups in Zones --
>
>               Zone Name           Group   Status         Adapter   Status
>               ---------           -----   ------         -------   ------
> ------------------------------------------------------------------
> root at xCid:~#
>
>
> root at xCid:~# clnode show
>
> === Cluster Nodes ===
>
> Node Name:                                      xCid
>   Node ID:                                         1
>   Enabled:                                         yes
>   privatehostname:                                 clusternode1-priv
>   reboot_on_path_failure:                          disabled
>   globalzoneshares:                                1
>   defaultpsetmin:                                  1
>   quorum_vote:                                     1
>   quorum_defaultvote:                              1
>   quorum_resv_key:                                 0x4A9B35C600000001
>   Transport Adapter List:                          e1000g2, e1000g3
>
> Node Name:                                      xCloud
>   Node ID:                                         2
>   Enabled:                                         yes
>   privatehostname:                                 clusternode2-priv
>   reboot_on_path_failure:                          disabled
>   globalzoneshares:                                1
>   defaultpsetmin:                                  1
>   quorum_vote:                                     1
>   quorum_defaultvote:                              1
>   quorum_resv_key:                                 0x4A9B35C600000002
>   Transport Adapter List:                          e1000g2, e1000g3
>
> root at xCid:~#
>
>
> ******  Reboot 1st node xCid; all of the resources fail over to 2nd node 
> xCloud and come online on node xCloud  ************
>
> root at xCloud:~# scstat -p
> ------------------------------------------------------------------
>
> -- Cluster Nodes --
>
>                     Node name           Status
>                     ---------           ------
>   Cluster node:     xCid                Online
>   Cluster node:     xCloud              Online
>
> ------------------------------------------------------------------
>
> -- Cluster Transport Paths --
>
>                     Endpoint               Endpoint               Status
>                     --------               --------               ------
>   Transport path:   xCid:e1000g3           xCloud:e1000g3         Path online
>   Transport path:   xCid:e1000g2           xCloud:e1000g2         Path online
>
> ------------------------------------------------------------------
>
> -- Quorum Summary from latest node reconfiguration --
>
>   Quorum votes possible:      3
>   Quorum votes needed:        2
>   Quorum votes present:       3
>
>
> -- Quorum Votes by Node (current status) --
>
>                     Node Name           Present Possible Status
>                     ---------           ------- -------- ------
>   Node votes:       xCid                1        1       Online
>   Node votes:       xCloud              1        1       Online
>
>
> -- Quorum Votes by Device (current status) --
>
>                     Device Name         Present Possible Status
>                     -----------         ------- -------- ------
>   Device votes:     Auron               1        1       Online
>
> ------------------------------------------------------------------
>
> -- Device Group Servers --
>
>                          Device Group        Primary             Secondary
>                          ------------        -------             ---------
>
>
> -- Device Group Status --
>
>                               Device Group        Status
>                               ------------        ------
>
>
> -- Multi-owner Device Groups --
>
>                               Device Group        Online Status
>                               ------------        -------------
>
> ------------------------------------------------------------------
>
> -- Resource Groups and Resources --
>
>             Group Name     Resources
>             ----------     ---------
>  Resources: xCloud-rg      xCloud-nfsres r-nfs
>  Resources: nfs-rg         nfs-lh-rs nfs-hastp-rs nfs-rs
>
>
> -- Resource Groups --
>
>             Group Name     Node Name                State          Suspended
>             ----------     ---------                -----          ---------
>      Group: xCloud-rg      xCid                     Offline        No
>      Group: xCloud-rg      xCloud                   Online         No
>
>      Group: nfs-rg         xCid                     Offline        No
>      Group: nfs-rg         xCloud                   Online         No
>
>
> -- Resources --
>
>             Resource Name  Node Name                State          Status 
> Message
>             -------------  ---------                -----          
> --------------
>   Resource: xCloud-nfsres  xCid                     Offline        Offline
>   Resource: xCloud-nfsres  xCloud                   Online         Online - 
> LogicalHostname online.
>
>   Resource: r-nfs          xCid                     Offline        Offline
>   Resource: r-nfs          xCloud                   Online         Online - 
> Service is online.
>
>   Resource: nfs-lh-rs      xCid                     Offline        Offline
>   Resource: nfs-lh-rs      xCloud                   Online         Online - 
> LogicalHostname online.
>
>   Resource: nfs-hastp-rs   xCid                     Offline        Offline
>   Resource: nfs-hastp-rs   xCloud                   Online         Online
>
>   Resource: nfs-rs         xCid                     Offline        Offline
>   Resource: nfs-rs         xCloud                   Online         Online - 
> Service is online.
>
> ------------------------------------------------------------------
>
> -- IPMP Groups --
>
>               Node Name           Group   Status         Adapter   Status
>               ---------           -----   ------         -------   ------
>   IPMP Group: xCid                sc_ipmp0 Online         e1000g1   Online
>
>   IPMP Group: xCloud              sc_ipmp0 Online         e1000g0   Online
>
>
> -- IPMP Groups in Zones --
>
>               Zone Name           Group   Status         Adapter   Status
>               ---------           -----   ------         -------   ------
> ------------------------------------------------------------------
> root at xCloud:~#
>
>
> ***********Wait about 5 minutes, then reboot 2nd node xCloud; node 
> xCid panics with the error below *********************
>
> root at xCid:~# Notifying cluster that this node is panicking
> WARNING: CMM: Reading reservation keys from quorum device Auron failed with 
> error 2.
>
> panic[cpu0]/thread=ffffff02d0a623c0: CMM: Cluster lost operational quorum; 
> aborting.
>
> ffffff0011976b50 genunix:vcmn_err+2c ()
> ffffff0011976b60 
> cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+1f
>  ()
> ffffff0011976c40 
> cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+8c 
> ()
> ffffff0011976e30 
> cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+57f
>  ()
> ffffff0011976e70 cl_haci:__1cIcmm_implStransitions_thread6M_v_+b7 ()
> ffffff0011976e80 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+9 ()
> ffffff0011976ed0 cl_orb:cllwpwrapper+d7 ()
> ffffff0011976ee0 unix:thread_start+8 ()
>
> syncing file systems... done
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
>  51% done
>
> The host log is attached.
>
> I have gone through the SunCluster doc on how to set up SunCluster for 
> OpenSolaris multiple times, but I don't see any steps that I missed.  Can you 
> please help determine whether this is a setup issue or a bug?
>
> Thanks,
>
> Janey
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> ha-clusters-discuss mailing list
> ha-clusters-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
