Hi, I have the OpenSolaris 2009.06 HA cluster installed on both hosts and have added the quorum server to the cluster. Both hosts are online, and now I am trying to set up SVM. I created two disksets, with 15 drives in each, as shown below:
root@xCid:# metaset

Set name = xCid_diskSet, Set number = 1

Host                Owner
  xCid               Yes
  xCloud

Drive   Dbase
  d1     Yes
  d2     Yes
  d3     Yes
  d4     Yes
  d5     Yes
  d6     Yes
  d7     Yes
  d8     Yes
  d9     Yes
  d10    Yes
  d11    Yes
  d12    Yes
  d13    Yes
  d14    Yes
  d15    Yes
  d16    Yes

Set name = xCloud_diskSet, Set number = 2

Host                Owner
  xCid               Yes
  xCloud

Drive   Dbase
  d17    Yes
  d18    Yes
  d19    Yes
  d20    Yes
  d21    Yes
  d22    Yes
  d23    Yes
  d24    Yes
  d25    Yes
  d26    Yes
  d27    Yes
  d28    Yes
  d29    Yes
  d30    Yes
  d31    Yes

root@xCid:# metaset | grep Set
Set name = xCid_diskSet, Set number = 1
Set name = xCloud_diskSet, Set number = 2
root@xCid:#

For some reason I am not able to check the metadevices in the md.tab file by running the "metainit" command; the error that I get is "device not in set":

root@xCid:# metainit -n -a -s xCid_diskSet
metainit: xCid: /etc/lvm/md.tab line 81: d1s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 82: d2s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 83: d3s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 84: d4s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 85: d5s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 86: d6s0: device not in set
metainit: xCid: /etc/lvm/md.tab line 87: d7s0: device not in set

I have attached the md.tab file to this email. Can you please check whether the format of the entries in the md.tab file is correct, and why I get the error message above?

Thanks a lot for your time.

Janey
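For comparison, a minimal sketch of what md.tab entries for a named diskset typically look like, assuming the components are DID slices of drives that belong to the set (the set name and the DID names d1/d2 come from the metaset output above; the metadevice numbers d100/d101/d102 and the one-way mirror layout are made up purely for illustration and are not taken from the attached md.tab):

    # /etc/lvm/md.tab -- illustrative fragment only
    # Metadevices that live in a named set are written as setname/dN,
    # and in this sketch the components are given as full DID device paths.
    xCid_diskSet/d100 -m xCid_diskSet/d101
    xCid_diskSet/d101 1 1 /dev/did/rdsk/d1s0
    xCid_diskSet/d102 1 1 /dev/did/rdsk/d2s0

    # Dry-run check against the set, as in the command above:
    # metainit -n -a -s xCid_diskSet

If the entries in the attached md.tab name their components differently (for example as bare d1s0 rather than a full /dev/did/rdsk path, or without the setname/ prefix on the metadevices), that difference may be worth checking against the "device not in set" message.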
-----Original Message-----
From: Sambit.Nayak at Sun.COM [mailto:sambit.na...@sun.com]
Sent: Wednesday, September 16, 2009 1:16 AM
To: Le, Janey
Subject: Re: [ha-clusters-discuss] Host panic - OpenSolaris SunCluster

Hi Janey,

Yes, this looks correct. It means xCid is the owner of the quorum server.
The idea is: one of the cluster member nodes becomes the owner of the
quorum device (holds the reservation), and all cluster member nodes have
their registration keys present on the quorum device. So this looks right.

***********

One more thing: the quorum server/device can intermittently stop
responding, perhaps due to path or network failures. Hence "scstat -q"
and "clquorum status" are enhanced to show the immediate status of the
quorum device/server. So if a node is having problems accessing the
quorum device/server, executing one of the above commands on that node
will report that. If a node is unable to access the quorum device/server,
the command on that node will say the device/server is offline.

The command output on xCid, as shown below, looks great - xCid can access
the quorum server just fine. You can run the same command on the other
node (xCloud) to see whether it is also able to access the quorum server.

Thanks & Regards,
Sambit

Le, Janey wrote:
> Hi Sambit,
>
> I am in the process of setting up the cluster again, and after
> installing the cluster on both of the nodes, I added the quorum server
> to the cluster. Below is what I see on the quorum server after adding
> it:
>
> root@Auron:~# clquorumserver show +
> === Quorum Server on port 9000 ===
>
> Disabled                                        False
>
> --- Cluster CC_Cluster (id 0x4AAEB956) Reservation ---
>
> Node ID:                                        1
> Reservation key:                                0x4aaeb95600000001
>
> --- Cluster CC_Cluster (id 0x4AAEB956) Registrations ---
>
> Node ID:                                        1
> Registration key:                               0x4aaeb95600000001
>
> Node ID:                                        2
> Registration key:                               0x4aaeb95600000002
>
> root@Auron:~#
>
> *** There is only one Reservation key, is that correct? Should we have a
> Reservation key for Node ID: 2 too?
>
> From the 1st cluster node, status of the cluster:
>
> root@xCid:~# scstat -q
>
> -- Quorum Summary from latest node reconfiguration --
>
>   Quorum votes possible:     3
>   Quorum votes needed:       2
>   Quorum votes present:      3
>
>
> -- Quorum Votes by Node (current status) --
>
>                     Node Name           Present  Possible  Status
>                     ---------           -------  --------  ------
>   Node votes:       xCid                1        1         Online
>   Node votes:       xCloud              1        1         Online
>
>
> -- Quorum Votes by Device (current status) --
>
>                     Device Name         Present  Possible  Status
>                     -----------         -------  --------  ------
>   Device votes:     Auron               1        1         Online
>
> root@xCid:~#
>
> root@xCid:~# clnode show
>
> === Cluster Nodes ===
>
> Node Name:                                      xCid
>   Node ID:                                      1
>   Enabled:                                      yes
>   privatehostname:                              clusternode1-priv
>   reboot_on_path_failure:                       disabled
>   globalzoneshares:                             1
>   defaultpsetmin:                               1
>   quorum_vote:                                  1
>   quorum_defaultvote:                           1
>   quorum_resv_key:                              0x4AAEB95600000001
>   Transport Adapter List:                       e1000g2, e1000g3
>
> Node Name:                                      xCloud
>   Node ID:                                      2
>   Enabled:                                      yes
>   privatehostname:                              clusternode2-priv
>   reboot_on_path_failure:                       disabled
>   globalzoneshares:                             1
>   defaultpsetmin:                               1
>   quorum_vote:                                  1
>   quorum_defaultvote:                           1
>   quorum_resv_key:                              0x4AAEB95600000002
>   Transport Adapter List:                       e1000g2, e1000g3
>
> root@xCid:~#
>
> Thanks a lot for your time.
>
> Janey
>
> -----Original Message-----
> From: Sambit.Nayak at Sun.COM [mailto:Sambit.Nayak at Sun.COM]
> Sent: Wednesday, September 09, 2009 1:53 AM
> To: Le, Janey
> Cc: ha-clusters-discuss at opensolaris.org
> Subject: Re: [ha-clusters-discuss] Host panic - OpenSolaris SunCluster
>
> Hi Janey,
>
> The error message:
>
>   WARNING: CMM: Reading reservation keys from quorum device Auron
>   failed with error 2.
>
> means that the node xCid failed to read the quorum keys from the
> quorum server host Auron.
>
> It is possible that the node xCid could not contact the quorum server
> host Auron at that specific time, when the reconfiguration was in
> progress due to the reboot of xCloud.
> Also please look in the syslog messages on xCid for any failure
> messages related to the quorum server.
>
> Does the problem happen every time you reboot xCloud, after both nodes
> are successfully online in the cluster?
>
> Things that can be done to debug
> --------------------------------------------
> More information about this failure will be available in the cluster
> kernel trace buffers.
>
> If you can obtain the kernel dumps of the cluster nodes at the time of
> this problem, then we can look into them to debug the problem further.
> If you are not able to provide the dumps, then please run:
>
>   *cmm_dbg_buf/s
>   *(cmm_dbg_buf+8)+1/s
>
> at the kmdb prompt resulting from the panic (or on the saved crash
> dump), and provide that output.
>
> Please also save the /var/adm/messages of both nodes.
>
> Each quorum server daemon (on a quorum server host) has an associated
> directory where it stores some files.
> By default, /var/scqsd is the directory used by a quorum server daemon.
> If you have changed the default directory while configuring the quorum
> server, then please look in it instead.
> There will be files named ".scqsd_dbg_buf*" in such a directory.
> Please provide those files as well; they will tell us what's happening
> on the quorum server host Auron.
>
> If you execute the "clquorum status" command on a cluster node, it
> will tell whether the local node can access the quorum server at the
> time of the command execution. If access is possible and the node's
> keys are present, the quorum server is marked online; otherwise it is
> marked offline.
> So if you execute this command on both cluster nodes before doing the
> experiment of rebooting xCloud, that will tell whether any node was
> having problems accessing the quorum server.
> Please run that command on both nodes, and capture the output, before
> rebooting xCloud.
>
> Similarly, "clquorumserver show +" on the quorum server host will
> tell what cluster it is serving, what keys are present on the quorum
> server, which cluster node is the owner of the quorum server, etc.
> Please capture its output before rebooting xCloud, and after xCid
> panics as a result of rebooting xCloud.
>
> ************
>
> Just as a confirmation: the cluster is running Open HA Cluster
> 2009.06, and you are using the quorum server packages available with
> Open HA Cluster 2009.06, right?
>
> Thanks & Regards,
> Sambit
>
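Pulled together, the checks suggested above amount to roughly the following sequence (a sketch only, using the host names xCid, xCloud, and Auron and only the commands already named in this thread):

    # On each cluster node (xCid and xCloud), before rebooting xCloud:
    clquorum status
    scstat -q

    # On the quorum server host Auron, before rebooting xCloud and again
    # after any panic of xCid:
    clquorumserver show +

    # If a node panics, at the kmdb prompt (or on the saved crash dump):
    *cmm_dbg_buf/s
    *(cmm_dbg_buf+8)+1/s

    # Also save /var/adm/messages from both nodes, and the .scqsd_dbg_buf*
    # files from /var/scqsd (or the configured directory) on Auron.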
> Janey Le wrote:
>> After setting up SunCluster on OpenSolaris, when I reboot the second
>> node of the cluster, my first node panics. Can you please let me know
>> if there is anyone I can contact to find out whether this is a setup
>> issue or a cluster bug?
>>
>> Below is the setup that I had:
>>
>> - 2x1 (2 OpenSolaris 2009.06 x86 hosts named xCid and xCloud connected
>>   to one FC array)
>> - Created 32 volumes and mapped them to the host group; the two cluster
>>   nodes are under the host group
>> - Formatted the volumes
>> - Set up the cluster with a quorum server named Auron (both nodes joined
>>   the cluster; all of the resource groups and resources are online on
>>   the 1st node, xCid)
>>
>> Below is the status of the cluster before rebooting the nodes.
>>
>> root@xCid:~# scstat -p
>> ------------------------------------------------------------------
>>
>> -- Cluster Nodes --
>>
>>                     Node name           Status
>>                     ---------           ------
>>   Cluster node:     xCid                Online
>>   Cluster node:     xCloud              Online
>>
>> ------------------------------------------------------------------
>>
>> -- Cluster Transport Paths --
>>
>>                     Endpoint            Endpoint            Status
>>                     --------            --------            ------
>>   Transport path:   xCid:e1000g3        xCloud:e1000g3      Path online
>>   Transport path:   xCid:e1000g2        xCloud:e1000g2      Path online
>>
>> ------------------------------------------------------------------
>>
>> -- Quorum Summary from latest node reconfiguration --
>>
>>   Quorum votes possible:     3
>>   Quorum votes needed:       2
>>   Quorum votes present:      3
>>
>>
>> -- Quorum Votes by Node (current status) --
>>
>>                     Node Name           Present  Possible  Status
>>                     ---------           -------  --------  ------
>>   Node votes:       xCid                1        1         Online
>>   Node votes:       xCloud              1        1         Online
>>
>>
>> -- Quorum Votes by Device (current status) --
>>
>>                     Device Name         Present  Possible  Status
>>                     -----------         -------  --------  ------
>>   Device votes:     Auron               1        1         Online
>>
>> ------------------------------------------------------------------
>>
>> -- Device Group Servers --
>>
>>                     Device Group        Primary             Secondary
>>                     ------------        -------             ---------
>>
>>
>> -- Device Group Status --
>>
>>                     Device Group        Status
>>                     ------------        ------
>>
>>
>> -- Multi-owner Device Groups --
>>
>>                     Device Group        Online Status
>>                     ------------        -------------
>>
>> ------------------------------------------------------------------
>>
>> -- Resource Groups and Resources --
>>
>>             Group Name     Resources
>>             ----------     ---------
>>  Resources: xCloud-rg      xCloud-nfsres r-nfs
>>  Resources: nfs-rg         nfs-lh-rs nfs-hastp-rs nfs-rs
>>
>>
>> -- Resource Groups --
>>
>>             Group Name     Node Name      State        Suspended
>>             ----------     ---------      -----        ---------
>>      Group: xCloud-rg      xCid           Online       No
>>      Group: xCloud-rg      xCloud         Offline      No
>>
>>      Group: nfs-rg         xCid           Online       No
>>      Group: nfs-rg         xCloud         Offline      No
>>
>>
>> -- Resources --
>>
>>             Resource Name  Node Name      State        Status Message
>>             -------------  ---------      -----        --------------
>>   Resource: xCloud-nfsres  xCid           Online       Online - LogicalHostname online.
>>   Resource: xCloud-nfsres  xCloud         Offline      Offline
>>
>>   Resource: r-nfs          xCid           Online       Online - Service is online.
>>   Resource: r-nfs          xCloud         Offline      Offline
>>
>>   Resource: nfs-lh-rs      xCid           Online       Online - LogicalHostname online.
>>   Resource: nfs-lh-rs      xCloud         Offline      Offline
>>
>>   Resource: nfs-hastp-rs   xCid           Online       Online
>>   Resource: nfs-hastp-rs   xCloud         Offline      Offline
>>
>>   Resource: nfs-rs         xCid           Online       Online - Service is online.
>>   Resource: nfs-rs         xCloud         Offline      Offline
>>
>> ------------------------------------------------------------------
>>
>> -- IPMP Groups --
>>
>>               Node Name    Group      Status    Adapter    Status
>>               ---------    -----      ------    -------    ------
>>   IPMP Group: xCid         sc_ipmp0   Online    e1000g1    Online
>>
>>   IPMP Group: xCloud       sc_ipmp0   Online    e1000g0    Online
>>
>>
>> -- IPMP Groups in Zones --
>>
>>               Zone Name    Group      Status    Adapter    Status
>>               ---------    -----      ------    -------    ------
>> ------------------------------------------------------------------
>> root@xCid:~#
>>
>>
>> root@xCid:~# clnode show
>>
>> === Cluster Nodes ===
>>
>> Node Name:                                      xCid
>>   Node ID:                                      1
>>   Enabled:                                      yes
>>   privatehostname:                              clusternode1-priv
>>   reboot_on_path_failure:                       disabled
>>   globalzoneshares:                             1
>>   defaultpsetmin:                               1
>>   quorum_vote:                                  1
>>   quorum_defaultvote:                           1
>>   quorum_resv_key:                              0x4A9B35C600000001
>>   Transport Adapter List:                       e1000g2, e1000g3
>>
>> Node Name:                                      xCloud
>>   Node ID:                                      2
>>   Enabled:                                      yes
>>   privatehostname:                              clusternode2-priv
>>   reboot_on_path_failure:                       disabled
>>   globalzoneshares:                             1
>>   defaultpsetmin:                               1
>>   quorum_vote:                                  1
>>   quorum_defaultvote:                           1
>>   quorum_resv_key:                              0x4A9B35C600000002
>>   Transport Adapter List:                       e1000g2, e1000g3
>>
>> root@xCid:~#
>>
>>
>> ****** Reboot 1st node xCid, all of the resources transfer to 2nd
>> node xCloud and online on node xCloud ************
>>
>> root@xCloud:~# scstat -p
>> ------------------------------------------------------------------
>>
>> -- Cluster Nodes --
>>
>>                     Node name           Status
>>                     ---------           ------
>>   Cluster node:     xCid                Online
>>   Cluster node:     xCloud              Online
>>
>> ------------------------------------------------------------------
>>
>> -- Cluster Transport Paths --
>>
>>                     Endpoint            Endpoint            Status
>>                     --------            --------            ------
>>   Transport path:   xCid:e1000g3        xCloud:e1000g3      Path online
>>   Transport path:   xCid:e1000g2        xCloud:e1000g2      Path online
>>
>> ------------------------------------------------------------------
>>
>> -- Quorum Summary from latest node reconfiguration --
>>
>>   Quorum votes possible:     3
>>   Quorum votes needed:       2
>>   Quorum votes present:      3
>>
>>
>> -- Quorum Votes by Node (current status) --
>>
>>                     Node Name           Present  Possible  Status
>>                     ---------           -------  --------  ------
>>   Node votes:       xCid                1        1         Online
>>   Node votes:       xCloud              1        1         Online
>>
>>
>> -- Quorum Votes by Device (current status) --
>>
>>                     Device Name         Present  Possible  Status
>>                     -----------         -------  --------  ------
>>   Device votes:     Auron               1        1         Online
>>
>> ------------------------------------------------------------------
>>
>> -- Device Group Servers --
>>
>>                     Device Group        Primary             Secondary
>>                     ------------        -------             ---------
>>
>>
>> -- Device Group Status --
>>
>>                     Device Group        Status
>>                     ------------        ------
>>
>>
>> -- Multi-owner Device Groups --
>>
>>                     Device Group        Online Status
>>                     ------------        -------------
>>
>> ------------------------------------------------------------------
>>
>> -- Resource Groups and Resources --
>>
>>             Group Name     Resources
>>             ----------     ---------
>>  Resources: xCloud-rg      xCloud-nfsres r-nfs
>>  Resources: nfs-rg         nfs-lh-rs nfs-hastp-rs nfs-rs
>>
>>
>> -- Resource Groups --
>>
>>             Group Name     Node Name      State        Suspended
>>             ----------     ---------      -----        ---------
>>      Group: xCloud-rg      xCid           Offline      No
>>      Group: xCloud-rg      xCloud         Online       No
>>
>>      Group: nfs-rg         xCid           Offline      No
>>      Group: nfs-rg         xCloud         Online       No
>>
>>
>> -- Resources --
>>
>>             Resource Name  Node Name      State        Status Message
>>             -------------  ---------      -----        --------------
>>   Resource: xCloud-nfsres  xCid           Offline      Offline
>>   Resource: xCloud-nfsres  xCloud         Online       Online - LogicalHostname online.
>>
>>   Resource: r-nfs          xCid           Offline      Offline
>>   Resource: r-nfs          xCloud         Online       Online - Service is online.
>>
>>   Resource: nfs-lh-rs      xCid           Offline      Offline
>>   Resource: nfs-lh-rs      xCloud         Online       Online - LogicalHostname online.
>>
>>   Resource: nfs-hastp-rs   xCid           Offline      Offline
>>   Resource: nfs-hastp-rs   xCloud         Online       Online
>>
>>   Resource: nfs-rs         xCid           Offline      Offline
>>   Resource: nfs-rs         xCloud         Online       Online - Service is online.
>>
>> ------------------------------------------------------------------
>>
>> -- IPMP Groups --
>>
>>               Node Name    Group      Status    Adapter    Status
>>               ---------    -----      ------    -------    ------
>>   IPMP Group: xCid         sc_ipmp0   Online    e1000g1    Online
>>
>>   IPMP Group: xCloud       sc_ipmp0   Online    e1000g0    Online
>>
>>
>> -- IPMP Groups in Zones --
>>
>>               Zone Name    Group      Status    Adapter    Status
>>               ---------    -----      ------    -------    ------
>> ------------------------------------------------------------------
>> root@xCloud:~#
>>
>>
>> *********** Wait about 5 minutes, then reboot the 2nd node xCloud, and
>> node xCid panics with the error below *********************
>>
>> root@xCid:~# Notifying cluster that this node is panicking
>> WARNING: CMM: Reading reservation keys from quorum device Auron failed with error 2.
>>
>> panic[cpu0]/thread=ffffff02d0a623c0: CMM: Cluster lost operational quorum; aborting.
>>
>> ffffff0011976b50 genunix:vcmn_err+2c ()
>> ffffff0011976b60 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+1f ()
>> ffffff0011976c40 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+8c ()
>> ffffff0011976e30 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+57f ()
>> ffffff0011976e70 cl_haci:__1cIcmm_implStransitions_thread6M_v_+b7 ()
>> ffffff0011976e80 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+9 ()
>> ffffff0011976ed0 cl_orb:cllwpwrapper+d7 ()
>> ffffff0011976ee0 unix:thread_start+8 ()
>>
>> syncing file systems... done
>> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
>> 51% done
>>
>> The host log is attached.
>>
>> I have gone through the SunCluster doc on how to set up SunCluster for
>> OpenSolaris multiple times, but I don't see any steps that I missed.
>> Can you please help determine whether this is a setup issue or a bug?
>>
>> Thanks,
>>
>> Janey
>>
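For reference, a rough sketch of the quorum arithmetic behind the panic above, using only the vote counts shown in the scstat output earlier in this thread:

    votes possible = 2 node votes (xCid, xCloud) + 1 quorum device vote (Auron) = 3
    votes needed   = majority of 3                                              = 2
    votes held by xCid while xCloud reboots and the quorum server is
    unreadable ("failed with error 2")                                          = 1

    1 < 2, so CMM reports "Cluster lost operational quorum" and aborts the node.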
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> ha-clusters-discuss mailing list
>> ha-clusters-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss

-------------- next part --------------
A non-text attachment was scrubbed...
Name: md.tab
Type: application/octet-stream
Size: 3653 bytes
Desc: md.tab
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20090918/ea0dbd15/attachment-0001.obj>