Instead of using metasets (SVM), use ZFS/zpools; they are much easier to set up.

Thanks,
Tirthankar
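For example, the shared LUNs can go into a zpool that an HAStoragePlus resource imports and exports as its resource group fails over between the nodes. This is only a rough sketch; the pool, device, resource-group, and resource names below are placeholders, not anything from this setup:

  # On one node, create a pool on the shared LUNs (device names are examples)
  zpool create mypool mirror c2t0d0 c2t1d0

  # Put the pool under cluster control as a failover resource
  clresourcetype register SUNW.HAStoragePlus
  clresourcegroup create my-rg
  clresource create -g my-rg -t SUNW.HAStoragePlus -p Zpools=mypool my-hasp-rs
  clresourcegroup manage my-rg
  clresourcegroup online my-rg

Other resources in the same group (the NFS resources, for instance) can then depend on the HAStoragePlus resource and use file systems from the pool.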
http://blogs.sun.com/tirthankar

On 09/14/09 14:41, Binu Jose Philip wrote:
> On Mon, Sep 14, 2009 at 2:10 PM, Sambit Nayak <Sambit.Nayak at sun.com> wrote:
>
>> Hi Janey,
>>
>> If I understood correctly, the quorum server is working correctly now, right?
>>
>> Some more replies inline...
>>
>> Le, Janey wrote:
>>
>>> Hi Sambit,
>>> I looked into the quorum server using the command "clquorumserver show +"; it
>>> showed me the old services that the quorum server had served before this setup,
>>> so I reinstalled my host and set up the quorum server again.
>>>
>> There are cases where the quorum server could still be maintaining
>> information about the clusters it serviced in the past.
>> (It can happen as a result of unclean removal of the quorum server, for
>> instance.)
>>
>> This should not affect its operation with the current clusters that it is
>> serving.
>>
>> There is a procedure to clean up such stale information anyhow - you do not
>> need to reinstall the quorum server for that.
>> Please look in the section "How to Clean Up the Quorum Server Configuration
>> Information" in the quorum server document
>> (http://docs.sun.com/app/docs/doc/820-4679/gfjrh?l=en&a=view).
>>
>>> From the document that I got from
>>> http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/OHACdocs/,
>>> and from the restrictions, Veritas Volume Manager is not supported in
>>> OpenSolaris HA Cluster. So I wonder whether we need to use metaset to set up
>>> a diskset to manage the disks, or whether what I have from the doc is enough?
>>>
> If you want to create failover disksets and volumes from shared disks,
> then yes, you will need to use the meta* commands to create a multi-host
> diskset. Otherwise, i.e. if you don't need disksets/volumes, you can use
> the shared devices as they are, using the DID path.
>
> cheers
> Binu
>
>> I'll let other folks answer this one.
>>
>> Thanks & Regards,
>> Sambit
>>
>>> Thanks,
>>>
>>> Janey
>>> -----Original Message-----
>>> From: Sambit.Nayak at Sun.COM [mailto:Sambit.Nayak at Sun.COM]
>>> Sent: Wednesday, September 09, 2009 1:53 AM
>>> To: Le, Janey
>>> Cc: ha-clusters-discuss at opensolaris.org
>>> Subject: Re: [ha-clusters-discuss] Host panic - OpenSolaris SunCluster
>>>
>>> Hi Janey,
>>>
>>> The error message:
>>> > WARNING: CMM: Reading reservation keys from quorum device Auron
>>> > failed with error 2.
>>> means that the node xCid failed to read the quorum keys from the quorum
>>> server host Auron.
>>>
>>> It is possible that the node xCid could not contact the quorum server
>>> host Auron at that specific time, when the reconfiguration was in
>>> progress due to the reboot of xCloud.
>>> Also, please look in the syslog messages on xCid for any failure
>>> messages related to the quorum server.
>>>
>>> Is the problem happening every time you reboot xCloud after both nodes
>>> are successfully online in the cluster?
>>>
>>> Things that can be done to debug
>>> --------------------------------
>>> More information about this failure will be available in the cluster
>>> kernel trace buffers.
>>>
>>> If you can obtain the kernel dumps of the cluster nodes at the time of
>>> this problem, then we can look into them to debug the problem further.
>>> If you are not able to provide the dumps, then please run:
>>> > *cmm_dbg_buf/s
>>> > *(cmm_dbg_buf+8)+1/s
>>> at the kmdb prompt resulting from the panic (or on the saved crash
>>> dump), and provide that output.
>>>
>>> Please also save the /var/adm/messages of both nodes.
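If the cluster nodes are set up to save crash dumps, roughly the same CMM debug buffers can also be read from the saved dump afterwards with mdb. A minimal sketch, assuming savecore wrote unix.0/vmcore.0 into the default /var/crash/<hostname> directory (the dump file names and path are assumptions, not taken from this thread):

  # on the node that panicked, once it is back up
  cd /var/crash/`uname -n`
  echo '*cmm_dbg_buf/s' | mdb unix.0 vmcore.0
  echo '*(cmm_dbg_buf+8)+1/s' | mdb unix.0 vmcore.0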
>>>
>>> Each quorum server daemon (on a quorum server host) has an associated
>>> directory where it stores some files.
>>> By default, /var/scqsd is the directory used by a quorum server daemon.
>>> If you have changed the default directory while configuring the quorum
>>> server, then please look there instead.
>>> There will be files named ".scqsd_dbg_buf*" in that directory.
>>> Please provide those files as well; they will tell us what's happening
>>> on the quorum server host Auron.
>>>
>>> If you execute the "clquorum status" command on a cluster node, it will
>>> tell whether the local node can access the quorum server at the time the
>>> command is executed. If access is possible and the node's keys are
>>> present, the quorum server is marked online; else it is marked offline.
>>> So if you execute this command on both cluster nodes before doing the
>>> experiment of rebooting xCloud, that will tell whether either node was
>>> having problems accessing the quorum server.
>>> Please run that command on both nodes, and capture the output, before
>>> rebooting xCloud.
>>>
>>> Similarly, "clquorumserver show +" on the quorum server host will
>>> tell what cluster it is serving, what keys are present on the quorum
>>> server, which cluster node is the owner of the quorum server, etc.
>>> Please capture its output before rebooting xCloud, and again after xCid
>>> panics as a result of rebooting xCloud.
>>>
>>> ************
>>>
>>> Just as a confirmation, the cluster is running Open HA Cluster 2009.06,
>>> and you are using the quorum server packages available with Open HA
>>> Cluster 2009.06, right?
>>>
>>> Thanks & Regards,
>>> Sambit
>>>
>>> Janey Le wrote:
>>>
>>>> After setting up SunCluster on OpenSolaris, when I reboot the second
>>>> node of the cluster, my first node panics. Can you please let me know if
>>>> there is anyone I can contact to find out whether this is a setup issue
>>>> or a cluster bug?
>>>>
>>>> Below is the setup that I had:
>>>>
>>>> - 2x1 (two OpenSolaris 2009.06 x86 hosts named xCid and xCloud
>>>>   connected to one FC array)
>>>> - Created 32 volumes and mapped them to the host group; under the host
>>>>   group are the two cluster nodes
>>>> - Formatted the volumes
>>>> - Set up the cluster with a quorum server named Auron (both nodes joined
>>>>   the cluster, and all of the resource groups and resources are online
>>>>   on the 1st node, xCid)
>>>>
>>>> Below is the status of the cluster before rebooting the nodes.
>>>> root at xCid:~# scstat -p
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Cluster Nodes --
>>>>
>>>>                    Node name      Status
>>>>                    ---------      ------
>>>>   Cluster node:    xCid           Online
>>>>   Cluster node:    xCloud         Online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Cluster Transport Paths --
>>>>
>>>>                      Endpoint         Endpoint         Status
>>>>                      --------         --------         ------
>>>>   Transport path:    xCid:e1000g3     xCloud:e1000g3   Path online
>>>>   Transport path:    xCid:e1000g2     xCloud:e1000g2   Path online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Quorum Summary from latest node reconfiguration --
>>>>
>>>>   Quorum votes possible:   3
>>>>   Quorum votes needed:     2
>>>>   Quorum votes present:    3
>>>>
>>>>
>>>> -- Quorum Votes by Node (current status) --
>>>>
>>>>                    Node Name      Present  Possible  Status
>>>>                    ---------      -------  --------  ------
>>>>   Node votes:      xCid           1        1         Online
>>>>   Node votes:      xCloud         1        1         Online
>>>>
>>>>
>>>> -- Quorum Votes by Device (current status) --
>>>>
>>>>                    Device Name    Present  Possible  Status
>>>>                    -----------    -------  --------  ------
>>>>   Device votes:    Auron          1        1         Online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Device Group Servers --
>>>>
>>>>                    Device Group   Primary   Secondary
>>>>                    ------------   -------   ---------
>>>>
>>>>
>>>> -- Device Group Status --
>>>>
>>>>                    Device Group   Status
>>>>                    ------------   ------
>>>>
>>>>
>>>> -- Multi-owner Device Groups --
>>>>
>>>>                    Device Group   Online Status
>>>>                    ------------   -------------
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Resource Groups and Resources --
>>>>
>>>>               Group Name   Resources
>>>>               ----------   ---------
>>>>   Resources:  xCloud-rg    xCloud-nfsres r-nfs
>>>>   Resources:  nfs-rg       nfs-lh-rs nfs-hastp-rs nfs-rs
>>>>
>>>>
>>>> -- Resource Groups --
>>>>
>>>>           Group Name   Node Name   State     Suspended
>>>>           ----------   ---------   -----     ---------
>>>>   Group:  xCloud-rg    xCid        Online    No
>>>>   Group:  xCloud-rg    xCloud      Offline   No
>>>>
>>>>   Group:  nfs-rg       xCid        Online    No
>>>>   Group:  nfs-rg       xCloud      Offline   No
>>>>
>>>>
>>>> -- Resources --
>>>>
>>>>              Resource Name   Node Name   State     Status Message
>>>>              -------------   ---------   -----     --------------
>>>>   Resource:  xCloud-nfsres   xCid        Online    Online - LogicalHostname online.
>>>>   Resource:  xCloud-nfsres   xCloud      Offline   Offline
>>>>
>>>>   Resource:  r-nfs           xCid        Online    Online - Service is online.
>>>>   Resource:  r-nfs           xCloud      Offline   Offline
>>>>
>>>>   Resource:  nfs-lh-rs       xCid        Online    Online - LogicalHostname online.
>>>>   Resource:  nfs-lh-rs       xCloud      Offline   Offline
>>>>
>>>>   Resource:  nfs-hastp-rs    xCid        Online    Online
>>>>   Resource:  nfs-hastp-rs    xCloud      Offline   Offline
>>>>
>>>>   Resource:  nfs-rs          xCid        Online    Online - Service is online.
>>>>   Resource:  nfs-rs          xCloud      Offline   Offline
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- IPMP Groups --
>>>>
>>>>                  Node Name   Group      Status   Adapter   Status
>>>>                  ---------   -----      ------   -------   ------
>>>>   IPMP Group:    xCid        sc_ipmp0   Online   e1000g1   Online
>>>>
>>>>   IPMP Group:    xCloud      sc_ipmp0   Online   e1000g0   Online
>>>>
>>>>
>>>> -- IPMP Groups in Zones --
>>>>
>>>>                  Zone Name   Group      Status   Adapter   Status
>>>>                  ---------   -----      ------   -------   ------
>>>> ------------------------------------------------------------------
>>>> root at xCid:~#
>>>>
>>>>
>>>> root at xCid:~# clnode show
>>>>
>>>> === Cluster Nodes ===
>>>>
>>>> Node Name:                  xCid
>>>>   Node ID:                  1
>>>>   Enabled:                  yes
>>>>   privatehostname:          clusternode1-priv
>>>>   reboot_on_path_failure:   disabled
>>>>   globalzoneshares:         1
>>>>   defaultpsetmin:           1
>>>>   quorum_vote:              1
>>>>   quorum_defaultvote:       1
>>>>   quorum_resv_key:          0x4A9B35C600000001
>>>>   Transport Adapter List:   e1000g2, e1000g3
>>>>
>>>> Node Name:                  xCloud
>>>>   Node ID:                  2
>>>>   Enabled:                  yes
>>>>   privatehostname:          clusternode2-priv
>>>>   reboot_on_path_failure:   disabled
>>>>   globalzoneshares:         1
>>>>   defaultpsetmin:           1
>>>>   quorum_vote:              1
>>>>   quorum_defaultvote:       1
>>>>   quorum_resv_key:          0x4A9B35C600000002
>>>>   Transport Adapter List:   e1000g2, e1000g3
>>>>
>>>> root at xCid:~#
>>>>
>>>>
>>>> ****** Reboot the 1st node xCid; all of the resources transfer to the
>>>> 2nd node xCloud and come online on xCloud ************
>>>>
>>>> root at xCloud:~# scstat -p
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Cluster Nodes --
>>>>
>>>>                    Node name      Status
>>>>                    ---------      ------
>>>>   Cluster node:    xCid           Online
>>>>   Cluster node:    xCloud         Online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Cluster Transport Paths --
>>>>
>>>>                      Endpoint         Endpoint         Status
>>>>                      --------         --------         ------
>>>>   Transport path:    xCid:e1000g3     xCloud:e1000g3   Path online
>>>>   Transport path:    xCid:e1000g2     xCloud:e1000g2   Path online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Quorum Summary from latest node reconfiguration --
>>>>
>>>>   Quorum votes possible:   3
>>>>   Quorum votes needed:     2
>>>>   Quorum votes present:    3
>>>>
>>>>
>>>> -- Quorum Votes by Node (current status) --
>>>>
>>>>                    Node Name      Present  Possible  Status
>>>>                    ---------      -------  --------  ------
>>>>   Node votes:      xCid           1        1         Online
>>>>   Node votes:      xCloud         1        1         Online
>>>>
>>>>
>>>> -- Quorum Votes by Device (current status) --
>>>>
>>>>                    Device Name    Present  Possible  Status
>>>>                    -----------    -------  --------  ------
>>>>   Device votes:    Auron          1        1         Online
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Device Group Servers --
>>>>
>>>>                    Device Group   Primary   Secondary
>>>>                    ------------   -------   ---------
>>>>
>>>>
>>>> -- Device Group Status --
>>>>
>>>>                    Device Group   Status
>>>>                    ------------   ------
>>>>
>>>>
>>>> -- Multi-owner Device Groups --
>>>>
>>>>                    Device Group   Online Status
>>>>                    ------------   -------------
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- Resource Groups and Resources --
>>>>
>>>>               Group Name   Resources
>>>>               ----------   ---------
>>>>   Resources:  xCloud-rg    xCloud-nfsres r-nfs
>>>>   Resources:  nfs-rg       nfs-lh-rs nfs-hastp-rs nfs-rs
>>>>
>>>>
>>>> -- Resource Groups --
>>>>
>>>>           Group Name   Node Name   State     Suspended
>>>>           ----------   ---------   -----     ---------
>>>>   Group:  xCloud-rg    xCid        Offline   No
>>>>   Group:  xCloud-rg    xCloud      Online    No
>>>>
>>>>   Group:  nfs-rg       xCid        Offline   No
>>>>   Group:  nfs-rg       xCloud      Online    No
>>>>
>>>>
>>>> -- Resources --
>>>>
>>>>              Resource Name   Node Name   State     Status Message
>>>>              -------------   ---------   -----     --------------
>>>>   Resource:  xCloud-nfsres   xCid        Offline   Offline
>>>>   Resource:  xCloud-nfsres   xCloud      Online    Online - LogicalHostname online.
>>>>
>>>>   Resource:  r-nfs           xCid        Offline   Offline
>>>>   Resource:  r-nfs           xCloud      Online    Online - Service is online.
>>>>
>>>>   Resource:  nfs-lh-rs       xCid        Offline   Offline
>>>>   Resource:  nfs-lh-rs       xCloud      Online    Online - LogicalHostname online.
>>>>
>>>>   Resource:  nfs-hastp-rs    xCid        Offline   Offline
>>>>   Resource:  nfs-hastp-rs    xCloud      Online    Online
>>>>
>>>>   Resource:  nfs-rs          xCid        Offline   Offline
>>>>   Resource:  nfs-rs          xCloud      Online    Online - Service is online.
>>>>
>>>> ------------------------------------------------------------------
>>>>
>>>> -- IPMP Groups --
>>>>
>>>>                  Node Name   Group      Status   Adapter   Status
>>>>                  ---------   -----      ------   -------   ------
>>>>   IPMP Group:    xCid        sc_ipmp0   Online   e1000g1   Online
>>>>
>>>>   IPMP Group:    xCloud      sc_ipmp0   Online   e1000g0   Online
>>>>
>>>>
>>>> -- IPMP Groups in Zones --
>>>>
>>>>                  Zone Name   Group      Status   Adapter   Status
>>>>                  ---------   -----      ------   -------   ------
>>>> ------------------------------------------------------------------
>>>> root at xCloud:~#
>>>>
>>>>
>>>> *********** Wait for about 5 minutes, then reboot the 2nd node xCloud;
>>>> node xCid panics with the error below *********************
>>>>
>>>> root at xCid:~# Notifying cluster that this node is panicking
>>>> WARNING: CMM: Reading reservation keys from quorum device Auron failed with error 2.
>>>>
>>>> panic[cpu0]/thread=ffffff02d0a623c0: CMM: Cluster lost operational quorum; aborting.
>>>>
>>>> ffffff0011976b50 genunix:vcmn_err+2c ()
>>>> ffffff0011976b60 cl_runtime:__1cZsc_syslog_msg_log_no_args6FpviipkcpnR__va_list_element__nZsc_syslog_msg_status_enum__+1f ()
>>>> ffffff0011976c40 cl_runtime:__1cCosNsc_syslog_msgDlog6MiipkcE_nZsc_syslog_msg_status_enum__+8c ()
>>>> ffffff0011976e30 cl_haci:__1cOautomaton_implbAstate_machine_qcheck_state6M_nVcmm_automaton_event_t__+57f ()
>>>> ffffff0011976e70 cl_haci:__1cIcmm_implStransitions_thread6M_v_+b7 ()
>>>> ffffff0011976e80 cl_haci:__1cIcmm_implYtransitions_thread_start6Fpv_v_+9 ()
>>>> ffffff0011976ed0 cl_orb:cllwpwrapper+d7 ()
>>>> ffffff0011976ee0 unix:thread_start+8 ()
>>>>
>>>> syncing file systems... done
>>>> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
>>>> 51% done
>>>>
>>>> The host log is attached.
>>>>
>>>> I have gone through the SunCluster doc on how to set up SunCluster for
>>>> OpenSolaris multiple times, but I don't see any steps that I missed. Can
>>>> you please help to see whether this is a setup issue or a bug?
>>>>
>>>> Thanks,
>>>>
>>>> Janey