Hello Digimer,

The problem is solved. First of all, thank you for taking the time to stay with me on this issue. You were also correct about fencing. Here is how it broke down:

1. When I created the cluster, I forgot that I had not yet joined these systems to the cluster. It has been a long while since I last set up a cluster, and even though I wrote documentation about all of this, I still forgot to follow it to the letter. So I had to run cman_tool join on every node. That was the key.
2. After joining all nodes to the cluster, I was able to start cman with: service cman start
3. Then I configured fencing.
4. Then I added a static mount entry for the shared device to /etc/fstab.
5. Then I rebooted each node, one by one. They all came back up and are well. (Rough commands are below, for reference.)
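In case it helps anyone searching the archives later, the sequence on each node was roughly the following. The device path and mount point in the fstab line are placeholders from my notes, not literal values to copy:

    # on each node, one at a time
    cman_tool join          # join the node to the cluster
    service cman start      # start cman (corosync, fenced, dlm_controld, gfs_controld)
    cman_tool nodes         # confirm all five members are listed
    clustat                 # the cluster should report Quorate

    # example /etc/fstab entry for the shared GFS2 volume (placeholder device and mount point):
    /dev/mapper/vg_shared-lv_gfs2  /shared  gfs2  defaults,noatime  0 0

And because our network team has not unblocked multicast yet (see the log note below), the transport is set in cluster.conf with something like the snippet below. I'm writing this part from memory, so please check the cluster.conf schema on your release before copying it:

    <cman broadcast="yes"/>
    <!-- or, for UDP unicast as Red Hat support suggested: <cman transport="udpu"/> -->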
I do still see this error in the logs. (I believe it means multicast is not being used; I'm using broadcast for now. If multicast were not blocked on our network, I think the error would go away.)

[TOTEM ] Received message has invalid digest... ignoring.
Jan 8 08:34:33 ustlvcmsp1956 corosync[21194]: [TOTEM ] Invalid packet data

Again, thanks for your help.

Vinh

-----Original Message-----
From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Digimer
Sent: Thursday, January 08, 2015 2:02 AM
To: linux clustering
Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster

Please configure fencing. If you don't, it _will_ cause you problems.

On 07/01/15 09:48 PM, Cao, Vinh wrote:
> Hi Digimer,
>
> No, we're not supporting multicast. I'm trying to use broadcast, but Red Hat support says it is better to use transport=udpu, which I did set, and it is complaining about a timeout.
> I did try to set broadcast, but somehow it didn't work either.
>
> Let me give broadcast a try again.
>
> Thanks,
> Vinh
>
> -----Original Message-----
> From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Digimer
> Sent: Wednesday, January 07, 2015 5:51 PM
> To: linux clustering
> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>
> It looks like a network problem... Does your (virtual) switch support multicast properly, and have you opened up the proper ports in the firewall?
>
> On 07/01/15 05:32 PM, Cao, Vinh wrote:
>> Hi Digimer,
>>
>> Yes, I just did. It looks like they are failing, and I'm not sure why. Please see the attachment for the logs from all servers.
>>
>> By the way, I do appreciate all the help I can get.
>>
>> Vinh
>>
>> -----Original Message-----
>> From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Digimer
>> Sent: Wednesday, January 07, 2015 4:33 PM
>> To: linux clustering
>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>>
>> Quorum is enabled by default. I need to see the entire logs from all five nodes, as I mentioned in the first email. Please disable cman from starting on boot, configure fencing properly and then reboot all nodes cleanly. Start the 'tail -f -n 0 /var/log/messages' on all five nodes, then in another window, start cman on all five nodes. When things settle down, copy/paste all the log output please.
>>
>> On 07/01/15 04:29 PM, Cao, Vinh wrote:
>>> Hi Digimer,
>>>
>>> Here is what the logs show:
>>> [root@ustlvcmsp1954 ~]# tail -f /var/log/messages
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync profile loading service
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Using quorum provider quorum_cman
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [QUORUM] Members[1]: 1
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [CPG ] chosen downlist: sender r(0) ip(10.30.197.108) ; members(old:0 left:0)
>>> Jan 7 16:14:01 ustlvcmsp1954 corosync[8182]: [MAIN ] Completed service synchronization, ready to provide service.
>>> Jan 7 16:14:01 ustlvcmsp1954 rgmanager[8099]: Waiting for quorum to form
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Unloading all Corosync service engines.
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync extended virtual synchrony service
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync configuration service
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster config database access v1.01
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync profile loading service
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: openais checkpoint service B.01.01
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync CMAN membership service 2.90
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
>>> Jan 7 16:15:06 ustlvcmsp1954 corosync[8182]: [MAIN ] Corosync Cluster Engine exiting with status 0 at main.c:2055.
>>> Jan 7 16:15:06 ustlvcmsp1954 rgmanager[8099]: Quorum formed
>>>
>>> Then it dies at:
>>> Starting cman... [ OK ]
>>> Waiting for quorum... Timed-out waiting for cluster [FAILED]
>>>
>>> Yes, I did make the change with <fence_daemon post_join_delay="30"/>, but the problem is still there. One thing I don't understand is why the cluster is looking for quorum; I didn't have any quorum disk set up in the cluster.conf file.
>>>
>>> Any help I can get is appreciated.
>>>
>>> Vinh
>>>
>>> -----Original Message-----
>>> From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Digimer
>>> Sent: Wednesday, January 07, 2015 3:59 PM
>>> To: linux clustering
>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>>>
>>> On 07/01/15 03:39 PM, Cao, Vinh wrote:
>>>> Hello Digimer,
>>>>
>>>> Yes, I agree with you that RHEL 6.4 is old. We patch monthly, but I'm not sure why these servers are still at 6.4. Most of our systems are 6.6.
>>>>
>>>> Here is my cluster config. All I want is to use the cluster to mount GFS2 via /etc/fstab.
>>>> [root@ustlvcmsp1955 ~]# cat /etc/cluster/cluster.conf
>>>> <?xml version="1.0"?>
>>>> <cluster config_version="15" name="p1954_to_p1958">
>>>>     <clusternodes>
>>>>         <clusternode name="ustlvcmsp1954" nodeid="1"/>
>>>>         <clusternode name="ustlvcmsp1955" nodeid="2"/>
>>>>         <clusternode name="ustlvcmsp1956" nodeid="3"/>
>>>>         <clusternode name="ustlvcmsp1957" nodeid="4"/>
>>>>         <clusternode name="ustlvcmsp1958" nodeid="5"/>
>>>>     </clusternodes>
>>>
>>> You don't configure the fencing for the nodes... If anything causes a fence, the cluster will lock up (by design).
>>>
>>>>     <fencedevices>
>>>>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.108" login="rhfence" name="p1954" passwd="xxxxxxxx"/>
>>>>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.109" login="rhfence" name="p1955" passwd="xxxxxxxx"/>
>>>>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.110" login="rhfence" name="p1956" passwd="xxxxxxxx"/>
>>>>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.111" login="rhfence" name="p1957" passwd="xxxxxxxx"/>
>>>>         <fencedevice agent="fence_vmware_soap" ipaddr="10.30.197.112" login="rhfence" name="p1958" passwd="xxxxxxxx"/>
>>>>     </fencedevices>
>>>> </cluster>
>>>>
>>>> clustat shows:
>>>>
>>>> Cluster Status for p1954_to_p1958 @ Wed Jan 7 15:38:00 2015
>>>> Member Status: Quorate
>>>>
>>>>  Member Name        ID   Status
>>>>  ------ ----        ---- ------
>>>>  ustlvcmsp1954      1    Offline
>>>>  ustlvcmsp1955      2    Online, Local
>>>>  ustlvcmsp1956      3    Online
>>>>  ustlvcmsp1957      4    Offline
>>>>  ustlvcmsp1958      5    Online
>>>>
>>>> I need to make them all online, so I can use fencing for mounting the shared disk.
>>>>
>>>> Thanks,
>>>> Vinh
>>>
>>> What about the log entries from the start-up? Did you try the post_join_delay config?
>>>
>>>> -----Original Message-----
>>>> From: linux-cluster-boun...@redhat.com [mailto:linux-cluster-boun...@redhat.com] On Behalf Of Digimer
>>>> Sent: Wednesday, January 07, 2015 3:16 PM
>>>> To: linux clustering
>>>> Subject: Re: [Linux-cluster] needs helps GFS2 on 5 nodes cluster
>>>>
>>>> My first thought would be to set <fence_daemon post_join_delay="30" /> in cluster.conf.
>>>>
>>>> If that doesn't work, please share your configuration file. Then, with all nodes offline, open a terminal to each node and run 'tail -f -n 0 /var/log/messages'. With that running, start all the nodes and wait for things to settle down, then paste the five nodes' output as well.
>>>>
>>>> Also, 6.4 is pretty old, why not upgrade to 6.6?
>>>>
>>>> digimer
>>>>
>>>> On 07/01/15 03:10 PM, Cao, Vinh wrote:
>>>>> Hello Cluster guru,
>>>>>
>>>>> I'm trying to set up a Red Hat 6.4 OS cluster with 5 nodes. With two nodes I don't have any issue.
>>>>>
>>>>> But with 5 nodes, when I ran clustat I got 3 nodes online and the other two offline.
>>>>>
>>>>> When I start one of the offline nodes with 'service cman start', I get:
>>>>>
>>>>> [root@ustlvcmspxxx ~]# service cman status
>>>>> corosync is stopped
>>>>>
>>>>> [root@ustlvcmsp1954 ~]# service cman start
>>>>> Starting cluster:
>>>>>    Checking if cluster has been disabled at boot... [ OK ]
>>>>>    Checking Network Manager... [ OK ]
>>>>>    Global setup... [ OK ]
>>>>>    Loading kernel modules... [ OK ]
>>>>>    Mounting configfs... [ OK ]
>>>>>    Starting cman... [ OK ]
>>>>>    Waiting for quorum... Timed-out waiting for cluster [FAILED]
>>>>> Stopping cluster:
>>>>>    Leaving fence domain... [ OK ]
>>>>>    Stopping gfs_controld... [ OK ]
>>>>>    Stopping dlm_controld... [ OK ]
>>>>>    Stopping fenced... [ OK ]
>>>>>    Stopping cman... [ OK ]
>>>>>    Waiting for corosync to shutdown: [ OK ]
>>>>>    Unloading kernel modules... [ OK ]
>>>>>    Unmounting configfs... [ OK ]
>>>>>
>>>>> Can you help?
>>>>>
>>>>> Thank you,
>>>>>
>>>>> Vinh

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster