> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>
> Yep, my network engineer and I found that the multicast packets were being
> blocked by the underlying hypervisor for the VM systems.
Yeah, that'll happen :-(
I believe it's fixed in newer kernels, but for a while there multicast would
appear to work and then stop for no good reason. Putting the device into
promiscuous mode seemed to help, IIRC.

This is the bug I knew it as: https://bugzilla.redhat.com/show_bug.cgi?id=1090670

> At first we thought it was just iptables on the servers, but I was certain I
> had actually turned that off. The issue has been bumped up to the operations
> team for fixing, but since I've gotten it to work with unicast, there's no
> pressure.
>
> Sent from my iPad
>
>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>>
>> Glad you sorted it out!
>>
>> So then, it was almost certainly a multicast issue. I would still strongly
>> recommend tracking down and fixing the problem, and reverting to mcast if
>> you can. It's more efficient. :)
>>
>> digimer
>>
>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>> Ok, got it working after a little more effort, and the cluster is now
>>> properly reporting.
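Digimer's advice to track down the multicast fault, rather than stay on unicast, can be checked independently of cman. The following is a minimal Python sketch of an omping-style probe, under these assumptions: the group 239.255.100.100 and port 5500 are hypothetical values picked to stay clear of corosync's ports 5404 and 5405, and each side binds to the node's 10.10.1.x address. Run `listen_for_probe()` on one node, then `send_probe()` on another.

```python
import ipaddress
import socket
import struct

# Hypothetical probe values: an administratively scoped group and a port
# chosen so the probe does not collide with corosync's 5404/5405.
PROBE_GROUP = "239.255.100.100"
PROBE_PORT = 5500


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: the multicast group plus the local interface address."""
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen_for_probe(iface_ip: str, group: str = PROBE_GROUP,
                     port: int = PROBE_PORT, timeout: float = 10.0):
    """Join the group on iface_ip and wait for one datagram.

    Returns (payload, sender_address), or None if nothing arrives before
    the timeout, which is what you would expect if multicast is dropped.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, iface_ip))
        sock.settimeout(timeout)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None
    finally:
        sock.close()


def send_probe(iface_ip: str, payload: bytes = b"probe",
               group: str = PROBE_GROUP, port: int = PROBE_PORT) -> None:
    """Send one datagram to the group out of the interface that owns iface_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL 1 keeps the probe on the local segment, like the cluster traffic.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                        socket.inet_aton(iface_ip))
        sock.sendto(payload, (group, port))
    finally:
        sock.close()
```

For example, `listen_for_probe("10.10.1.129")` on csgha2 and then `send_probe("10.10.1.128")` on csgha1. If the listener times out (returns `None`) while ordinary ping between the same addresses works, that points at something on the path, such as the hypervisor John's engineer found, dropping multicast.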
>>>
>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com>
>>>> wrote:
>>>>
>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>> like this:
>>>>
>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>>   <fence_daemon/>
>>>>   <clusternodes>
>>>>     <clusternode name="csgha1" nodeid="1">
>>>>       <fence>
>>>>         <method name="pcmk-redirect">
>>>>           <device name="pcmk" port="csgha1"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>     <clusternode name="csgha2" nodeid="2">
>>>>       <fence>
>>>>         <method name="pcmk-redirect">
>>>>           <device name="pcmk" port="csgha2"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>     <clusternode name="csgha3" nodeid="3">
>>>>       <fence>
>>>>         <method name="pcmk-redirect">
>>>>           <device name="pcmk" port="csgha3"/>
>>>>         </method>
>>>>       </fence>
>>>>     </clusternode>
>>>>   </clusternodes>
>>>>   <cman/>
>>>>   <fencedevices>
>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>   </fencedevices>
>>>>   <rm>
>>>>     <failoverdomains/>
>>>>     <resources/>
>>>>   </rm>
>>>> </cluster>
>>>>
>>>> But after restarting the cluster, I don't see any difference. Did I do
>>>> something wrong?
>>>> --
>>>> Jay
>>>>
>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>
>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>>>>> you set multicast and don't want to use the auto-selected mcast IP, then
>>>>> you can specify the mcast IP group to use via <multicast... />.
>>>>>
>>>>> digimer
>>>>>
>>>>>
>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>>
>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>> only mentions the <multicast addr="..."/> parameter.
>>>>>> What can I use to set this to a unicast address for ports 5404 and
>>>>>> 5405? I'm assuming I can't just put a unicast address in the multicast
>>>>>> parameter, and the man page for cluster.conf wasn't much help either.
>>>>>>
>>>>>> We're still working on having the security team permit these 3 systems
>>>>>> to use multicast.
>>>>>>
>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>>
>>>>>>> Keep us posted. :)
>>>>>>>
>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>>
>>>>>>>> I've been checking hostname resolution this morning. All the systems
>>>>>>>> are listed in each /etc/hosts file (there's no DNS in this
>>>>>>>> environment), and ping works on every system, both to itself and to
>>>>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>>>>> network.
>>>>>>>>
>>>>>>>> I ran tcpdump to see what traffic is on port 5405 on each system,
>>>>>>>> and I'm only seeing outbound traffic on each, even though netstat
>>>>>>>> shows each is listening on the multicast address. My suspicion is that
>>>>>>>> the router is eating the multicast traffic, so I may try unicast
>>>>>>>> instead, but I'm waiting on one of our network engineers to see if my
>>>>>>>> suspicion about the router is correct. He volunteered to help late
>>>>>>>> yesterday.
>>>>>>>>
>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>>
>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' option tells it
>>>>>>>>> to print the IP in dotted-decimal notation only.
>>>>>>>>>
>>>>>>>>> What I was trying to see was whether 'uname -n' resolved to the IP
>>>>>>>>> on the same network card as the other nodes. This is how corosync
>>>>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>>>>> might have a general network issue, possibly related to multicast.
>>>>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>>>>> with corosync.)
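Digimer's checklist (`uname -n`, `gethostip -d`, and the `ifconfig | grep -B 1 ... | grep HWaddr` pipeline) boils down to one question: which interface owns the address the node name resolves to? A rough Python equivalent, offered as a sketch: `socket.gethostbyname()` stands in for `gethostip -d` (it goes through the system resolver, so /etc/hosts entries count, and it needs no syslinux package), and the parser assumes either the CentOS 6 net-tools format (`inet addr:`) or the newer `inet ` format; other output would need adjusting.

```python
import os
import re
import socket
import subprocess
from typing import Optional, Tuple

# Matches "inet addr:10.10.1.128" (net-tools on CentOS 6) and
# "inet 10.10.1.128" (newer ifconfig); "inet6" lines do not match.
INET_RE = re.compile(r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})\b")


def interface_for_ip(ifconfig_text: str, ip: str) -> Optional[str]:
    """Return the name of the interface whose ifconfig block lists ip, else None."""
    current = None
    for line in ifconfig_text.splitlines():
        if line and not line[0].isspace():
            # A non-indented line starts a new interface block, e.g.
            # "eth1      Link encap:Ethernet ..." or "eth1: flags=4163<...>".
            current = line.split()[0].rstrip(":")
        match = INET_RE.search(line)
        if match and match.group(1) == ip:
            return current
    return None


def cluster_interface() -> Tuple[str, str, Optional[str]]:
    """Resolve this node's name and find the local interface that owns it."""
    node = os.uname().nodename          # same value as `uname -n`
    ip = socket.gethostbyname(node)     # roughly `gethostip -d $(uname -n)`
    output = subprocess.run(["ifconfig", "-a"], capture_output=True,
                            text=True, check=True).stdout
    return node, ip, interface_for_ip(output, ip)
```

With the addresses John posted, this would be expected to name eth1 on csgha1 and csgha3 but eth0 on csgha2. The names differ, which is what looked odd to John, but each node's address still lands on the shared 10.10.1.0/24 network, which is what Digimer was checking for.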
>>>>>>>>>
>>>>>>>>> Have you tried unicast? If not, try adding the transport attribute to
>>>>>>>>> the <cman ../> element: <cman transport="udpu" ... />. Do note that
>>>>>>>>> unicast isn't as efficient as multicast, so though it might work, I'd
>>>>>>>>> personally treat it as a debug tool to isolate the source of the
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> cheers
>>>>>>>>>
>>>>>>>>> digimer
>>>>>>>>>
>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>>
>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>>
>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>>
>>>>>>>>>> Here's the /etc/cluster/cluster.conf file; it is identical on all
>>>>>>>>>> three systems:
>>>>>>>>>>
>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>   <fence_daemon/>
>>>>>>>>>>   <clusternodes>
>>>>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>   </clusternodes>
>>>>>>>>>>   <cman/>
>>>>>>>>>>   <fencedevices>
>>>>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>>>>   </fencedevices>
>>>>>>>>>>   <rm>
>>>>>>>>>>     <failoverdomains/>
>>>>>>>>>>     <resources/>
>>>>>>>>>>   </rm>
>>>>>>>>>> </cluster>
>>>>>>>>>>
>>>>>>>>>> uname -n reports "csgha1" on the first system, "csgha2" on the
>>>>>>>>>> second, and "csgha3" on the last.
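John's 1:34 PM config sets transport="udpu" on the <cluster> element, while Digimer's suggestion puts it on <cman>; that mismatch may be why restarting made no visible difference. The sketch below uses Python's `xml.etree.ElementTree` to report where the attribute sits and to move it onto <cman>. The function names are hypothetical, and bumping config_version reflects the usual cman practice of raising the version on every edit (stated here as an assumption); validate the result with `ccs_config_validate` before copying it to the nodes.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def transport_placement(conf_xml: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (transport on <cman>, transport on <cluster>) for a cluster.conf string."""
    root = ET.fromstring(conf_xml)
    cman = root.find("cman")
    on_cman = cman.get("transport") if cman is not None else None
    return on_cman, root.get("transport")


def move_transport_to_cman(conf_xml: str) -> str:
    """Move a transport attribute from <cluster> onto <cman> and bump config_version."""
    root = ET.fromstring(conf_xml)
    value = root.attrib.pop("transport", None)
    if value is None:
        return conf_xml  # nothing misplaced, so leave the text untouched
    cman = root.find("cman")
    if cman is None:
        cman = ET.SubElement(root, "cman")
    cman.set("transport", value)
    # Assumption: every edit to cluster.conf needs a higher config_version.
    root.set("config_version", str(int(root.get("config_version", "0")) + 1))
    return ET.tostring(root, encoding="unicode")
```

For John's config, `transport_placement()` returns `(None, 'udpu')`: the attribute is present, just not on the element Digimer named. ElementTree drops XML comments when parsing, so diff the rewritten file against the original before using it.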
>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>>>>> know whether the next section helps or not.
>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>   csgha1: eth0 = 172.17.1.21
>>>>>>>>>>           eth1 = 10.10.1.128
>>>>>>>>>>   csgha2: eth0 = 10.10.1.129
>>>>>>>>>>           eth1 = 172.17.1.3
>>>>>>>>>>           (Yeah, I know this looks a little weird, but it was the way
>>>>>>>>>>           our automated VM control did the interfaces.)
>>>>>>>>>>   csgha3: eth0 = 172.17.1.23
>>>>>>>>>>           eth1 = 10.10.1.130
>>>>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24
>>>>>>>>>> address for each system in it.
>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>>
>>>>>>>>>> Let me know if you need more information, and I very much appreciate
>>>>>>>>>> your assistance.
>>>>>>>>>> --
>>>>>>>>>> Jay
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm trying to build my first ever HA cluster, and I'm using 3 VMs
>>>>>>>>>>>> running CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>>>>
>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>>
>>>>>>>>>>>> and everything appears to start normally, but if I run
>>>>>>>>>>>> "cman_tool nodes -a", I only see:
>>>>>>>>>>>>
>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>>
>>>>>>>>>>>> On the other systems, the output is the same except for which
>>>>>>>>>>>> system is shown as joined. Each shows just itself as belonging to
>>>>>>>>>>>> the cluster. Also, "pcs status" shows the same picture, with the
>>>>>>>>>>>> other systems listed as offline.
>>>>>>>>>>>> I've checked "netstat -an" and see each machine listening on
>>>>>>>>>>>> ports 5404 and 5405. The logs are rather involved, but I'm not
>>>>>>>>>>>> seeing any errors in them.
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas where to look for what's causing them not to
>>>>>>>>>>>> communicate?
>>>>>>>>>>>> --
>>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>> Can you share your cluster.conf file, please? Also, for each node:
>>>>>>>>>>>
>>>>>>>>>>> * uname -n
>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Digimer
>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person without
>>>>>>>>>>> access to education?
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems