Sure! But I can't seem to get Red Hat to let me see the bug, even though I have an account.
Sent from my iPad

> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>>
>> Yep, my network engineer and I found that the multicast packets were being
>> blocked by the underlying hypervisor for the VM systems.
>
> Yeah, that'll happen :-(
> I believe it's fixed in newer kernels, but for a while there multicast would
> appear to work and then stop for no good reason.
> Putting the device into promiscuous mode seemed to help, IIRC.
>
> This is the bug I knew it as:
> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
>
>> At first we thought it was just iptables on the servers, but I was certain I
>> had actually turned that off. The issue has been bumped up to the operations
>> team to fix, but since I've gotten it to work with unicast, there's no
>> pressure.
>>
>> Sent from my iPad
>>
>>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>>>
>>> Glad you sorted it out!
>>>
>>> So then, it was almost certainly a multicast issue. I would still strongly
>>> recommend trying to source and fix the problem, and reverting to mcast if
>>> you can. It's more efficient. :)
>>>
>>> digimer
>>>
>>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>>> Ok, got it working after a little more effort, and the cluster is now
>>>> properly reporting.
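[A sketch of the promiscuous-mode workaround Andrew mentions. The interface name eth0 is an assumption, the commands need root, and this is a diagnostic workaround rather than a recommended permanent setting:]

```
# Workaround sketch for the hypervisor multicast bug referenced above.
# Assumption: cluster traffic runs over eth0; run as root.
ip link set dev eth0 promisc on
# Verify: the flags line in the output should now include PROMISC.
ip link show dev eth0
```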
>>>>
>>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> wrote:
>>>>>
>>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>>> like this:
>>>>>
>>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>>>   <fence_daemon/>
>>>>>   <clusternodes>
>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>   </clusternodes>
>>>>>   <cman/>
>>>>>   <fencedevices>
>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>   </fencedevices>
>>>>>   <rm>
>>>>>     <failoverdomains/>
>>>>>     <resources/>
>>>>>   </rm>
>>>>> </cluster>
>>>>>
>>>>> But, after restarting the cluster I don't see any difference. Did I do
>>>>> something wrong?
>>>>> --
>>>>> Jay
>>>>>
>>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>
>>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>>>>>> you set multicast and don't want to use the auto-selected mcast IP, then
>>>>>> you can specify the mcast IP group to use via <multicast ... />.
>>>>>>
>>>>>> digimer
>>>>>>
>>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>>>
>>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>>> "the corosync.conf file is not used."
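[For reference, the <multicast .../> override Digimer describes sits inside the <cman> element of cluster.conf. A sketch only; the 239.192.100.1 group address is an example I'm supplying, not something from this thread:]

```xml
<cluster config_version="12" name="pgdb_cluster">
  <cman>
    <multicast addr="239.192.100.1"/>
  </cman>
  <!-- clusternodes, fencedevices, rm, etc. unchanged from the config above -->
</cluster>
```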
>>>>>>> So, I'm guessing I need to set a
>>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>>>>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>>>>>> can't just put a unicast address in the multicast parameter, and the
>>>>>>> man page for cluster.conf wasn't much help either.
>>>>>>>
>>>>>>> We're still working on having the security team permit these 3 systems
>>>>>>> to use multicast.
>>>>>>>
>>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>>>
>>>>>>>> Keep us posted. :)
>>>>>>>>
>>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>>>
>>>>>>>>> I've been checking hostname resolution this morning, and all the
>>>>>>>>> systems are listed in each /etc/hosts file (no DNS in this
>>>>>>>>> environment), and ping works on every system, both to itself and to
>>>>>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>>>>>> network.
>>>>>>>>>
>>>>>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>>>>>>>> system, and I'm only seeing outbound traffic on each, even though
>>>>>>>>> netstat shows each is listening on the multicast address. My
>>>>>>>>> suspicion is that the router is eating the multicast packets, so I
>>>>>>>>> may try the unicast transport instead, but I'm waiting on one of our
>>>>>>>>> network engineers to see if my suspicion about the router is
>>>>>>>>> correct. He volunteered to help late yesterday.
>>>>>>>>>
>>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>>>
>>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give
>>>>>>>>>> the IP in dotted-decimal notation only.
>>>>>>>>>>
>>>>>>>>>> What I was trying to see was whether the 'uname -n' resolved to the
>>>>>>>>>> IP on the same network card as the other nodes.
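[The capture John describes can be reproduced roughly like this. A sketch: the interface name is an assumption (it varies per node in this thread), and tcpdump needs root:]

```
# Watch corosync totem traffic on the cluster interface (interface name
# is an assumption; needs root). With multicast working, you should see
# inbound packets from the other nodes, not just your own outbound.
tcpdump -n -i eth1 udp port 5405

# Narrow the capture to multicast frames only:
tcpdump -n -i eth1 'udp port 5405 and multicast'
```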
>>>>>>>>>> This is how corosync
>>>>>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>>>>>> might have a general network issue, possibly related to multicast.
>>>>>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>>>>>> with corosync.)
>>>>>>>>>>
>>>>>>>>>> Have you tried unicast? If not, try setting the <cman .../> element
>>>>>>>>>> to have the <cman transport="udpu" .../> attribute. Do note that
>>>>>>>>>> unicast isn't as efficient as multicast, so though it might work,
>>>>>>>>>> I'd personally treat it as a debug tool to isolate the source of
>>>>>>>>>> the problem.
>>>>>>>>>>
>>>>>>>>>> cheers
>>>>>>>>>>
>>>>>>>>>> digimer
>>>>>>>>>>
>>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>>>
>>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>>>
>>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>>>
>>>>>>>>>>> Here's the /etc/cluster/cluster.conf file, and it is identical on
>>>>>>>>>>> all three systems:
>>>>>>>>>>>
>>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>>   <fence_daemon/>
>>>>>>>>>>>   <clusternodes>
>>>>>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>   </clusternodes>
>>>>>>>>>>>   <cman/>
>>>>>>>>>>>   <fencedevices>
>>>>>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
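[A common way to test the multicast suspicion directly, before switching transports, is omping. This is my suggestion rather than something from the thread, and it assumes the omping package can be installed on all three nodes:]

```
# Run this simultaneously on csgha1, csgha2 and csgha3 (node names from
# the thread). omping probes peers over both unicast and multicast, so
# if the unicast lines show replies but the multicast lines show 100%
# loss, something in the path is dropping multicast.
omping csgha1 csgha2 csgha3
```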
>>>>>>>>>>>   </fencedevices>
>>>>>>>>>>>   <rm>
>>>>>>>>>>>     <failoverdomains/>
>>>>>>>>>>>     <resources/>
>>>>>>>>>>>   </rm>
>>>>>>>>>>> </cluster>
>>>>>>>>>>>
>>>>>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system,
>>>>>>>>>>> and "csgha3" on the last system.
>>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>>>>>> know if the next section helps or not.
>>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>> csgha1: eth0 = 172.17.1.21, eth1 = 10.10.1.128
>>>>>>>>>>> csgha2: eth0 = 10.10.1.129, eth1 = 172.17.1.3
>>>>>>>>>>> (Yeah, I know this looks a little weird, but it was the way our
>>>>>>>>>>> automated VM control set up the interfaces.)
>>>>>>>>>>> csgha3: eth0 = 172.17.1.23, eth1 = 10.10.1.130
>>>>>>>>>>> The /etc/hosts file on each system has only the 10.10.1.0/24
>>>>>>>>>>> address for each system in it.
>>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you need more information, and I very much
>>>>>>>>>>> appreciate your assistance.
>>>>>>>>>>> --
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to build my first-ever HA cluster, and I'm using 3 VMs
>>>>>>>>>>>>> running CentOS 6.5.
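[The /etc/hosts setup described above can be sanity-checked with a short script. A self-contained sketch: the addresses are the ones reported in this thread, hard-coded into a variable that stands in for /etc/hosts so the script runs anywhere; on a real node you would read /etc/hosts instead:]

```shell
# Check that each node name maps to an address on the cluster network
# (10.10.1.0/24). The inline data stands in for /etc/hosts.
hosts='10.10.1.128 csgha1
10.10.1.129 csgha2
10.10.1.130 csgha3'

for n in csgha1 csgha2 csgha3; do
    ip=$(printf '%s\n' "$hosts" | awk -v n="$n" '$2 == n { print $1 }')
    case "$ip" in
        10.10.1.*) echo "$n -> $ip ok" ;;
        *)         echo "$n: missing or off the cluster network" ;;
    esac
done
```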
>>>>>>>>>>>>> I followed the instructions to the letter at:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> and everything appears to start normally, but if I run "cman_tool
>>>>>>>>>>>>> nodes -a", I only see:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other systems, the output is the same except for which
>>>>>>>>>>>>> system is shown as joined. Each shows just itself as belonging to
>>>>>>>>>>>>> the cluster. Also, "pcs status" reflects this similarly, with the
>>>>>>>>>>>>> non-self systems showing offline. I've checked "netstat -an" and
>>>>>>>>>>>>> see each machine listening on ports 5404 and 5405. And the logs
>>>>>>>>>>>>> are rather involved, but I'm not seeing errors in them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>>>>>> communicate?
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>>>>>
>>>>>>>>>>>> * uname -n
>>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Digimer
>>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>>>>>> without access to education?
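[Since gethostip turned out not to be installed on these nodes, getent can stand in for the name-to-IP step of Digimer's checklist; it queries the same resolver order (/etc/hosts first, per nsswitch.conf). A sketch: 'localhost' is used here purely as a placeholder name, where a cluster node would use $(uname -n):]

```shell
# Resolve a name to its first address, as a substitute for
# "gethostip -d <name>" when the syslinux package is absent.
# 'localhost' is a placeholder for $(uname -n).
getent hosts localhost | awk '{ print $1; exit }'
```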
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems