> On 22 Oct 2014, at 9:16 am, Digimer <li...@alteeve.ca> wrote:
>
> Blocked for me, too. Possible to clone - client data?
Needless paranoia more likely.
This is the original Fedora bug (nothing marked private):

https://bugzilla.redhat.com/show_bug.cgi?id=880035

and the kbase:

https://access.redhat.com/solutions/784373

>
> On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:
>> Sure! But I can't seem to get Red Hat to let me see the bug, even though I
>> have an account.
>>
>> Sent from my iPad
>>
>>> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>>
>>>
>>>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>>>>
>>>> Yep, my network engineer and I found that the multicast packets were being
>>>> blocked by the underlying hypervisor for the VM systems.
>>>
>>> Yeah, that'll happen :-(
>>> I believe it's fixed in newer kernels, but for a while there multicast would
>>> appear to work and then stop for no good reason.
>>> Putting the device into promiscuous mode seemed to help IIRC.
>>>
>>> This is the bug I knew it as:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
>>>
>>>
>>>
>>>> At first we thought it was just iptables on the servers, but I was certain
>>>> I had actually turned that off. The issue has been bumped up to the
>>>> operations team for fixing, but since I've gotten it to work with
>>>> unicast, there's no pressure.
>>>>
>>>> Sent from my iPad
>>>>
>>>>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>
>>>>> Glad you sorted it out!
>>>>>
>>>>> So then, it was almost certainly a multicast issue. I would still
>>>>> strongly recommend trying to source and fix the problem, and reverting to
>>>>> mcast if you can. More efficient. :)
>>>>>
>>>>> digimer
>>>>>
>>>>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>>>>> Ok, got it working after a little more effort, and the cluster is now
>>>>>> properly reporting.
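
A note for anyone reading this from the archive: the thread never records what the "little more effort" was. One plausible explanation, not confirmed by John, is that the transport attribute in the cluster.conf quoted further down sits on the <cluster> element, while Digimer's earlier advice was to put it on <cman>. The sketch below is one way to see which element carries the attribute. The helper name show_transport is made up for this note, and it assumes each tag sits on a single line, as it does in the files quoted in this thread.

```shell
#!/bin/sh
# show_transport is a hypothetical helper name used only for this sketch.
# Print every <cluster ...> and <cman ...> opening tag found in a
# cluster.conf-style file, so it is easy to see which element (if any)
# carries transport="udpu". Tags such as <clusternodes> are not matched.
show_transport() {
    grep -oE '<(cluster|cman)( [^>]*)?/?>' "$1"
}

# Usage on a cman node (path assumed to be the standard location):
#   show_transport /etc/cluster/cluster.conf
```

If the output shows the attribute on the <cluster> line and a bare <cman/>, that would match the "no difference after restart" symptom described below.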
>>>>>>
>>>>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>>>>> like this:
>>>>>>>
>>>>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>>>>>   <fence_daemon/>
>>>>>>>   <clusternodes>
>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>       <fence>
>>>>>>>         <method name="pcmk-redirect">
>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>         </method>
>>>>>>>       </fence>
>>>>>>>     </clusternode>
>>>>>>>   </clusternodes>
>>>>>>>   <cman/>
>>>>>>>   <fencedevices>
>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>   </fencedevices>
>>>>>>>   <rm>
>>>>>>>     <failoverdomains/>
>>>>>>>     <resources/>
>>>>>>>   </rm>
>>>>>>> </cluster>
>>>>>>>
>>>>>>> But, after restarting the cluster I don't see any difference. Did I do
>>>>>>> something wrong?
>>>>>>> --
>>>>>>> Jay
>>>>>>>
>>>>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>
>>>>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>>>>> work. Corosync will divine the IPs by resolving the node names to IPs.
>>>>>>>> If you set multicast and don't want to use the auto-selected mcast IP,
>>>>>>>> then you can specify the mcast IP group to use via <multicast... />.
>>>>>>>>
>>>>>>>> digimer
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>>>>>
>>>>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>>>>> "the corosync.conf file is not used."
>>>>>>>>> So, I'm guessing I need to set a
>>>>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>>>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>>>>>>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>>>>>>>> can't just put a unicast address for the multicast parameter, and the
>>>>>>>>> man page for cluster.conf wasn't much help either.
>>>>>>>>>
>>>>>>>>> We're still working on having the security team permit these 3 systems
>>>>>>>>> to use multicast.
>>>>>>>>>
>>>>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>>>>>
>>>>>>>>>> Keep us posted. :)
>>>>>>>>>>
>>>>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>>>>>
>>>>>>>>>>> I've been checking hostname resolution this morning, and all the
>>>>>>>>>>> systems are listed in each /etc/hosts file (no DNS in this
>>>>>>>>>>> environment), and ping works on every system both to itself and to
>>>>>>>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>>>>>>>> network.
>>>>>>>>>>>
>>>>>>>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>>>>>>>>>> system, and I'm only seeing outbound on each, even though netstat
>>>>>>>>>>> shows each is listening on the multicast address. My suspicion is
>>>>>>>>>>> that the router is eating the multicast broadcasts, so I may try the
>>>>>>>>>>> unicast address instead, but I'm waiting on one of our network
>>>>>>>>>>> engineers to see if my suspicion is correct about the router. He
>>>>>>>>>>> volunteered to help late yesterday.
>>>>>>>>>>>
>>>>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give
>>>>>>>>>>>> the IP in dotted-decimal notation only.
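
A note for anyone without syslinux installed (John reports further down that gethostip was missing on his nodes): the same question, which address the node's own name resolves to, can be answered with getent from glibc. This is only a sketch, and resolve_ip is a made-up helper name. It prints whatever the system resolver returns first, which may be an IPv6 address if the name has one; in this thread /etc/hosts holds only the 10.10.1.0/24 addresses.

```shell
#!/bin/sh
# resolve_ip is a hypothetical helper name, not part of any package.
# Print the first address the system resolver returns for a host name.
# The lookup follows nsswitch.conf, so /etc/hosts entries are honoured.
resolve_ip() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Address this node's own name resolves to (prints nothing if it does
# not resolve at all).
resolve_ip "$(uname -n)"
```

The result can then be compared by eye with the addresses in the "ifconfig -a" output to see which interface corosync would pick.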
>>>>>>>>>>>>
>>>>>>>>>>>> What I was trying to see was whether the 'uname -n' resolved to
>>>>>>>>>>>> the IP on the same network card as the other nodes. This is how
>>>>>>>>>>>> corosync decides which interface to send cluster traffic onto. I
>>>>>>>>>>>> suspect you might have a general network issue, possibly related
>>>>>>>>>>>> to multicast. (Some switches and some hypervisor virtual networks
>>>>>>>>>>>> don't play nice with corosync.)
>>>>>>>>>>>>
>>>>>>>>>>>> Have you tried unicast? If not, try giving the <cman ../> element
>>>>>>>>>>>> the transport attribute: <cman transport="udpu" ... />. Do note
>>>>>>>>>>>> that unicast isn't as efficient as multicast, so though it might
>>>>>>>>>>>> work, I'd personally treat it as a debug tool to isolate the
>>>>>>>>>>>> source of the problem.
>>>>>>>>>>>>
>>>>>>>>>>>> cheers
>>>>>>>>>>>>
>>>>>>>>>>>> digimer
>>>>>>>>>>>>
>>>>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>>>>>
>>>>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's the /etc/cluster/cluster.conf file and it is identical on
>>>>>>>>>>>>> all three systems:
>>>>>>>>>>>>>
>>>>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>>>>   <fence_daemon/>
>>>>>>>>>>>>>   <clusternodes>
>>>>>>>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>>>>       <fence>
>>>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>>>>>>>         </method>
>>>>>>>>>>>>>       </fence>
>>>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>>>>       <fence>
>>>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>>>>>>>         </method>
>>>>>>>>>>>>>       </fence>
>>>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>>>>       <fence>
>>>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>>>>>>>         </method>
>>>>>>>>>>>>>       </fence>
>>>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>>>   </clusternodes>
>>>>>>>>>>>>>   <cman/>
>>>>>>>>>>>>>   <fencedevices>
>>>>>>>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>>>>>>>   </fencedevices>
>>>>>>>>>>>>>   <rm>
>>>>>>>>>>>>>     <failoverdomains/>
>>>>>>>>>>>>>     <resources/>
>>>>>>>>>>>>>   </rm>
>>>>>>>>>>>>> </cluster>
>>>>>>>>>>>>>
>>>>>>>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system,
>>>>>>>>>>>>> and "csgha3" on the last system.
>>>>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>>>>>>>> know if the next section helps or not.
>>>>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   csgha1: eth0 = 172.17.1.21
>>>>>>>>>>>>>           eth1 = 10.10.1.128
>>>>>>>>>>>>>   csgha2: eth0 = 10.10.1.129
>>>>>>>>>>>>>           eth1 = 172.17.1.3
>>>>>>>>>>>>>   csgha3: eth0 = 172.17.1.23
>>>>>>>>>>>>>           eth1 = 10.10.1.130
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, I know csgha2 looks a little weird, but it was the way our
>>>>>>>>>>>>> automated VM control did the interfaces.
>>>>>>>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24
>>>>>>>>>>>>> address for each system in it.
>>>>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you need more information, and I very much
>>>>>>>>>>>>> appreciate your assistance.
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs
>>>>>>>>>>>>>>> running CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and everything appears to start normally, but if I run
>>>>>>>>>>>>>>> "cman_tool nodes -a", I only see:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the other systems, the output is the same except for which
>>>>>>>>>>>>>>> system is shown as joined. Each shows just itself as belonging
>>>>>>>>>>>>>>> to the cluster. Also, "pcs status" reflects similarly with
>>>>>>>>>>>>>>> non-self systems showing offline.
>>>>>>>>>>>>>>> I've checked "netstat -an" and see each machine listening on
>>>>>>>>>>>>>>> ports 5404 and 5405. And the logs are rather involved, but I'm
>>>>>>>>>>>>>>> not seeing errors in them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>>>>>>>> communicate?
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> * uname -n
>>>>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Digimer
>>>>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>>>>>>>> without access to education?
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems