Re: [Linux-HA] New user can't get cman to recognize other systems

Andrew Beekhof Tue, 21 Oct 2014 14:51:59 -0700

> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
> 
> Yep, my network engineer and I found that the multicast packets were being 
> blocked by the underlying hypervisor for the VM systems.


Yeah, that'll happen :-(
I believe it's fixed in newer kernels, but for a while there multicast would 
appear to work and then stop for no good reason.
Putting the device into promiscuous mode seemed to help IIRC.

This is the bug I knew it as: 
https://bugzilla.redhat.com/show_bug.cgi?id=1090670



> At first we thought it was just iptables on the servers, but I was certain I 
> had actually turned that off. The issue has been bumped up to the operations 
> team for fixing, but since I've gotten it to work with unicast, 
> there's no pressure.
> 
>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>> 
>> Glad you sorted it out!
>> 
>> So then, it was almost certainly a multicast issue. I would still strongly 
>> recommend trying to source and fix the problem, and reverting to mcast if 
>> you can. More efficient. :)
>> 
>> digimer
>> 
>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>> Ok, got it working after a little more effort, and the cluster is now
>>> properly reporting.
>>> 
>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> 
>>>> wrote:
>>>> 
>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>> like this:
>>>> 
>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>> 
>>>>  <fence_daemon/>
>>>>  <clusternodes>
>>>>    <clusternode name="csgha1" nodeid="1">
>>>>      <fence>
>>>>        <method name="pcmk-redirect">
>>>>          <device name="pcmk" port="csgha1"/>
>>>>        </method>
>>>>      </fence>
>>>>    </clusternode>
>>>>    <clusternode name="csgha2" nodeid="2">
>>>>      <fence>
>>>>        <method name="pcmk-redirect">
>>>>          <device name="pcmk" port="csgha2"/>
>>>>        </method>
>>>>      </fence>
>>>>    </clusternode>
>>>>    <clusternode name="csgha3" nodeid="3">
>>>>      <fence>
>>>>        <method name="pcmk-redirect">
>>>>          <device name="pcmk" port="csgha3"/>
>>>>        </method>
>>>>      </fence>
>>>>    </clusternode>
>>>>  </clusternodes>
>>>>  <cman/>
>>>>  <fencedevices>
>>>>    <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>  </fencedevices>
>>>>  <rm>
>>>>    <failoverdomains/>
>>>>    <resources/>
>>>>  </rm>
>>>> </cluster>
>>>> 
>>>> But, after restarting the cluster I don't see any difference. Did I do
>>>> something wrong?
>>>> --
>>>> Jay
>>>> 
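One thing worth checking in the file quoted above: the transport attribute sits on the <cluster> element, whereas the suggestion elsewhere in this thread was to set it on <cman ... />. The thread never says what the final fix was, so this is only one plausible explanation. Below is a minimal Python sketch that reports which element carries the attribute. The helper names find_transport and transport_on_cman are made up for illustration, and the XML is a trimmed copy of the config above with the fence sections left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf quoted above (fence sections omitted).
CLUSTER_CONF = """\
<cluster config_version="11" name="pgdb_cluster" transport="udpu">
  <clusternodes>
    <clusternode name="csgha1" nodeid="1"/>
    <clusternode name="csgha2" nodeid="2"/>
    <clusternode name="csgha3" nodeid="3"/>
  </clusternodes>
  <cman/>
</cluster>
"""


def find_transport(xml_text):
    """Return (tag, value) pairs for every element that has a 'transport' attribute."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("transport"))
            for el in root.iter()  # iter() includes the root element itself
            if "transport" in el.attrib]


def transport_on_cman(xml_text):
    """True only if a <cman> element itself carries transport="udpu"."""
    return ("cman", "udpu") in find_transport(xml_text)


if __name__ == "__main__":
    print(find_transport(CLUSTER_CONF))     # prints [('cluster', 'udpu')]
    print(transport_on_cman(CLUSTER_CONF))  # prints False
```

If the check reports the attribute on <cluster>, the change to try would be moving it to <cman transport="udpu"/> and, as is usual for cluster.conf edits, incrementing config_version before restarting.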
>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>> 
>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>>>>> you set multicast and don't want to use the auto-selected mcast IP, then
>>>>> you can specify the mcast IP group to use via <multicast... />.
>>>>> 
>>>>> digimer
>>>>> 
>>>>> 
>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>> 
>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>>>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>>>>> can't just put a unicast address for the multicast parameter, and the
>>>>>> man page for cluster.conf wasn't much help either.
>>>>>> 
>>>>>> We're still working on having the security team permit these 3 systems
>>>>>> to use multicast.
>>>>>> 
>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>> 
>>>>>>> Keep us posted. :)
>>>>>>> 
>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>> 
>>>>>>>> I've been checking hostname resolution this morning, and all the systems
>>>>>>>> are listed in each /etc/hosts file (No DNS in this environment.) and
>>>>>>>> ping works on every system both to itself and all the other systems. At
>>>>>>>> least it's working on the 10.10.1.0/24 network.
>>>>>>>> 
>>>>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each system,
>>>>>>>> and I'm only seeing outbound on each, even though netstat shows each is
>>>>>>>> listening on the multicast address. My suspicion is that the router is
>>>>>>>> eating the multicast broadcasts, so I may try the unicast address
>>>>>>>> instead, but I'm waiting on one of our network engineers to see if my
>>>>>>>> suspicion is correct about the router. He volunteered to help late
>>>>>>>> yesterday.
>>>>>>>> 
>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>> 
>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give the
>>>>>>>>> IP in dotted-decimal notation only.
>>>>>>>>> 
>>>>>>>>> What I was trying to see was whether the 'uname -n' resolved to the IP
>>>>>>>>> on the same network card as the other nodes. This is how corosync
>>>>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>>>>> might have a general network issue, possibly related to multicast.
>>>>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>>>>> with corosync).
>>>>>>>>> 
>>>>>>>>> Have you tried unicast? If not, try adding the transport="udpu"
>>>>>>>>> attribute to the <cman ... /> element. Do note that unicast isn't as
>>>>>>>>> efficient as multicast, so though it might work, I'd personally treat
>>>>>>>>> it as a debug tool to isolate the source of the problem.
>>>>>>>>> 
>>>>>>>>> cheers
>>>>>>>>> 
>>>>>>>>> digimer
>>>>>>>>> 
>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>> 
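The check described above (does each node's name resolve to an address on the same network as the other nodes?) can be sketched in a few lines of Python. This is only an illustration: the function name nodes_outside_network is made up, and the addresses are the ones from the ifconfig output quoted further down in this thread.

```python
import ipaddress


def nodes_outside_network(node_ips, network):
    """Return the names of nodes whose address is not inside `network`."""
    net = ipaddress.ip_network(network)
    return sorted(name for name, ip in node_ips.items()
                  if ipaddress.ip_address(ip) not in net)


if __name__ == "__main__":
    # The 10.10.1.x addresses reported for csgha1..csgha3 in this thread.
    node_ips = {
        "csgha1": "10.10.1.128",
        "csgha2": "10.10.1.129",
        "csgha3": "10.10.1.130",
    }
    print(nodes_outside_network(node_ips, "10.10.1.0/24"))  # prints []
```

An empty list means every node name maps to the shared 10.10.1.0/24 network, which points away from name resolution and toward the transport (multicast) as the problem.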
>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>> 
>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>> 
>>>>>>>>>> Here's the /etc/cluster/cluster.conf file and it is identical on all three
>>>>>>>>>> systems:
>>>>>>>>>> 
>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>   <fence_daemon/>
>>>>>>>>>>   <clusternodes>
>>>>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>       <fence>
>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>>>>         </method>
>>>>>>>>>>       </fence>
>>>>>>>>>>     </clusternode>
>>>>>>>>>>   </clusternodes>
>>>>>>>>>>   <cman/>
>>>>>>>>>>   <fencedevices>
>>>>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>>>>   </fencedevices>
>>>>>>>>>>   <rm>
>>>>>>>>>>     <failoverdomains/>
>>>>>>>>>>     <resources/>
>>>>>>>>>>   </rm>
>>>>>>>>>> </cluster>
>>>>>>>>>> 
>>>>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system, and
>>>>>>>>>> "csgha3" on the last system.
>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't know if
>>>>>>>>>> the next section helps or not.
>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>   csgha1: eth0 = 172.17.1.21
>>>>>>>>>>           eth1 = 10.10.1.128
>>>>>>>>>>   csgha2: eth0 = 10.10.1.129
>>>>>>>>>>           eth1 = 172.17.1.3
>>>>>>>>>>   csgha3: eth0 = 172.17.1.23
>>>>>>>>>>           eth1 = 10.10.1.130
>>>>>>>>>> Yeah, I know csgha2 looks a little weird, but it was the way our automated VM
>>>>>>>>>> control did the interfaces.
>>>>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24 address for
>>>>>>>>>> each system in it.
>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>> 
>>>>>>>>>> Let me know if you need more information, and I very much appreciate your
>>>>>>>>>> assistance.
>>>>>>>>>> --
>>>>>>>>>> Jay
>>>>>>>>>> 
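Where gethostip is not installed, the same lookup can be approximated with Python's standard library. This is a rough stand-in, not a drop-in replacement: the function name host_ip is made up for illustration, and socket.gethostname() is assumed to return the same name as `uname -n`, which is normally the case on Linux.

```python
import socket


def host_ip(hostname=None, resolver=socket.gethostbyname):
    """Resolve a hostname to a dotted-decimal IPv4 string, roughly like `gethostip -d`.

    With no hostname given, this machine's own node name is used.
    The resolver is injectable so the function can be tried without a network.
    """
    name = hostname or socket.gethostname()
    return resolver(name)


if __name__ == "__main__":
    # Run on each node; the result depends on that node's /etc/hosts or DNS.
    print(host_ip())
```

Running it on each node and comparing the result against the interface list shows which NIC corosync would pick for cluster traffic.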
>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>>>>>>>>>>>> CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>>>> 
>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>> 
>>>>>>>>>>>> and everything appears to start normally, but if I run
>>>>>>>>>>>> "cman_tool nodes -a", I only see:
>>>>>>>>>>>> 
>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>> 
>>>>>>>>>>>> On the other systems, the output is the same except for which system is
>>>>>>>>>>>> shown as joined. Each shows just itself as belonging to the cluster.
>>>>>>>>>>>> Also, "pcs status" reflects similarly with non-self systems showing
>>>>>>>>>>>> offline. I've checked "netstat -an" and see each machine listening on
>>>>>>>>>>>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>>>>>>>>>>>> seeing errors in them.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>>>>> communicate?
>>>>>>>>>>>> --
>>>>>>>>>>>> Jay
>>>>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>>>> 
>>>>>>>>>>> * uname -n
>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Digimer
>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person without
>>>>>>>>>>> access to education?
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems
>>>>>>>>>>> 
