Sure! But I can't seem to get Red Hat to let me see the bug, even though I have an account.
Sent from my iPad

> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>
>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>>
>> Yep, my network engineer and I found that the multicast packets were being
>> blocked by the underlying hypervisor for the VM systems.
>
> Yeah, that'll happen :-(
> I believe it's fixed in newer kernels, but for a while there multicast would
> appear to work and then stop for no good reason.
> Putting the device into promiscuous mode seemed to help, IIRC.
>
> This is the bug I knew it as:
> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
>
>> At first we thought it was just iptables on the servers, but I was certain I
>> had actually turned that off. The issue has been bumped up to the operations
>> team to fix, but since I've gotten it to work with unicast, there's no
>> pressure.
>>
>> Sent from my iPad
>>
>>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>>>
>>> Glad you sorted it out!
>>>
>>> So then, it was almost certainly a multicast issue. I would still strongly
>>> recommend trying to source and fix the problem, and reverting to mcast if
>>> you can. It's more efficient. :)
>>>
>>> digimer
>>>
>>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>>> Ok, got it working after a little more effort, and the cluster is now
>>>> properly reporting.
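[A sketch of the promiscuous-mode workaround Andrew mentions. The interface name eth0 is an assumption, the commands need root, and this is a diagnostic workaround rather than a recommended permanent setting:]

```
# Workaround sketch for the hypervisor multicast bug referenced above.
# Assumption: cluster traffic runs over eth0; run as root.
ip link set dev eth0 promisc on
# Verify: the flags line in the output should now include PROMISC.
ip link show dev eth0
```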
>>>>
>>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> wrote:
>>>>>
>>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>>> like this:
>>>>>
>>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>>>   <fence_daemon/>
>>>>>   <clusternodes>
>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>       <fence>
>>>>>         <method name="pcmk-redirect">
>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>         </method>
>>>>>       </fence>
>>>>>     </clusternode>
>>>>>   </clusternodes>
>>>>>   <cman/>
>>>>>   <fencedevices>
>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>   </fencedevices>
>>>>>   <rm>
>>>>>     <failoverdomains/>
>>>>>     <resources/>
>>>>>   </rm>
>>>>> </cluster>
>>>>>
>>>>> But, after restarting the cluster I don't see any difference. Did I do
>>>>> something wrong?
>>>>> --
>>>>> Jay
>>>>>
>>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>
>>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>>> work. Corosync will divine the IPs by resolving the node names to IPs. If
>>>>>> you set multicast and don't want to use the auto-selected mcast IP, then
>>>>>> you can specify the mcast IP group to use via <multicast ... />.
>>>>>>
>>>>>> digimer
>>>>>>
>>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>>>
>>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>>> "the corosync.conf file is not used."
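[For reference, the <multicast .../> override Digimer describes sits inside the <cman> element of cluster.conf. A sketch only; the 239.192.100.1 group address is an example I'm supplying, not something from this thread:]

```xml
<cluster config_version="12" name="pgdb_cluster">
  <cman>
    <multicast addr="239.192.100.1"/>
  </cman>
  <!-- clusternodes, fencedevices, rm, etc. unchanged from the config above -->
</cluster>
```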
>>>>>>> So, I'm guessing I need to set a
>>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>>>>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>>>>>> can't just put a unicast address in the multicast parameter, and the
>>>>>>> man page for cluster.conf wasn't much help either.
>>>>>>>
>>>>>>> We're still working on having the security team permit these 3 systems
>>>>>>> to use multicast.
>>>>>>>
>>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>>>
>>>>>>>> Keep us posted. :)
>>>>>>>>
>>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>>>
>>>>>>>>> I've been checking hostname resolution this morning, and all the
>>>>>>>>> systems are listed in each /etc/hosts file (no DNS in this
>>>>>>>>> environment), and ping works on every system, both to itself and to
>>>>>>>>> all the other systems. At least it's working on the 10.10.1.0/24
>>>>>>>>> network.
>>>>>>>>>
>>>>>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each
>>>>>>>>> system, and I'm only seeing outbound traffic on each, even though
>>>>>>>>> netstat shows each is listening on the multicast address. My
>>>>>>>>> suspicion is that the router is eating the multicast packets, so I
>>>>>>>>> may try the unicast transport instead, but I'm waiting on one of our
>>>>>>>>> network engineers to see if my suspicion about the router is
>>>>>>>>> correct. He volunteered to help late yesterday.
>>>>>>>>>
>>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>>>
>>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give
>>>>>>>>>> the IP in dotted-decimal notation only.
>>>>>>>>>>
>>>>>>>>>> What I was trying to see was whether the 'uname -n' resolved to the
>>>>>>>>>> IP on the same network card as the other nodes.
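[The capture John describes can be reproduced roughly like this. A sketch: the interface name is an assumption (it varies per node in this thread), and tcpdump needs root:]

```
# Watch corosync totem traffic on the cluster interface (interface name
# is an assumption; needs root). With multicast working, you should see
# inbound packets from the other nodes, not just your own outbound.
tcpdump -n -i eth1 udp port 5405

# Narrow the capture to multicast frames only:
tcpdump -n -i eth1 'udp port 5405 and multicast'
```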
>>>>>>>>>> This is how corosync
>>>>>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>>>>>> might have a general network issue, possibly related to multicast.
>>>>>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>>>>>> with corosync.)
>>>>>>>>>>
>>>>>>>>>> Have you tried unicast? If not, try setting the <cman .../> element
>>>>>>>>>> to have the <cman transport="udpu" .../> attribute. Do note that
>>>>>>>>>> unicast isn't as efficient as multicast, so though it might work,
>>>>>>>>>> I'd personally treat it as a debug tool to isolate the source of
>>>>>>>>>> the problem.
>>>>>>>>>>
>>>>>>>>>> cheers
>>>>>>>>>>
>>>>>>>>>> digimer
>>>>>>>>>>
>>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>>>
>>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>>>
>>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>>>
>>>>>>>>>>> Here's the /etc/cluster/cluster.conf file, and it is identical on
>>>>>>>>>>> all three systems:
>>>>>>>>>>>
>>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>>   <fence_daemon/>
>>>>>>>>>>>   <clusternodes>
>>>>>>>>>>>     <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha1"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>     <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha2"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>     <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>>       <fence>
>>>>>>>>>>>         <method name="pcmk-redirect">
>>>>>>>>>>>           <device name="pcmk" port="csgha3"/>
>>>>>>>>>>>         </method>
>>>>>>>>>>>       </fence>
>>>>>>>>>>>     </clusternode>
>>>>>>>>>>>   </clusternodes>
>>>>>>>>>>>   <cman/>
>>>>>>>>>>>   <fencedevices>
>>>>>>>>>>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
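[A common way to test the multicast suspicion directly, before switching transports, is omping. This is my suggestion rather than something from the thread, and it assumes the omping package can be installed on all three nodes:]

```
# Run this simultaneously on csgha1, csgha2 and csgha3 (node names from
# the thread). omping probes peers over both unicast and multicast, so
# if the unicast lines show replies but the multicast lines show 100%
# loss, something in the path is dropping multicast.
omping csgha1 csgha2 csgha3
```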
>>>>>>>>>>>   </fencedevices>
>>>>>>>>>>>   <rm>
>>>>>>>>>>>     <failoverdomains/>
>>>>>>>>>>>     <resources/>
>>>>>>>>>>>   </rm>
>>>>>>>>>>> </cluster>
>>>>>>>>>>>
>>>>>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system,
>>>>>>>>>>> and "csgha3" on the last system.
>>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't
>>>>>>>>>>> know if the next section helps or not.
>>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>> csgha1: eth0 = 172.17.1.21, eth1 = 10.10.1.128
>>>>>>>>>>> csgha2: eth0 = 10.10.1.129, eth1 = 172.17.1.3
>>>>>>>>>>> (Yeah, I know this looks a little weird, but it was the way our
>>>>>>>>>>> automated VM control set up the interfaces.)
>>>>>>>>>>> csgha3: eth0 = 172.17.1.23, eth1 = 10.10.1.130
>>>>>>>>>>> The /etc/hosts file on each system has only the 10.10.1.0/24
>>>>>>>>>>> address for each system in it.
>>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you need more information, and I very much
>>>>>>>>>>> appreciate your assistance.
>>>>>>>>>>> --
>>>>>>>>>>> Jay
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to build my first-ever HA cluster, and I'm using 3 VMs
>>>>>>>>>>>>> running CentOS 6.5.
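[The /etc/hosts setup described above can be sanity-checked with a short script. A self-contained sketch: the addresses are the ones reported in this thread, hard-coded into a variable that stands in for /etc/hosts so the script runs anywhere; on a real node you would read /etc/hosts instead:]

```shell
# Check that each node name maps to an address on the cluster network
# (10.10.1.0/24). The inline data stands in for /etc/hosts.
hosts='10.10.1.128 csgha1
10.10.1.129 csgha2
10.10.1.130 csgha3'

for n in csgha1 csgha2 csgha3; do
    ip=$(printf '%s\n' "$hosts" | awk -v n="$n" '$2 == n { print $1 }')
    case "$ip" in
        10.10.1.*) echo "$n -> $ip ok" ;;
        *)         echo "$n: missing or off the cluster network" ;;
    esac
done
```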
>>>>>>>>>>>>> I followed the instructions to the letter at:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> and everything appears to start normally, but if I run "cman_tool
>>>>>>>>>>>>> nodes -a", I only see:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the other systems, the output is the same except for which
>>>>>>>>>>>>> system is shown as joined. Each shows just itself as belonging to
>>>>>>>>>>>>> the cluster. Also, "pcs status" reflects this similarly, with the
>>>>>>>>>>>>> non-self systems showing offline. I've checked "netstat -an" and
>>>>>>>>>>>>> see each machine listening on ports 5404 and 5405. And the logs
>>>>>>>>>>>>> are rather involved, but I'm not seeing errors in them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>>>>>> communicate?
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>
>>>>>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>>>>>
>>>>>>>>>>>> * uname -n
>>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Digimer
>>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>>>>>> without access to education?
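[Since gethostip turned out not to be installed on these nodes, getent can stand in for the name-to-IP step of Digimer's checklist; it queries the same resolver order (/etc/hosts first, per nsswitch.conf). A sketch: 'localhost' is used here purely as a placeholder name, where a cluster node would use $(uname -n):]

```shell
# Resolve a name to its first address, as a substitute for
# "gethostip -d <name>" when the syslinux package is absent.
# 'localhost' is a placeholder for $(uname -n).
getent hosts localhost | awk '{ print $1; exit }'
```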
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems