Re: [Linux-HA] New user can't get cman to recognize other systems

Andrew Beekhof Tue, 21 Oct 2014 18:43:07 -0700

> On 22 Oct 2014, at 9:16 am, Digimer <li...@alteeve.ca> wrote:
> 
> Blocked for me, too. Possible to clone - client data?


Needless paranoia more likely.

This is the original Fedora bug (nothing marked private):
   https://bugzilla.redhat.com/show_bug.cgi?id=880035

and the kbase:
   https://access.redhat.com/solutions/784373


> 
> On 21/10/14 06:14 PM, jayknowsu...@gmail.com wrote:
>> Sure! But I can't seem to get Red Hat to let me see the bug, even though I 
>> have an account.
>> 
>>> On Oct 21, 2014, at 5:51 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>>> 
>>> 
>>>> On 22 Oct 2014, at 7:36 am, jayknowsu...@gmail.com wrote:
>>>> 
>>>> Yep, my network engineer and I found that the multicast packets were being 
>>>> blocked by the underlying hypervisor for the VM systems.
>>> 
>>> Yeah, that'll happen :-(
>>> I believe it's fixed in newer kernels, but for a while there multicast would 
>>> appear to work and then stop for no good reason.
>>> Putting the device into promiscuous mode seemed to help IIRC.
>>> 
>>> This is the bug I knew it as: 
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1090670
>>> 
>>> 
>>> 
>>>> At first we thought it was just iptables on the servers, but I was certain 
>>>> I had actually turned that off. The issue has been bumped up to the 
>>>> operations team for a fix, but since I've gotten it to work with 
>>>> unicast, there's no pressure.
>>>> 
>>>>> On Oct 21, 2014, at 3:15 PM, Digimer <li...@alteeve.ca> wrote:
>>>>> 
>>>>> Glad you sorted it out!
>>>>> 
>>>>> So then, it was almost certainly a multicast issue. I would still 
>>>>> strongly recommend trying to source and fix the problem, and reverting to 
>>>>> mcast if you can. More efficient. :)
>>>>> 
>>>>> digimer
>>>>> 
>>>>>> On 21/10/14 02:59 PM, John Scalia wrote:
>>>>>> Ok, got it working after a little more effort, and the cluster is now
>>>>>> properly reporting.
>>>>>> 
>>>>>>> On Tue, Oct 21, 2014 at 1:34 PM, John Scalia <jayknowsu...@gmail.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> So, I set transport="udpu" in the cluster.conf file, and it now looks
>>>>>>> like this:
>>>>>>> 
>>>>>>> <cluster config_version="11" name="pgdb_cluster" transport="udpu">
>>>>>>> 
>>>>>>> <fence_daemon/>
>>>>>>> <clusternodes>
>>>>>>>   <clusternode name="csgha1" nodeid="1">
>>>>>>>     <fence>
>>>>>>>       <method name="pcmk-redirect">
>>>>>>>         <device name="pcmk" port="csgha1"/>
>>>>>>>       </method>
>>>>>>>     </fence>
>>>>>>>   </clusternode>
>>>>>>>   <clusternode name="csgha2" nodeid="2">
>>>>>>>     <fence>
>>>>>>>       <method name="pcmk-redirect">
>>>>>>>         <device name="pcmk" port="csgha2"/>
>>>>>>>       </method>
>>>>>>>     </fence>
>>>>>>>   </clusternode>
>>>>>>>   <clusternode name="csgha3" nodeid="3">
>>>>>>>     <fence>
>>>>>>>       <method name="pcmk-redirect">
>>>>>>>         <device name="pcmk" port="csgha3"/>
>>>>>>>       </method>
>>>>>>>     </fence>
>>>>>>>   </clusternode>
>>>>>>> </clusternodes>
>>>>>>> <cman/>
>>>>>>> <fencedevices>
>>>>>>>   <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>> </fencedevices>
>>>>>>> <rm>
>>>>>>>   <failoverdomains/>
>>>>>>>   <resources/>
>>>>>>> </rm>
>>>>>>> </cluster>
>>>>>>> 
>>>>>>> But, after restarting the cluster I don't see any difference. Did I do
>>>>>>> something wrong?
>>>>>>> --
>>>>>>> Jay
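
Digimer's earlier suggestion in this thread was to put transport="udpu" on the <cman> element, whereas the file quoted above carries it on the <cluster> element. The thread does not say what John changed afterwards to get things working, so the following is only a minimal Python sketch for checking where the attribute sits. The two XML strings are abbreviated stand-ins for cluster.conf, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the file as posted: transport sits on <cluster>.
AS_POSTED = """<cluster config_version="11" name="pgdb_cluster" transport="udpu">
  <clusternodes/>
  <cman/>
</cluster>"""

# Abbreviated stand-in for Digimer's suggestion: transport sits on <cman>.
AS_SUGGESTED = """<cluster config_version="11" name="pgdb_cluster">
  <clusternodes/>
  <cman transport="udpu"/>
</cluster>"""


def find_transport(conf_xml):
    """List (element tag, value) for every element carrying a transport attribute."""
    root = ET.fromstring(conf_xml)
    return [(el.tag, el.get("transport"))
            for el in root.iter()
            if el.get("transport") is not None]


def cman_transport(conf_xml):
    """Return the transport attribute of the top-level <cman> element, or None."""
    cman = ET.fromstring(conf_xml).find("cman")
    return None if cman is None else cman.get("transport")


if __name__ == "__main__":
    for label, text in (("as posted", AS_POSTED), ("as suggested", AS_SUGGESTED)):
        print(label, find_transport(text), cman_transport(text))
```

Running the same check against the real /etc/cluster/cluster.conf on each node would show at a glance whether the attribute ended up where Digimer described.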
>>>>>>> 
>>>>>>>> On Tue, Oct 21, 2014 at 12:25 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>> 
>>>>>>>> No, you don't need to specify anything in cluster.conf for unicast to
>>>>>>>> work. Corosync will divine the IPs by resolving the node names to IPs.
>>>>>>>> If you set multicast and don't want to use the auto-selected mcast IP,
>>>>>>>> then you can specify the mcast IP group to use via <multicast... />.
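
Since the node addresses come from resolving the node names, it can help to confirm what each name resolves to on every node. This is a rough Python stand-in for running `gethostip -d` per name, not a description of how corosync itself performs the lookup. The function name and the injectable `resolver` parameter are hypothetical and exist only so the sketch can be exercised without real name resolution.

```python
import socket


def resolve_nodes(names, resolver=socket.gethostbyname):
    """Map each node name to the IPv4 address it resolves to, or None on failure."""
    resolved = {}
    for name in names:
        try:
            resolved[name] = resolver(name)
        except OSError:
            # socket.gaierror is a subclass of OSError; treat it as "unresolved".
            resolved[name] = None
    return resolved


if __name__ == "__main__":
    # Node names taken from the cluster.conf quoted in this thread.
    for node, addr in resolve_nodes(["csgha1", "csgha2", "csgha3"]).items():
        print(node, "->", addr if addr is not None else "UNRESOLVED")
```

Any name that shows up as unresolved, or that resolves to an address on the wrong network, is worth fixing in /etc/hosts before looking further.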
>>>>>>>> 
>>>>>>>> digimer
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 21/10/14 12:22 PM, John Scalia wrote:
>>>>>>>>> 
>>>>>>>>> OK, looking at the cman man page on this system, I see the line saying
>>>>>>>>> "the corosync.conf file is not used." So, I'm guessing I need to set a
>>>>>>>>> unicast address somewhere in the cluster.conf file, but the man page
>>>>>>>>> only mentions the <multicast addr="..."/> parameter. What can I use to
>>>>>>>>> set this to a unicast address for ports 5404 and 5405? I'm assuming I
>>>>>>>>> can't just put a unicast address for the multicast parameter, and the
>>>>>>>>> man page for cluster.conf wasn't much help either.
>>>>>>>>> 
>>>>>>>>> We're still working on having the security team permit these 3 systems
>>>>>>>>> to use multicast.
>>>>>>>>> 
>>>>>>>>>> On 10/21/2014 11:51 AM, Digimer wrote:
>>>>>>>>>> 
>>>>>>>>>> Keep us posted. :)
>>>>>>>>>> 
>>>>>>>>>>> On 21/10/14 08:40 AM, John Scalia wrote:
>>>>>>>>>>> 
>>>>>>>>>>> I've been checking hostname resolution this morning, and all the systems
>>>>>>>>>>> are listed in each /etc/hosts file (no DNS in this environment), and
>>>>>>>>>>> ping works on every system both to itself and all the other systems. At
>>>>>>>>>>> least it's working on the 10.10.1.0/24 network.
>>>>>>>>>>> 
>>>>>>>>>>> I ran tcpdump trying to see what traffic is on port 5405 on each system,
>>>>>>>>>>> and I'm only seeing outbound on each, even though netstat shows each is
>>>>>>>>>>> listening on the multicast address. My suspicion is that the router is
>>>>>>>>>>> eating the multicast broadcasts, so I may try the unicast address
>>>>>>>>>>> instead, but I'm waiting on one of our network engineers to see if my
>>>>>>>>>>> suspicion is correct about the router. He volunteered to help late
>>>>>>>>>>> yesterday.
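
One way to test this suspicion independently of corosync is to send a UDP datagram to a multicast group from one node and listen for it on another. The following is a minimal Python sketch under stated assumptions: the group 239.192.0.1 and port 15405 are hypothetical test values that do not come from the thread, and the port is deliberately not 5405 so the probe does not collide with the running cluster stack.

```python
import socket
import struct
import sys

GROUP = "239.192.0.1"  # hypothetical test group, not the cluster's real one
PORT = 15405           # hypothetical test port, chosen to avoid 5404/5405


def membership_request(group, iface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value: group address followed by local interface."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def open_receiver(group, port, iface_ip="0.0.0.0"):
    """Open a UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    return sock


def send_probe(group, port, payload=b"mcast-probe", ttl=1):
    """Send a single datagram to the multicast group and close the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    try:
        sock.sendto(payload, (group, port))
    finally:
        sock.close()


if __name__ == "__main__":
    if sys.argv[1:] == ["listen"]:
        receiver = open_receiver(GROUP, PORT)
        data, sender = receiver.recvfrom(1024)
        print("received", data, "from", sender[0])
    elif sys.argv[1:] == ["send"]:
        send_probe(GROUP, PORT)
```

Run it with `listen` on one node and `send` on another. On hosts with two interfaces, such as these VMs, passing the cluster-network address as `iface_ip` (rather than the default, which leaves the choice to the kernel) makes the join happen on the intended interface. If the listener never prints anything while unicast traffic between the same hosts works, that points at something in between dropping multicast.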
>>>>>>>>>>> 
>>>>>>>>>>>> On 10/20/2014 4:34 PM, Digimer wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> It looks sane on the surface. The 'gethostip' tool comes from the
>>>>>>>>>>>> 'syslinux' package, and it's really handy! The '-d' says to give the
>>>>>>>>>>>> IP in dotted-decimal notation only.
>>>>>>>>>>>> 
>>>>>>>>>>>> What I was trying to see was whether the 'uname -n' resolved to the IP
>>>>>>>>>>>> on the same network card as the other nodes. This is how corosync
>>>>>>>>>>>> decides which interface to send cluster traffic onto. I suspect you
>>>>>>>>>>>> might have a general network issue, possibly related to multicast.
>>>>>>>>>>>> (Some switches and some hypervisor virtual networks don't play nice
>>>>>>>>>>>> with corosync).
>>>>>>>>>>>> 
>>>>>>>>>>>> Have you tried unicast? If not, try setting the <cman ../> element to
>>>>>>>>>>>> have the <cman transport="udpu" ... /> attribute. Do note that unicast
>>>>>>>>>>>> isn't as efficient as multicast, so though it might work, I'd
>>>>>>>>>>>> personally treat it as a debug tool to isolate the source of the
>>>>>>>>>>>> problem.
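
The check described above (does each node's name resolve to an address on the same network as its peers, and which interface holds it) can be sketched in Python using the addresses John listed from "ifconfig -a". The name-to-address mapping is an assumption based on his statement that /etc/hosts holds only the 10.10.1.0/24 addresses. The second interface of csgha2 is left out because its address is garbled in his message. Function and variable names are hypothetical.

```python
import ipaddress

# Interface addresses as reported in the thread (csgha2's eth1 omitted: garbled there).
INTERFACES = {
    "csgha1": {"eth0": "172.17.1.21", "eth1": "10.10.1.128"},
    "csgha2": {"eth0": "10.10.1.129"},
    "csgha3": {"eth0": "172.17.1.23", "eth1": "10.10.1.130"},
}

# Assumed /etc/hosts content: each node name maps to its 10.10.1.0/24 address.
HOSTS = {
    "csgha1": "10.10.1.128",
    "csgha2": "10.10.1.129",
    "csgha3": "10.10.1.130",
}


def interface_for(node, ip, interfaces=INTERFACES):
    """Return the name of the interface on `node` that carries `ip`, or None."""
    for ifname, addr in interfaces[node].items():
        if addr == ip:
            return ifname
    return None


def all_in_network(ips, cidr):
    """True if every address in `ips` falls inside the network given by `cidr`."""
    network = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in network for ip in ips)


if __name__ == "__main__":
    for node, ip in HOSTS.items():
        print(node, ip, "on", interface_for(node, ip))
    print("same network:", all_in_network(HOSTS.values(), "10.10.1.0/24"))
```

With these values the cluster address sits on eth1 for csgha1 and csgha3 but on eth0 for csgha2, which matches John's remark that the interface layout looks odd. All three addresses are still on the same network, so the layout alone would not explain the nodes failing to see each other.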
>>>>>>>>>>>> 
>>>>>>>>>>>> cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> digimer
>>>>>>>>>>>> 
>>>>>>>>>>>> PS - Can you share your pacemaker configuration?
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 20/10/14 03:40 PM, John Scalia wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Sure, and thanks for helping.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Here's the /etc/cluster/cluster.conf file and it is identical on all
>>>>>>>>>>>>> three systems:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <cluster config_version="11" name="pgdb_cluster">
>>>>>>>>>>>>>  <fence_daemon/>
>>>>>>>>>>>>>  <clusternodes>
>>>>>>>>>>>>>    <clusternode name="csgha1" nodeid="1">
>>>>>>>>>>>>>      <fence>
>>>>>>>>>>>>>        <method name="pcmk-redirect">
>>>>>>>>>>>>>          <device name="pcmk" port="csgha1"/>
>>>>>>>>>>>>>        </method>
>>>>>>>>>>>>>      </fence>
>>>>>>>>>>>>>    </clusternode>
>>>>>>>>>>>>>    <clusternode name="csgha2" nodeid="2">
>>>>>>>>>>>>>      <fence>
>>>>>>>>>>>>>        <method name="pcmk-redirect">
>>>>>>>>>>>>>          <device name="pcmk" port="csgha2"/>
>>>>>>>>>>>>>        </method>
>>>>>>>>>>>>>      </fence>
>>>>>>>>>>>>>    </clusternode>
>>>>>>>>>>>>>    <clusternode name="csgha3" nodeid="3">
>>>>>>>>>>>>>      <fence>
>>>>>>>>>>>>>        <method name="pcmk-redirect">
>>>>>>>>>>>>>          <device name="pcmk" port="csgha3"/>
>>>>>>>>>>>>>        </method>
>>>>>>>>>>>>>      </fence>
>>>>>>>>>>>>>    </clusternode>
>>>>>>>>>>>>>  </clusternodes>
>>>>>>>>>>>>>  <cman/>
>>>>>>>>>>>>>  <fencedevices>
>>>>>>>>>>>>>    <fencedevice agent="fence_pcmk" name="pcmk"/>
>>>>>>>>>>>>>  </fencedevices>
>>>>>>>>>>>>>  <rm>
>>>>>>>>>>>>>    <failoverdomains/>
>>>>>>>>>>>>>    <resources/>
>>>>>>>>>>>>>  </rm>
>>>>>>>>>>>>> </cluster>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> uname -n reports "csgha1" on that system, "csgha2" on its system, and
>>>>>>>>>>>>> "csgha3" on the last system.
>>>>>>>>>>>>> I don't seem to have gethostip on any of these systems, so I don't know if
>>>>>>>>>>>>> the next section helps or not.
>>>>>>>>>>>>> "ifconfig -a" reports:
>>>>>>>>>>>>>   csgha1: eth0 = 172.17.1.21
>>>>>>>>>>>>>           eth1 = 10.10.1.128
>>>>>>>>>>>>>   csgha2: eth0 = 10.10.1.129
>>>>>>>>>>>>>           eth1 = 172.17.1.3
>>>>>>>>>>>>>   csgha3: eth0 = 172.17.1.23
>>>>>>>>>>>>>           eth1 = 10.10.1.130
>>>>>>>>>>>>> Yeah, I know this looks a little weird, but it was the way our automated VM
>>>>>>>>>>>>> control did the interfaces.
>>>>>>>>>>>>> The /etc/hosts file on each system only has the 10.10.1.0/24 address for
>>>>>>>>>>>>> each system in it.
>>>>>>>>>>>>> iptables is not running on these systems.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let me know if you need more information, and I very much appreciate
>>>>>>>>>>>>> your assistance.
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Jay
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Mon, Oct 20, 2014 at 3:18 PM, Digimer <li...@alteeve.ca> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 20/10/14 02:50 PM, John Scalia wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm trying to build my first ever HA cluster and I'm using 3 VMs running
>>>>>>>>>>>>>>> CentOS 6.5. I followed the instructions to the letter at:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> http://clusterlabs.org/quickstart-redhat.html
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> and everything appears to start normally, but if I run "cman_tool nodes
>>>>>>>>>>>>>>> -a", I only see:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Node  Sts   Inc   Joined               Name
>>>>>>>>>>>>>>>    1   M     64   2014-10-20 14:00:00  csgha1
>>>>>>>>>>>>>>>        Addresses: 10.10.1.128
>>>>>>>>>>>>>>>    2   X      0                        csgha2
>>>>>>>>>>>>>>>    3   X      0                        csgha3
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In the other systems, the output is the same except for which system is
>>>>>>>>>>>>>>> shown as joined. Each shows just itself as belonging to the cluster.
>>>>>>>>>>>>>>> Also, "pcs status" reflects similarly with non-self systems showing
>>>>>>>>>>>>>>> offline. I've checked "netstat -an" and see each machine listening on
>>>>>>>>>>>>>>> ports 5404 and 5405. And the logs are rather involved, but I'm not
>>>>>>>>>>>>>>> seeing errors in them.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Any ideas for where to look for what's causing them to not
>>>>>>>>>>>>>>> communicate?
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Jay
>>>>>>>>>>>>>> Can you share your cluster.conf file please? Also, for each node:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> * uname -n
>>>>>>>>>>>>>> * gethostip -d $(uname -n)
>>>>>>>>>>>>>> * ifconfig | grep -B 1 $(gethostip -d $(uname -n)) | grep HWaddr | awk '{ print $1 }'
>>>>>>>>>>>>>> * iptables-save | grep -i multi
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Digimer
>>>>>>>>>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>>>>>>>>>> What if the cure for cancer is trapped in the mind of a person
>>>>>>>>>>>>>> without
>>>>>>>>>>>>>> access to education?
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Linux-HA mailing list
>>>>>>>>>>>>>> Linux-HA@lists.linux-ha.org
>>>>>>>>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>>>>>>>>> See also: http://linux-ha.org/ReportingProblems