Re: [Linux-HA] Node remains offline after host restart

James Guthrie Fri, 26 Oct 2012 07:25:48 -0700

Hi Michael,

I'm working with a Linux From Scratch based system (kernel version 3.4.7) 
running in a virtual machine, networked through virtual switches. Corosync and 
Pacemaker have been compiled from source. As mentioned in a previous 
e-mail, I don't have Python, so no crmsh or pcs.


`corosync-cpgtool` returns the same result on both hosts:

Group Name             PID         Node ID
crmd\x00
                      7427       650684608 (192.168.200.166)
                      9055       717793472 (192.168.200.170)
attrd\x00
                      7425       650684608 (192.168.200.166)
                      9053       717793472 (192.168.200.170)
stonith-ng\x00
                      7423       650684608 (192.168.200.166)
                      9051       717793472 (192.168.200.170)
cib\x00
                      7422       650684608 (192.168.200.166)
                      9050       717793472 (192.168.200.170)
pcmk\x00
                      7420       650684608 (192.168.200.166)
                      9048       717793472 (192.168.200.170)
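
As an aside on reading the Node ID column: the two IDs above are numerically consistent with each node's IPv4 address read as a little-endian 32-bit integer with the top bit cleared. The following is a minimal sketch of that arithmetic only; the function name `nodeid_from_ipv4` is hypothetical, and the claim that corosync derives its automatic node IDs this way is an assumption inferred from the values shown, not something confirmed in this thread.

```python
import ipaddress


def nodeid_from_ipv4(addr: str) -> int:
    """Return the 31-bit integer that matches the Node ID column above.

    Assumption: the ID is the four address bytes (network order)
    interpreted as a little-endian integer, with the high bit cleared.
    """
    packed = ipaddress.IPv4Address(addr).packed  # 4 bytes, network order
    raw = int.from_bytes(packed, "little")       # reinterpret as little-endian
    return raw & 0x7FFFFFFF                      # clear the top bit


if __name__ == "__main__":
    for ip in ("192.168.200.166", "192.168.200.170"):
        print(ip, nodeid_from_ipv4(ip))
```

If the assumption holds, both hosts reporting the same two IDs means each node sees the other's processes in every group, which fits the observation that the membership layer looks healthy.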


`tcpdump -ni eth1 port 5404` returns:

listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:22:27.849551 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.210578 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.770181 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.989802 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:29.370684 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:29.751062 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87

Every now and then there is a packet from r4 (192.168.200.170), but r4 
does appear to be quite quiet.
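
To put a number on how quiet r4 is, a capture like the one above could be tallied per source address. This is a minimal sketch, not a general tcpdump parser: the name `count_senders` and the regular expression are illustrative assumptions, and the pattern only covers lines of the exact form shown in the capture.

```python
import re
from collections import Counter

# Matches lines such as:
# 16:22:27.849551 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
LINE_RE = re.compile(
    r"^\d{2}:\d{2}:\d{2}\.\d+ IP "
    r"(\d+(?:\.\d+){3})\.(\d+) > "      # source address and port
    r"(\d+(?:\.\d+){3})\.(\d+): "       # destination address and port
    r"UDP, length (\d+)$"
)


def count_senders(capture: str) -> Counter:
    """Count UDP packets per source address in tcpdump -n text output."""
    counts = Counter()
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # header lines and other protocols are skipped
            counts[match.group(1)] += 1
    return counts


SAMPLE = """\
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:22:27.849551 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.210578 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.770181 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:28.989802 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:29.370684 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
16:22:29.751062 IP 192.168.200.166.5404 > 224.0.0.18.5405: UDP, length 87
"""

if __name__ == "__main__":
    for source, packets in count_senders(SAMPLE).items():
        print(source, packets)
```

Running the same tally over a longer capture taken on each host would show whether r4's packets are merely sparse or missing altogether on one side.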

Regards,
James


On 10/26/2012 04:05 PM, Michael Schwartzkopff wrote:
>> Hi Michael,
>>
>> Yes, that is exactly the problem I'm having.
>>
>> As far as I can tell everything is working fine (or appears to be) on
>> the communication layer. I have not once had a problem with corosync. As
>> I mentioned before, I can get the cluster to work perfectly fine for
>> periods of time, but I can also with ease get it into this state of one
>> node online and the other offline.
>>
>> What confuses me is that the cluster doesn't seem to try to recover from
>> this state in any way. Even more vexing is the fact that there doesn't
>> seem to be anything that points to why this could be taking place.
>>
>> Regards,
>> James
>>
>> P.S. I'm working with your "Clusterbau" book!
>
> hi,
>
> What environment are you working in? Virtual machines? Which distribution do
> you use?
>
> What does
>
> corosync-cpgtool
>
> say on both nodes?
>
> What does
>
> tcpdump -n -i bond0 port 5405
>
> show, provided that bond0 is your cluster heartbeat connection and 5405 is your
> corosync port?
>
> Greetings,
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
