Hi,

On Wed, Jun 11, 2008 at 05:43:19PM +0000, maximatt wrote:
> hi... thanks again....
>
> > This is most probably a communication problem. Please make sure
> > that the packets are reaching both nodes. You can use "tcpdump
> > udp port 694" to check that. There is also cl_status with which
> > you can check the link status. You can also try unicast...
>
> here are the logs of node1 and node2 (with the ucast parameter set):
>
> node1 log:
>
> heartbeat[8899]: 2008/06/11_14:02:03 info: Version 2 support: false
> heartbeat[8899]: 2008/06/11_14:02:03 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> heartbeat[8899]: 2008/06/11_14:02:03 info: **************************
> heartbeat[8899]: 2008/06/11_14:02:03 info: Configuration validated. Starting heartbeat 2.1.3

[snip]

> heartbeat[8900]: 2008/06/11_14:02:04 info: Local status now set to: 'up'
> *heartbeat[8900]: 2008/06/11_14:04:04 WARN: node einstein.prueba.uy: is dead*

The node waits for two minutes and sees no sign of life from the
other node.

> heartbeat[8900]: 2008/06/11_14:05:21 info: Link einstein.prueba.uy:dev20603 up.

The other node's up.

> heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own our resources!
> heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own foreign resources!

But it's also running the resources, which should indicate that the
other node's not seeing packets from us.

> node2 log:
>
> heartbeat[7044]: 2008/06/11_14:17:03 info: **************************
> heartbeat[7044]: 2008/06/11_14:17:03 info: Configuration validated. Starting heartbeat 2.1.3

[snip]

> heartbeat[7045]: 2008/06/11_14:17:03 info: Local status now set to: 'up'
> *heartbeat[7045]: 2008/06/11_14:19:04 WARN: node maximatt.prueba.uy: is dead*

Nothing's seen in 2 minutes from node maximatt, so it's considered to
be down. And all resources are started. According to this, it seems
like einstein either doesn't understand maximatt or doesn't hear
from it.

BTW, it seems like clocks are not synchronized. Please use ntp on
all members of the cluster.
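The two-minute wait can be read straight off the timestamps in the logs above. A minimal sketch in Python (not part of heartbeat; the helper names parse_stamp and seconds_between are made up for illustration, and the leading "*" emphasis marker is dropped from the "is dead" lines) that prints the gap between "Local status now set to: 'up'" and "is dead" on each node:

```python
from datetime import datetime

# Timestamp layout seen in the quoted heartbeat log lines.
STAMP_FORMAT = "%Y/%m/%d_%H:%M:%S"


def parse_stamp(log_line):
    """Return the timestamp of a heartbeat log line as a datetime."""
    # Whitespace-separated fields: "heartbeat[PID]:", "<date>_<time>", level, message...
    return datetime.strptime(log_line.split()[1], STAMP_FORMAT)


def seconds_between(first_line, second_line):
    """Seconds elapsed from first_line to second_line."""
    return (parse_stamp(second_line) - parse_stamp(first_line)).total_seconds()


# Lines copied from the logs quoted above (leading "*" removed).
node1_up = "heartbeat[8900]: 2008/06/11_14:02:04 info: Local status now set to: 'up'"
node1_dead = "heartbeat[8900]: 2008/06/11_14:04:04 WARN: node einstein.prueba.uy: is dead"
node2_up = "heartbeat[7045]: 2008/06/11_14:17:03 info: Local status now set to: 'up'"
node2_dead = "heartbeat[7045]: 2008/06/11_14:19:04 WARN: node maximatt.prueba.uy: is dead"

node1_wait = seconds_between(node1_up, node1_dead)
node2_wait = seconds_between(node2_up, node2_dead)
print(node1_wait, node2_wait)  # 120.0 121.0
```

Each node declares its peer dead about 120 seconds after coming up, so neither one accepted a heartbeat from the other during that window.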
> and tcpdump results from node1 (on node2 I get the same results, without
> dropped packets):
>
> # tcpdump -i dev20603 -n -p udp port 694
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on dev20603, link-type EN10MB (Ethernet), capture size 96 bytes
> 14:05:42.328650 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 221
> 14:05:42.329821 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 217
> 14:05:43.332506 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 221
> 14:05:43.332672 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 217
> :
> :
> 14:07:23.406543 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:23.461439 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:24.409366 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:24.457139 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:25.412177 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:25.461096 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:26.416052 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 231
> 14:07:26.416099 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:26.466779 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:27.408794 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:27.471381 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 232
> 14:07:27.471423 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:28.411623 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:28.467746 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
> 14:07:28.467915 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 231
> 14:07:29.414431 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
> 14:07:29.473391 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222

Yes, according to the tcpdump, packets are arriving at both nodes. It
is really perplexing. My guess is that something's not right with the
network. Did you check interface stats for errors? Did you try
cl_status?

> mmm :( I am running out of ideas for testing what's happening... (but I keep
> trying :) )
>
> I'll try changing the network interface used, to have the same ethernet device
> ("eth1") for the heartbeat channel... and use the other ethernet cards for
> the service...

Not sure if that's going to help.

Thanks,

Dejan

> thanks again!!!! :)
>
>
> Salu2!! ;)
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems