Hi... thanks again!
> This deadtime is probably too big. Something like 10 should be
> more appropriate.

Done, I applied your suggestion :)

> This is most probably a communication problem. Please make sure
> that the packets are reaching both nodes. You can use "tcpdump
> udp port 694" to check that. There is also cl_status with which
> you can check the link status. You can also try unicast...

I don't think I have a communication problem :( (but... you never know).
I followed your suggestions (trying both mcast and ucast) and I get the same results :(

Here are the logs of node1 and node2 (with the ucast parameter set):

node1 log:

heartbeat[8899]: 2008/06/11_14:02:03 info: Version 2 support: false
heartbeat[8899]: 2008/06/11_14:02:03 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[8899]: 2008/06/11_14:02:03 info: **************************
heartbeat[8899]: 2008/06/11_14:02:03 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[8900]: 2008/06/11_14:02:03 info: heartbeat: version 2.1.3
heartbeat[8900]: 2008/06/11_14:02:04 info: Heartbeat generation: 1207833073
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: bound send socket to device: dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: bound receive socket to device: dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: started on port 694 interface dev20603 to 192.168.140.2
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[8900]: 2008/06/11_14:02:04 info: Local status now set to: 'up'
*heartbeat[8900]: 2008/06/11_14:04:04 WARN: node einstein.prueba.uy: is dead*
heartbeat[8900]: 2008/06/11_14:04:04 info: Comm_now_up(): updating status to active
heartbeat[8900]: 2008/06/11_14:04:04 info: Local status now set to: 'active'
heartbeat[8900]: 2008/06/11_14:04:04 info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[8914]: 2008/06/11_14:04:04 info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 8914)
heartbeat[8900]: 2008/06/11_14:04:04 WARN: No STONITH device configured.
heartbeat[8900]: 2008/06/11_14:04:04 WARN: Shared disks are not protected.
heartbeat[8900]: 2008/06/11_14:04:04 info: Resources being acquired from einstein.prueba.uy.
harc[8915]: 2008/06/11_14:04:04 info: Running /etc/ha.d/rc.d/status status
mach_down[8945]: 2008/06/11_14:04:04 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[8945]: 2008/06/11_14:04:04 info: mach_down takeover complete for node einstein.prueba.uy.
heartbeat[8900]: 2008/06/11_14:04:04 info: mach_down takeover complete.
heartbeat[8900]: 2008/06/11_14:04:04 info: Initial resource acquisition complete (mach_down)
IPaddr[8988]: 2008/06/11_14:04:04 INFO: Resource is stopped
heartbeat[8916]: 2008/06/11_14:04:04 info: Local Resource acquisition completed.
harc[9040]: 2008/06/11_14:04:04 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[9040]: 2008/06/11_14:04:04 received ip-request-resp 100.0.4.100 OK yes
ResourceManager[9061]: 2008/06/11_14:04:04 info: Acquiring resource group: maximatt.prueba.uy 100.0.4.100 httpd
IPaddr[9088]: 2008/06/11_14:04:04 INFO: Resource is stopped
ResourceManager[9061]: 2008/06/11_14:04:04 info: Running /etc/ha.d/resource.d/IPaddr 100.0.4.100 start
IPaddr[9164]: 2008/06/11_14:04:04 INFO: Using calculated nic for 100.0.4.100: eth1
IPaddr[9164]: 2008/06/11_14:04:04 INFO: Using calculated netmask for 100.0.4.100: 255.0.0.0
IPaddr[9164]: 2008/06/11_14:04:04 INFO: eval ifconfig eth1:0 100.0.4.100netmask 255.0.0.0 broadcast 100.255.255.255
IPaddr[9147]: 2008/06/11_14:04:05 INFO: Success
ResourceManager[9061]: 2008/06/11_14:04:05 info: Running /etc/init.d/httpd start
heartbeat[8900]: 2008/06/11_14:04:14 info: Local Resource acquisition completed. (none)
heartbeat[8900]: 2008/06/11_14:04:14 info: local resource transition completed.
heartbeat[8900]: 2008/06/11_14:05:21 info: Link einstein.prueba.uy:dev20603 up.
heartbeat[8900]: 2008/06/11_14:05:21 info: Status update for node einstein.prueba.uy: status init
heartbeat[8900]: 2008/06/11_14:05:21 info: Status update for node einstein.prueba.uy: status up
ipfail[8914]: 2008/06/11_14:05:21 info: Link Status update: Link einstein.prueba.uy/dev20603 now has status up
ipfail[8914]: 2008/06/11_14:05:21 info: Status update: Node einstein.prueba.uy now has status init
ipfail[8914]: 2008/06/11_14:05:21 info: Status update: Node einstein.prueba.uy now has status up
harc[9329]: 2008/06/11_14:05:21 info: Running /etc/ha.d/rc.d/status status
harc[9346]: 2008/06/11_14:05:21 info: Running /etc/ha.d/rc.d/status status
heartbeat[8900]: 2008/06/11_14:06:01 info: all clients are now paused
*heartbeat[8900]: 2008/06/11_14:07:20 WARN: 1 lost packet(s) for [ einstein.prueba.uy] [124:126]*
*heartbeat[8900]: 2008/06/11_14:07:20 info: Status update for node einstein.prueba.uy: status active*
*heartbeat[8900]: 2008/06/11_14:07:20 info: No pkts missing from einstein.prueba.uy!*
*ipfail[8914]: 2008/06/11_14:07:20 info: Status update: Node einstein.prueba.uy now has status active*
heartbeat[8900]: 2008/06/11_14:07:20 info: remote resource transition completed.
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own foreign resources!
heartbeat[8900]: 2008/06/11_14:07:20 info: maximatt.prueba.uy wants to go standby [foreign]
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own foreign resources!
harc[9368]: 2008/06/11_14:07:20 info: Running /etc/ha.d/rc.d/status status
heartbeat[8900]: 2008/06/11_14:07:22 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:22 ERROR: Both machines own foreign resources!

node2 log:

heartbeat[7044]: 2008/06/11_14:17:03 info: **************************
heartbeat[7044]: 2008/06/11_14:17:03 info: Configuration validated. Starting heartbeat 2.1.3
heartbeat[7045]: 2008/06/11_14:17:03 info: heartbeat: version 2.1.3
heartbeat[7045]: 2008/06/11_14:17:03 info: Heartbeat generation: 1207843598
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: bound send socket to device: eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: bound receive socket to device: eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: started on port 694 interface eth0 to 192.168.140.1
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[7045]: 2008/06/11_14:17:03 info: Local status now set to: 'up'
*heartbeat[7045]: 2008/06/11_14:19:04 WARN: node maximatt.prueba.uy: is dead*
heartbeat[7045]: 2008/06/11_14:19:04 info: Comm_now_up(): updating status to active
heartbeat[7045]: 2008/06/11_14:19:04 info: Local status now set to: 'active'
heartbeat[7045]: 2008/06/11_14:19:04 info: Starting child client "/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[7045]: 2008/06/11_14:19:04 WARN: No STONITH device configured.
heartbeat[7045]: 2008/06/11_14:19:04 WARN: Shared disks are not protected.
heartbeat[7045]: 2008/06/11_14:19:04 info: Resources being acquired from maximatt.prueba.uy.
heartbeat[7053]: 2008/06/11_14:19:04 info: Starting "/usr/lib/heartbeat/ipfail" as uid 498 gid 496 (pid 7053)
heartbeat[7055]: 2008/06/11_14:19:04 info: No local resources [/usr/share/heartbeat/ResourceManager listkeys einstein.prueba.uy] to acquire.
harc[7054]: 2008/06/11_14:19:04 info: Running /etc/ha.d/rc.d/status status
mach_down[7083]: 2008/06/11_14:19:04 info: Taking over resource group 100.0.4.100
ResourceManager[7109]: 2008/06/11_14:19:04 info: Acquiring resource group: maximatt.prueba.uy 100.0.4.100 httpd
IPaddr[7136]: 2008/06/11_14:19:04 INFO: Resource is stopped
ResourceManager[7109]: 2008/06/11_14:19:04 info: Running /etc/ha.d/resource.d/IPaddr 100.0.4.100 start
IPaddr[7212]: 2008/06/11_14:19:04 INFO: Using calculated nic for 100.0.4.100: eth1
IPaddr[7212]: 2008/06/11_14:19:04 INFO: Using calculated netmask for 100.0.4.100: 255.0.0.0
IPaddr[7212]: 2008/06/11_14:19:04 INFO: eval ifconfig eth1:0 100.0.4.100netmask 255.0.0.0 broadcast 100.255.255.255
IPaddr[7195]: 2008/06/11_14:19:04 INFO: Success
ResourceManager[7109]: 2008/06/11_14:19:04 info: Running /etc/init.d/httpd start
mach_down[7083]: 2008/06/11_14:19:06 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7083]: 2008/06/11_14:19:06 info: mach_down takeover complete for node maximatt.prueba.uy.
heartbeat[7045]: 2008/06/11_14:19:06 info: mach_down takeover complete.
heartbeat[7045]: 2008/06/11_14:19:06 info: Initial resource acquisition complete (mach_down)

And here are the tcpdump results from node1 (on node2 I get the same results, with no packets dropped):

# tcpdump -i dev20603 -n -p udp port 694
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on dev20603, link-type EN10MB (Ethernet), capture size 96 bytes
14:05:42.328650 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 221
14:05:42.329821 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 217
14:05:43.332506 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 221
14:05:43.332672 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 217
:
:
14:07:23.406543 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:23.461439 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:24.409366 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:24.457139 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:25.412177 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:25.461096 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:26.416052 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 231
14:07:26.416099 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:26.466779 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:27.408794 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:27.471381 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 232
14:07:27.471423 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:28.411623 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:28.467746 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:28.467915 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 231
14:07:29.414431 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:29.473391 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
262 packets captured
262 packets received by filter
0 packets dropped by kernel

Hmm :( I'm running out of ideas for what to test to find out what's happening... (but I keep trying :) )
I'm trying to change the network interfaces used, so that both nodes use the same Ethernet device ("eth1") for the heartbeat channel, and the other Ethernet cards for the service...

Thanks again!!!! :)
Regards!! ;)

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
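As a rough sanity check of the tcpdump capture above, a small script can confirm that packets flow in both directions and that neither direction goes silent for longer than the deadtime (the reply suggests something like 10 seconds). This is a minimal Python sketch, assuming plain `tcpdump -n` output in the format shown and a capture that does not cross midnight; `max_gaps`, `SAMPLE` and `DEADTIME` are illustrative names, not part of heartbeat or tcpdump. It only shows that packets are on the wire at the capturing node, not that heartbeat accepts them.

```python
import re
from datetime import datetime

# Matches `tcpdump -n` lines in the format shown in the capture, e.g.
# 14:05:42.328650 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 221
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.\S+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.\S+: UDP"
)


def max_gaps(lines):
    """Return {(src_ip, dst_ip): largest gap in seconds between packets}.

    Assumes all timestamps fall on the same day (no midnight rollover).
    """
    last_seen = {}
    gaps = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip banners, summary lines and ":" elisions
        t = datetime.strptime(m.group("ts"), "%H:%M:%S.%f")
        key = (m.group("src"), m.group("dst"))
        if key in last_seen:
            gap = (t - last_seen[key]).total_seconds()
            gaps[key] = max(gaps[key], gap)
        else:
            gaps[key] = 0.0  # first packet seen in this direction
        last_seen[key] = t
    return gaps


# A few lines copied from the node1 capture.
SAMPLE = """\
14:07:23.406543 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:23.461439 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
14:07:24.409366 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP, length 221
14:07:24.457139 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP, length 222
262 packets captured
""".splitlines()

DEADTIME = 10  # seconds; the value suggested in the reply (an assumption here)

if __name__ == "__main__":
    for (src, dst), gap in sorted(max_gaps(SAMPLE).items()):
        status = "ok" if gap <= DEADTIME else "TOO SLOW"
        print(f"{src} -> {dst}: max gap {gap:.3f}s ({status})")
```

Each direction is tracked separately on purpose: if one direction were missing entirely from the result, that would point to a one-way problem even though the capture as a whole shows traffic.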