Hi, thanks again.


> This deadtime is probably too big. Something like 10 should be
> more appropriate.
>

I have applied your suggestion :)

> This is most probably a communication problem. Please make sure
> that the packets are reaching both nodes. You can use "tcpdump
> udp port 694" to check that. There is also cl_status with which
> you can check the link status. You can also try unicast...


I don't think I have a communication problem :( (but you never know).

I tried your suggestion, with both mcast and ucast, and I get the same
results :(

Here are the logs from node1 and node2 (with the ucast parameter set):

node1 log:

heartbeat[8899]: 2008/06/11_14:02:03 info: Version 2 support: false
heartbeat[8899]: 2008/06/11_14:02:03 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[8899]: 2008/06/11_14:02:03 info: **************************
heartbeat[8899]: 2008/06/11_14:02:03 info: Configuration validated. Starting
heartbeat 2.1.3
heartbeat[8900]: 2008/06/11_14:02:03 info: heartbeat: version 2.1.3
heartbeat[8900]: 2008/06/11_14:02:04 info: Heartbeat generation: 1207833073
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: bound send socket to
device: dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: bound receive socket
to device: dev20603
heartbeat[8900]: 2008/06/11_14:02:04 info: glib: ucast: started on port 694
interface dev20603 to 192.168.140.2
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[8900]: 2008/06/11_14:02:04 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[8900]: 2008/06/11_14:02:04 info: Local status now set to: 'up'
*heartbeat[8900]: 2008/06/11_14:04:04 WARN: node einstein.prueba.uy: is dead*
heartbeat[8900]: 2008/06/11_14:04:04 info: Comm_now_up(): updating status to
active
heartbeat[8900]: 2008/06/11_14:04:04 info: Local status now set to: 'active'
heartbeat[8900]: 2008/06/11_14:04:04 info: Starting child client
"/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[8914]: 2008/06/11_14:04:04 info: Starting
"/usr/lib/heartbeat/ipfail" as uid 498  gid 496 (pid 8914)
heartbeat[8900]: 2008/06/11_14:04:04 WARN: No STONITH device configured.
heartbeat[8900]: 2008/06/11_14:04:04 WARN: Shared disks are not protected.
heartbeat[8900]: 2008/06/11_14:04:04 info: Resources being acquired from
einstein.prueba.uy.
harc[8915]:     2008/06/11_14:04:04 info: Running /etc/ha.d/rc.d/status
status
mach_down[8945]:        2008/06/11_14:04:04 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[8945]:        2008/06/11_14:04:04 info: mach_down takeover
complete for node einstein.prueba.uy.
heartbeat[8900]: 2008/06/11_14:04:04 info: mach_down takeover complete.
heartbeat[8900]: 2008/06/11_14:04:04 info: Initial resource acquisition
complete (mach_down)
IPaddr[8988]:   2008/06/11_14:04:04 INFO:  Resource is stopped
heartbeat[8916]: 2008/06/11_14:04:04 info: Local Resource acquisition
completed.
harc[9040]:     2008/06/11_14:04:04 info: Running
/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[9040]:  2008/06/11_14:04:04 received ip-request-resp
100.0.4.100 OK yes
ResourceManager[9061]:  2008/06/11_14:04:04 info: Acquiring resource group:
maximatt.prueba.uy 100.0.4.100 httpd
IPaddr[9088]:   2008/06/11_14:04:04 INFO:  Resource is stopped
ResourceManager[9061]:  2008/06/11_14:04:04 info: Running
/etc/ha.d/resource.d/IPaddr 100.0.4.100 start
IPaddr[9164]:   2008/06/11_14:04:04 INFO: Using calculated nic for
100.0.4.100: eth1
IPaddr[9164]:   2008/06/11_14:04:04 INFO: Using calculated netmask for
100.0.4.100: 255.0.0.0
IPaddr[9164]:   2008/06/11_14:04:04 INFO: eval ifconfig eth1:0
100.0.4.100 netmask 255.0.0.0 broadcast 100.255.255.255
IPaddr[9147]:   2008/06/11_14:04:05 INFO:  Success
ResourceManager[9061]:  2008/06/11_14:04:05 info: Running /etc/init.d/httpd
start
heartbeat[8900]: 2008/06/11_14:04:14 info: Local Resource acquisition
completed. (none)
heartbeat[8900]: 2008/06/11_14:04:14 info: local resource transition
completed.
heartbeat[8900]: 2008/06/11_14:05:21 info: Link einstein.prueba.uy:dev20603
up.
heartbeat[8900]: 2008/06/11_14:05:21 info: Status update for node
einstein.prueba.uy: status init
heartbeat[8900]: 2008/06/11_14:05:21 info: Status update for node
einstein.prueba.uy: status up
ipfail[8914]: 2008/06/11_14:05:21 info: Link Status update: Link
einstein.prueba.uy/dev20603 now has status up
ipfail[8914]: 2008/06/11_14:05:21 info: Status update: Node
einstein.prueba.uy now has status init
ipfail[8914]: 2008/06/11_14:05:21 info: Status update: Node
einstein.prueba.uy now has status up
harc[9329]:     2008/06/11_14:05:21 info: Running /etc/ha.d/rc.d/status
status
harc[9346]:     2008/06/11_14:05:21 info: Running /etc/ha.d/rc.d/status
status
heartbeat[8900]: 2008/06/11_14:06:01 info: all clients are now paused
*heartbeat[8900]: 2008/06/11_14:07:20 WARN: 1 lost packet(s) for [
einstein.prueba.uy] [124:126]*
*heartbeat[8900]: 2008/06/11_14:07:20 info: Status update for node
einstein.prueba.uy: status active*
*heartbeat[8900]: 2008/06/11_14:07:20 info: No pkts missing from
einstein.prueba.uy!*
*ipfail[8914]: 2008/06/11_14:07:20 info: Status update: Node
einstein.prueba.uy now has status active*
heartbeat[8900]: 2008/06/11_14:07:20 info: remote resource transition
completed.
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own foreign
resources!
heartbeat[8900]: 2008/06/11_14:07:20 info: maximatt.prueba.uy wants to go
standby [foreign]
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:20 ERROR: Both machines own foreign
resources!
harc[9368]:     2008/06/11_14:07:20 info: Running /etc/ha.d/rc.d/status
status
heartbeat[8900]: 2008/06/11_14:07:22 ERROR: Both machines own our resources!
heartbeat[8900]: 2008/06/11_14:07:22 ERROR: Both machines own foreign
resources!

node2 log:

heartbeat[7044]: 2008/06/11_14:17:03 info: **************************
heartbeat[7044]: 2008/06/11_14:17:03 info: Configuration validated. Starting
heartbeat 2.1.3
heartbeat[7045]: 2008/06/11_14:17:03 info: heartbeat: version 2.1.3
heartbeat[7045]: 2008/06/11_14:17:03 info: Heartbeat generation: 1207843598
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: write socket
priority set to IPTOS_LOWDELAY on eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: bound send socket to
device: eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: bound receive socket
to device: eth0
heartbeat[7045]: 2008/06/11_14:17:03 info: glib: ucast: started on port 694
interface eth0 to 192.168.140.1
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[7045]: 2008/06/11_14:17:03 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[7045]: 2008/06/11_14:17:03 info: Local status now set to: 'up'
*heartbeat[7045]: 2008/06/11_14:19:04 WARN: node maximatt.prueba.uy: is dead*
heartbeat[7045]: 2008/06/11_14:19:04 info: Comm_now_up(): updating status to
active
heartbeat[7045]: 2008/06/11_14:19:04 info: Local status now set to: 'active'
heartbeat[7045]: 2008/06/11_14:19:04 info: Starting child client
"/usr/lib/heartbeat/ipfail" (498,496)
heartbeat[7045]: 2008/06/11_14:19:04 WARN: No STONITH device configured.
heartbeat[7045]: 2008/06/11_14:19:04 WARN: Shared disks are not protected.
heartbeat[7045]: 2008/06/11_14:19:04 info: Resources being acquired from
maximatt.prueba.uy.
heartbeat[7053]: 2008/06/11_14:19:04 info: Starting
"/usr/lib/heartbeat/ipfail" as uid 498  gid 496 (pid 7053)
heartbeat[7055]: 2008/06/11_14:19:04 info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys einstein.prueba.uy] to
acquire.
harc[7054]:     2008/06/11_14:19:04 info: Running /etc/ha.d/rc.d/status
status
mach_down[7083]:        2008/06/11_14:19:04 info: Taking over resource group
100.0.4.100
ResourceManager[7109]:  2008/06/11_14:19:04 info: Acquiring resource group:
maximatt.prueba.uy 100.0.4.100 httpd
IPaddr[7136]:   2008/06/11_14:19:04 INFO:  Resource is stopped
ResourceManager[7109]:  2008/06/11_14:19:04 info: Running
/etc/ha.d/resource.d/IPaddr 100.0.4.100 start
IPaddr[7212]:   2008/06/11_14:19:04 INFO: Using calculated nic for
100.0.4.100: eth1
IPaddr[7212]:   2008/06/11_14:19:04 INFO: Using calculated netmask for
100.0.4.100: 255.0.0.0
IPaddr[7212]:   2008/06/11_14:19:04 INFO: eval ifconfig eth1:0
100.0.4.100 netmask 255.0.0.0 broadcast 100.255.255.255
IPaddr[7195]:   2008/06/11_14:19:04 INFO:  Success
ResourceManager[7109]:  2008/06/11_14:19:04 info: Running /etc/init.d/httpd
start
mach_down[7083]:        2008/06/11_14:19:06 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[7083]:        2008/06/11_14:19:06 info: mach_down takeover
complete for node maximatt.prueba.uy.
heartbeat[7045]: 2008/06/11_14:19:06 info: mach_down takeover complete.
heartbeat[7045]: 2008/06/11_14:19:06 info: Initial resource acquisition
complete (mach_down)
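
To read the two logs above side by side, it can help to measure how long each node waited between coming up and declaring its peer dead. The following is only a minimal sketch: the helper names (`stamp_of`, `seconds_until_dead`) are made up for illustration, and the sample lines are copied from the logs above with the mail line wrapping undone.

```python
import re
from datetime import datetime

# Timestamp format seen in the heartbeat log lines, e.g. 2008/06/11_14:02:03
STAMP = re.compile(r"(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})")

def stamp_of(line):
    """Return the datetime embedded in a log line, or None if there is none."""
    m = STAMP.search(line)
    return datetime.strptime(m.group(1), "%Y/%m/%d_%H:%M:%S") if m else None

def seconds_until_dead(lines):
    """Seconds between the local 'up' status and the first 'is dead' warning."""
    up = dead = None
    for line in lines:
        if up is None and "Local status now set to: 'up'" in line:
            up = stamp_of(line)
        elif dead is None and "is dead" in line:
            dead = stamp_of(line)
    if up is None or dead is None:
        return None
    return (dead - up).total_seconds()

# Sample lines taken from the node1 and node2 logs (hypothetical variable names).
node1 = [
    "heartbeat[8900]: 2008/06/11_14:02:04 info: Local status now set to: 'up'",
    "heartbeat[8900]: 2008/06/11_14:04:04 WARN: node einstein.prueba.uy: is dead",
]
node2 = [
    "heartbeat[7045]: 2008/06/11_14:17:03 info: Local status now set to: 'up'",
    "heartbeat[7045]: 2008/06/11_14:19:04 WARN: node maximatt.prueba.uy: is dead",
]

print(seconds_until_dead(node1), seconds_until_dead(node2))
```

In both logs the gap is about two minutes. That may correspond to a startup timeout such as `initdead` in ha.cf, but the configuration is not shown here, so that is only a guess.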

And here are the tcpdump results from node1 (on node2 I get the same
results, with no packets dropped):

# tcpdump -i dev20603 -n -p udp port 694
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on dev20603, link-type EN10MB (Ethernet), capture size 96 bytes
14:05:42.328650 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 221
14:05:42.329821 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 217
14:05:43.332506 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 221
14:05:43.332672 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 217
:
:
14:07:23.406543 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:23.461439 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:24.409366 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:24.457139 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:25.412177 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:25.461096 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:26.416052 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 231
14:07:26.416099 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:26.466779 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:27.408794 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:27.471381 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 232
14:07:27.471423 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:28.411623 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:28.467746 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222
14:07:28.467915 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 231
14:07:29.414431 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 221
14:07:29.473391 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 222

262 packets captured
262 packets received by filter
0 packets dropped by kernel
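
One quick way to confirm that heartbeats flow in both directions is to count the captured packets per source and destination pair. This is a rough sketch only: `count_directions` is a hypothetical helper, the regular expression assumes the tcpdump output format shown above, and the sample is just the first four packets of the capture.

```python
import re
from collections import Counter

# Matches "IP <src-ip>.<src-port> > <dst-ip>.<dst-port-or-service>: UDP"
PACKET = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\S+: UDP"
)

def count_directions(capture_text):
    """Count UDP packets per (source IP, destination IP) pair."""
    counts = Counter()
    for m in PACKET.finditer(capture_text):
        counts[(m.group(1), m.group(2))] += 1
    return counts

# First four packets of the capture above, including the mail line wrapping.
sample = """\
14:05:42.328650 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 221
14:05:42.329821 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 217
14:05:43.332506 IP 192.168.140.1.20070 > 192.168.140.2.ha-cluster: UDP,
length 221
14:05:43.332672 IP 192.168.140.2.32773 > 192.168.140.1.ha-cluster: UDP,
length 217
"""

counts = count_directions(sample)
for (src, dst), n in sorted(counts.items()):
    print(src, "->", dst, n)
```

If the counts for the two directions are roughly equal over the whole capture, the packets are at least reaching this interface in both directions. That alone does not show that heartbeat on the other node accepts them.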


Hmm :(  I am running out of ideas for what to test to find out what is
happening (but I'll keep trying :) )

I will try changing the network interfaces so that both nodes use the same
Ethernet device ("eth1") for the heartbeat channel, and use the other
Ethernet card for the service.

Thanks again! :)


Regards! ;)
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
