Ah, I fixed this one already, which is why I couldn't find the cause
when I looked at the code.
This block:

    if (num_active < 0) {
        g_hash_table_foreach(ping_nodes, count_ping_nodes, &num_active);
    }

needs:

    num_active = 0;

as its first statement.
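
To illustrate why the missing reset shows up as "one ping node too few", here is a minimal, self-contained sketch. It is a simplified model, not the pingd source: the struct, the array in place of the GHashTable, and the names count_ping_node, count_active_buggy, count_active_fixed and check_counts are hypothetical, and it assumes num_active holds -1 as its "not counted yet" value when the block is entered (consistent with the -100 points reported below when all ping nodes are down).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one ping node entry (pingd itself uses a GHashTable). */
struct ping_node {
    const char *host;
    int active;              /* 1 if the node currently answers pings */
};

/* Plays the role of the per-entry callback: add one for every active node. */
void count_ping_node(const struct ping_node *node, int *num_active)
{
    if (node->active) {
        (*num_active)++;
    }
}

/* Without the reset: counting starts from the assumed -1 sentinel. */
int count_active_buggy(const struct ping_node *nodes, size_t n)
{
    int num_active = -1;     /* assumption: -1 means "not counted yet" */
    if (num_active < 0) {
        for (size_t i = 0; i < n; i++) {
            count_ping_node(&nodes[i], &num_active);
        }
    }
    return num_active;
}

/* With the reset as the first statement of the block. */
int count_active_fixed(const struct ping_node *nodes, size_t n)
{
    int num_active = -1;     /* same assumed sentinel */
    if (num_active < 0) {
        num_active = 0;      /* the fix: start counting from zero */
        for (size_t i = 0; i < n; i++) {
            count_ping_node(&nodes[i], &num_active);
        }
    }
    return num_active;
}

/* Three reachable ping nodes, as in the log below. */
void check_counts(void)
{
    const struct ping_node nodes[] = {
        { "192.168.189.4",   1 },
        { "192.168.188.110", 1 },
        { "82.135.103.97",   1 },
    };
    assert(count_active_buggy(nodes, 3) == 2);   /* one short */
    assert(count_active_fixed(nodes, 3) == 3);
}
```

With a multiplier of 100 (-m 100), the model without the reset gives 200 points for three reachable nodes and -100 for none, which matches the numbers reported below.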
On Mon, Mar 2, 2009 at 11:38, Michael Schwartzkopff <[email protected]> wrote:
> On Monday, 2 March 2009 08:35:51, Andrew Beekhof wrote:
>> On Fri, Feb 27, 2009 at 16:04, Michael Schwartzkopff <[email protected]> wrote:
>> > On Friday, 27 February 2009 15:21:34, Michael Schwartzkopff wrote:
>> >> Hi,
>> >>
>> >> my system: debian lenny, heartbeat 2.99.2-1, pacemaker 1.0.1-1.
>> >>
>> >> In ha.cf I have 2 ping nodes:
>> >> ping 82.135.103.97 192.168.188.19
>> >>
>> >> From the command line I can ping both hosts. When I start heartbeat I
>> >> see that my node is sending and receiving icmp packets to and from both
>> >> hosts.
>> >>
>> >> Also the log file recognises two ping nodes:
>> >> Feb 27 14:13:54 fw4 heartbeat: [24762]: info: Link
>> >> 82.135.103.97:82.135.103.97 up.
>> >> Feb 27 14:13:54 fw4 heartbeat: [24762]: info: Status update for node
>> >> 82.135.103.97: status ping
>> >> Feb 27 14:13:54 fw4 heartbeat: [24762]: info: Status update for node
>> >> 192.168.188.119: status ping
>> >> Feb 27 14:13:54 fw4 heartbeat: [24762]: info: Link
>> >> 192.168.188.119:192.168.188.119 up.
>> >>
>> >> When I start pingd manually
>> >> /usr/lib/heartbeat/pingd -m100 -d5
>> >>
>> >> I see that the node only recognises ONE ping node and accordingly only
>> >> gives 100 points:
>> >
>> > (...)
>> >
>> >> The problem occurs on BOTH nodes of the cluster. The ping node that is
>> >> not recognized is the default router. Does anybody have any idea what
>> >> went wrong here? Thanks for helping.
>> >
>> > Hi,
>> > Playing further with the cluster, it seems that it always comes up one
>> > ping node's worth of points short.
>> > When I configure 3 ping nodes I get 200 points.
>> > When I disable ICMP for one ping node I get 100 points.
>> > When I disable ICMP for the next host I get 0 points.
>> > When I disable ICMP for the last ping node I get -100 points.
>> >
>> > Is this feature documented somewhere?
>>
>> It's not a feature.
>> Can you add a couple of -V options to your pingd command and attach the
>> logs?
>
> Here you are:
>
> Mar 2 11:31:07 fw4 pingd: [21065]: info: Invoked: /usr/lib/heartbeat/pingd -V
> -V -V -V -V -m 100 -d5s -a pingd
> Mar 2 11:31:07 fw4 pingd: [21065]: debug: main: attrd registration attempt: 0
> Mar 2 11:31:12 fw4 pingd: [21065]: debug: init_client_ipc_comms_nodispatch:
> Attempting to talk on: /var/run/heartbeat/crm/attrd
> Mar 2 11:31:12 fw4 pingd: [21065]: debug: debug3:
> init_client_ipc_comms_nodispatch: Processing of /var/run/heartbeat/crm/attrd
> complete
> Mar 2 11:31:12 fw4 pingd: [21065]: debug: register_with_ha: Signing in with
> Heartbeat
> Mar 2 11:31:12 fw4 pingd: [21065]: debug: debug2: do_node_walk: Invoked
> Mar 2 11:31:12 fw4 pingd: [21065]: debug: debug3: do_node_walk: Requesting an
> initial dump of CRMD client_status
> Mar 2 11:31:13 fw4 pingd: [21065]: info: do_node_walk: Requesting the list of
> configured nodes
> Mar 2 11:31:13 fw4 pingd: [21065]: debug: do_node_walk: Node fw3: skipping
> 'normal'
> Mar 2 11:31:13 fw4 pingd: [21065]: debug: do_node_walk: Node fw4: skipping
> 'normal'
> Mar 2 11:31:14 fw4 pingd: [21065]: debug: do_node_walk: Adding:
> 192.168.189.4=ping
> Mar 2 11:31:14 fw4 pingd: [21065]: debug: do_node_walk: Adding:
> 192.168.188.110=ping
> Mar 2 11:31:15 fw4 pingd: [21065]: debug: do_node_walk: Adding:
> 82.135.103.97=ping
> Mar 2 11:31:15 fw4 pingd: [21065]: debug: debug2: do_node_walk: Complete
> Mar 2 11:31:15 fw4 pingd: [21065]: info: send_update: 2 active ping nodes
> Mar 2 11:31:15 fw4 pingd: [21065]: debug: debug3: register_with_ha: Be
> informed of Node Status changes
> Mar 2 11:31:15 fw4 pingd: [21065]: debug: debug3: register_with_ha: Adding
> channel to mainloop
> Mar 2 11:31:15 fw4 pingd: [21065]: info: main: Starting pingd
>
> OK. I started pingd. ha.cf tells pingd about 3 nodes, but it reports only 2
> active nodes. Now I run
> iptables -I INPUT -p icmp -s 82.135.103.97 -j DROP
> to simulate a network failure. pingd detects this and reduces the points by
> 100, as syslog shows:
>
> Mar 2 11:31:49 fw4 pingd: [21065]: debug: debug2: pingd_ha_dispatch: Invoked
> Mar 2 11:31:49 fw4 pingd: [21065]: notice: pingd_nstatus_callback: Status
> update: Ping node 82.135.103.97 now has status [dead]
> Mar 2 11:31:49 fw4 pingd: [21065]: info: send_update: 1 active ping nodes
> Mar 2 11:31:49 fw4 pingd: [21065]: debug: debug2: pingd_ha_dispatch: no
> message ready yet
> Mar 2 11:31:49 fw4 heartbeat: [20957]: WARN: node 82.135.103.97: is dead
> Mar 2 11:31:49 fw4 crmd: [20979]: notice: crmd_ha_status_callback: Status
> update: Node 82.135.103.97 now has status [dead] (DC=false)
> Mar 2 11:31:49 fw4 crmd: [20979]: WARN: get_uuid: Could not calculate UUID
> for 82.135.103.97
> Mar 2 11:31:54 fw4 attrd: [20978]: info: attrd_trigger_update: Sending flush
> op to all hosts for: pingd
> Mar 2 11:31:54 fw4 attrd: [20978]: info: attrd_ha_callback: flush message
> from
> fw4
> Mar 2 11:31:54 fw4 cib: [20975]: info: cib_process_xpath: Processing
> cib_query op for //cib/status//node_sta...@id='cbb68cf2-594f-4775-a604-
> ded2f6aa08a5']//nvpa...@name='pingd']
> (/cib/status/node_state[2]/transient_attributes/instance_attributes/nvpair[2])
> Mar 2 11:31:54 fw4 attrd: [20978]: info: attrd_perform_update: Sent update
> 19: pingd=100
>
> I remove the iptables rule and the cluster can see the ping node again:
> Mar 2 11:32:01 fw4 pingd: [21065]: debug: debug2: pingd_ha_dispatch: Invoked
> Mar 2 11:32:01 fw4 pingd: [21065]: notice: pingd_nstatus_callback: Status
> update: Ping node 82.135.103.97 now has status [ping]
> Mar 2 11:32:01 fw4 pingd: [21065]: info: send_update: 2 active ping nodes
> Mar 2 11:32:01 fw4 pingd: [21065]: debug: debug2: pingd_ha_dispatch: no
> message ready yet
> Mar 2 11:32:01 fw4 heartbeat: [20957]: WARN: Late heartbeat: Node
> 82.135.103.97: interval 18000 ms
> Mar 2 11:32:01 fw4 heartbeat: [20957]: info: Status update for node
> 82.135.103.97: status ping
> Mar 2 11:32:01 fw4 crmd: [20979]: notice: crmd_ha_status_callback: Status
> update: Node 82.135.103.97 now has status [ping] (DC=false)
> Mar 2 11:32:01 fw4 crmd: [20979]: WARN: get_uuid: Could not calculate UUID
> for 82.135.103.97
> Mar 2 11:32:06 fw4 attrd: [20978]: info: attrd_trigger_update: Sending flush
> op to all hosts for: pingd
> Mar 2 11:32:06 fw4 cib: [20975]: info: cib_process_xpath: Processing
> cib_query op for //cib/status//node_sta...@id='cbb68cf2-594f-4775-a604-
> ded2f6aa08a5']//nvpa...@name='pingd']
> (/cib/status/node_state[2]/transient_attributes/instance_attributes/nvpair[2])
> Mar 2 11:32:06 fw4 attrd: [20978]: info: attrd_ha_callback: flush message
> from
> fw4
> Mar 2 11:32:06 fw4 attrd: [20978]: info: attrd_perform_update: Sent update
> 21: pingd=200
>
> The same applies to the other two ping nodes.
>
>
> --
> Dr. Michael Schwartzkopff
> MultiNET Services GmbH
> Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
> Tel: +49 - 89 - 45 69 11 0
> Fax: +49 - 89 - 45 69 11 21
> mob: +49 - 174 - 343 28 75
>
> mail: [email protected]
> web: www.multinet.de
>
> Registered office: 85630 Grasbrunn
> Commercial register: Amtsgericht München HRB 114375
> Managing directors: Günter Jurgeneit, Hubert Martens
>
> ---
>
> PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
> Skype: misch42
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>