We have two nodes running heartbeat 2.1.3.
 
Node 1 (hostname 'ha03') is primary for the resource 'ha_mysql'.
 
Node 2 (hostname 'ha04') is primary for the resource 'ha_ftp'.
 
For two days, Node 2 was offline while we upgraded its kernel and DRBD 
versions. It's back up, and now we're trying to upgrade Node 1. When we force 
Node 1 to go standby, it succeeds, but shortly afterwards the resources fail 
back to it (in the log below, Node 1 releases them at 07:27:28 and reacquires 
them at 07:28:42).
 
However, resource 'ha_ftp' did not fail back. Node 2 kept it (perhaps because 
it is primary for that resource?).
 
ha.cf from Node 1
-----------------
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
traditional_compression false
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 696
baud    19200
serial  /dev/ttyS0
bcast   bond0
#bcast  eth1
#mcast eth0 225.0.0.1 696 1 0
auto_failback off
#watchdog /dev/watchdog
node    ha03.domain-name-censored.local
node    ha04.domain-name-censored.local
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.100
debug 1
apiauth ipfail gid=haclient uid=hacluster
#apiauth ccm uid=hacluster
#apiauth ipfail gid=haclient uid=alanr,root
#apiauth default gid=haclient
 
ha.cf from Node 2
-----------------
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility     local0
traditional_compression false
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 696
baud    19200
serial  /dev/ttyS0
bcast   bond0
#bcast  eth1
#mcast eth0 225.0.0.1 696 1 0
auto_failback off
#watchdog /dev/watchdog
node    ha03.domain-name-censored.local
node    ha04.domain-name-censored.local
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.100
debug 1
apiauth ipfail gid=haclient uid=hacluster
#apiauth ccm uid=hacluster
#apiauth ipfail gid=haclient uid=alanr,root
#apiauth default gid=haclient
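
The two listings above appear to be identical line for line. For anyone who 
wants to check that against the real files rather than by eye, here is a 
minimal Python sketch that compares only the active (uncommented) directives 
of two ha.cf files. It is not part of heartbeat; the function names 
active_directives and config_diff are made up for illustration, and it assumes 
each directive sits on a single line as it does above.

```python
def active_directives(text):
    """Return the non-blank, non-comment lines of an ha.cf,
    with runs of whitespace collapsed to single spaces."""
    result = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and commented-out directives
        result.append(" ".join(stripped.split()))
    return result


def config_diff(text_a, text_b):
    """Return (only_in_a, only_in_b) as sorted lists of directives."""
    set_a = set(active_directives(text_a))
    set_b = set(active_directives(text_b))
    return sorted(set_a - set_b), sorted(set_b - set_a)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python cfdiff.py ha.cf.node1 ha.cf.node2
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        only_a, only_b = config_diff(fa.read(), fb.read())
    for directive in only_a:
        print("only in first: ", directive)
    for directive in only_b:
        print("only in second:", directive)
```

Because the comparison uses sets, it ignores directive order and repeated 
lines; if order matters for a given directive, compare the lists from 
active_directives directly instead.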

ha-debug from Node 1
--------------------
heartbeat[9733]: 2008/12/22_07:27:26 debug: StartNextRemoteRscReq() - calling 
hook
heartbeat[9733]: 2008/12/22_07:27:26 debug: notify_world: invoking harc: OLD 
status: active
heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [hb_takeover] started pid 
17604
heartbeat[9733]: 2008/12/22_07:27:26 debug: Starting notify process 
[hb_takeover]
heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: Running harc 
hb_takeover
harc[17604]:    2008/12/22_07:27:26 info: Running /etc/ha.d/rc.d/hb_takeover 
hb_takeover
hb_standby[17620]:      2008/12/22_07:27:26 Going standby [local].
heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message me from 
ha03.domain-name-censored.local in state 0
heartbeat[9733]: 2008/12/22_07:27:26 debug: ask_for_resources: other now 
unstable
heartbeat[9733]: 2008/12/22_07:27:26 info: ha03.domain-name-censored.local 
wants to go standby [local]
heartbeat[9733]: 2008/12/22_07:27:26 info: i_hold_resources: 1
heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
heartbeat[9733]: 2008/12/22_07:27:26 info: Managed hb_takeover process 17604 
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:27:26 debug: RscMgmtProc 'hb_takeover' exited 
code 0
heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message other from 
ha04.domain-name-censored.local in state 1
heartbeat[9733]: 2008/12/22_07:27:26 info: standby: 
ha04.domain-name-censored.local can take our local resources
heartbeat[9733]: 2008/12/22_07:27:26 debug: go_standby: other is unstable
heartbeat[9733]: 2008/12/22_07:27:26 debug: Sending hold resources msg: none, 
stable=0 # standby
heartbeat[9733]: 2008/12/22_07:27:26 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, 
going_standby: 1, standby running(ms): -521182662, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [go_standby] started pid 
17634
heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
heartbeat[17634]: 2008/12/22_07:27:26 info: give up local HA resources 
(standby).
heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: who: 1 resource set: 
local
heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: (query/action): 
(ourkeys/givegroup)
ResourceManager[17647]: 2008/12/22_07:27:26 info: Releasing resource group: 
ha03.domain-name-censored.local drbddisk::ha_mysql 
Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0 
mysql_001 mysql_002
ResourceManager[17647]: 2008/12/22_07:27:26 info: Running /etc/init.d/mysql_002 
 stop
ResourceManager[17647]: 2008/12/22_07:27:26 debug: Starting 
/etc/init.d/mysql_002  stop
Killing mysqld with pid 17298
ResourceManager[17647]: 2008/12/22_07:27:27 debug: /etc/init.d/mysql_002  stop 
done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:27 info: Running /etc/init.d/mysql_001 
 stop
ResourceManager[17647]: 2008/12/22_07:27:27 debug: Starting 
/etc/init.d/mysql_001  stop
Killing mysqld with pid 17281
ResourceManager[17647]: 2008/12/22_07:27:28 debug: /etc/init.d/mysql_001  stop 
done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -f inet addr delete 
192.168.10.201/24 dev bond0
IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -o -f inet addr show bond0
IPaddr2[17753]: 2008/12/22_07:27:28 INFO:  Success
INFO:  Success
ResourceManager[17647]: 2008/12/22_07:27:28 debug: /etc/ha.d/resource.d/IPaddr2 
192.168.10.201/24/bond0 stop done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
Filesystem[17863]:      2008/12/22_07:27:28 INFO: Running stop for /dev/drbd0 
on /ha_mysql
Filesystem[17863]:      2008/12/22_07:27:28 INFO: Trying to unmount /ha_mysql
Filesystem[17863]:      2008/12/22_07:27:28 INFO: unmounted /ha_mysql 
successfully
Filesystem[17852]:      2008/12/22_07:27:28 INFO:  Success
INFO:  Success
ResourceManager[17647]: 2008/12/22_07:27:28 debug: 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
/etc/ha.d/resource.d/drbddisk ha_mysql stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
/etc/ha.d/resource.d/drbddisk ha_mysql stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: 
/etc/ha.d/resource.d/drbddisk ha_mysql stop done. RC=0
heartbeat[17634]: 2008/12/22_07:27:28 info: local HA resource release completed 
(standby).
heartbeat[17634]: 2008/12/22_07:27:28 debug: Sending standby [done] msg
heartbeat[17634]: 2008/12/22_07:27:28 info: FIFO message [type ask_resources] 
written rc=49
heartbeat[9733]: 2008/12/22_07:27:28 debug: Received standby message done from 
ha03.domain-name-censored.local in state 1
heartbeat[9733]: 2008/12/22_07:27:28 info: Local standby process completed 
[local].
heartbeat[9733]: 2008/12/22_07:27:28 info: New standby state: 3
heartbeat[9733]: 2008/12/22_07:27:28 info: Managed go_standby process 17634 
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:27:28 debug: RscMgmtProc 'go_standby' exited 
code 0
heartbeat[9733]: 2008/12/22_07:27:50 WARN: 1 lost packet(s) for 
[ha04.domain-name-censored.local] [3100:3102]
heartbeat[9733]: 2008/12/22_07:27:50 info: remote resource transition completed.
heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none, 
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 3, standby running(ms): -521180402, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:50 debug: Calling PerformAutoFailback()
heartbeat[9733]: 2008/12/22_07:27:50 info: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 3, standby running(ms): -521180402, resourcestate: 4
ipfail[10103]: 2008/12/22_07:27:50 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:27:50 info: No pkts missing from 
ha04.domain-name-censored.local!
heartbeat[9733]: 2008/12/22_07:27:50 debug: Received standby message done from 
ha04.domain-name-censored.local in state 3
heartbeat[9733]: 2008/12/22_07:27:50 info: Other node completed standby 
takeover of local resources.
heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none, 
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:50 info: New standby state: 0
heartbeat[9733]: 2008/12/22_07:27:51 info: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:27:51 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:27:51 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message me from 
ha04.domain-name-censored.local in state 0
heartbeat[9733]: 2008/12/22_07:28:21 debug: ask_for_resources: other now 
unstable
heartbeat[9733]: 2008/12/22_07:28:21 info: ha04.domain-name-censored.local 
wants to go standby [foreign]
heartbeat[9733]: 2008/12/22_07:28:21 info: standby: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:28:21 debug: Sending standby [other] msg
heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message other from 
ha03.domain-name-censored.local in state 2
heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
heartbeat[9733]: 2008/12/22_07:28:21 debug: process_resources(2):  other now 
unstable
heartbeat[9733]: 2008/12/22_07:28:21 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:21 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, 
going_standby: 2, standby running(ms): -521128082, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:21 debug: Other side is unstable.
heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done from 
ha04.domain-name-censored.local in state 2
heartbeat[9733]: 2008/12/22_07:28:42 info: standby: acquire [foreign] resources 
from ha04.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:28:42 debug: Process [go_standby] started pid 
18012
heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 3
heartbeat[18012]: 2008/12/22_07:28:42 info: acquire local HA resources 
(standby).
heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: who: 2 resource set: 
local
heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: (query/action): 
(ourkeys/takegroup)
ResourceManager[18025]: 2008/12/22_07:28:42 info: Acquiring resource group: 
ha03.domain-name-censored.local drbddisk::ha_mysql 
Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0 
mysql_001 mysql_002
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
/etc/ha.d/resource.d/drbddisk ha_mysql start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
/etc/ha.d/resource.d/drbddisk ha_mysql start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: 
/etc/ha.d/resource.d/drbddisk ha_mysql start done. RC=0
Filesystem[18093]:      2008/12/22_07:28:42 INFO:  Resource is stopped
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
Filesystem[18174]:      2008/12/22_07:28:42 INFO: Running start for /dev/drbd0 
on /ha_mysql
Filesystem[18163]:      2008/12/22_07:28:42 INFO:  Success
INFO:  Success
ResourceManager[18025]: 2008/12/22_07:28:42 debug: 
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start done. RC=0
IPaddr2[18248]: 2008/12/22_07:28:42 INFO:  Resource is stopped
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip -f inet addr add 192.168.10.201/24 
brd 192.168.10.255 dev bond0
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip link set bond0 up
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: /usr/lib/heartbeat/send_arp -i 200 -r 
5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.10.201 bond0 
192.168.10.201 auto not_used not_used
IPaddr2[18331]: 2008/12/22_07:28:42 INFO:  Success
INFO:  Success
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/ha.d/resource.d/IPaddr2 
192.168.10.201/24/bond0 start done. RC=0
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running /etc/init.d/mysql_001 
 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
/etc/init.d/mysql_001  start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_001  start 
done. RC=0
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running /etc/init.d/mysql_002 
 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
/etc/init.d/mysql_002  start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_002  start 
done. RC=0
heartbeat[18012]: 2008/12/22_07:28:42 info: local HA resource acquisition 
completed (standby).
heartbeat[18012]: 2008/12/22_07:28:42 debug: Sending standby [done] msg
heartbeat[18012]: 2008/12/22_07:28:42 info: FIFO message [type ask_resources] 
written rc=51
heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done from 
ha03.domain-name-censored.local in state 3
heartbeat[9733]: 2008/12/22_07:28:42 info: Standby resource acquisition done 
[foreign].
heartbeat[9733]: 2008/12/22_07:28:42 debug: Sending hold resources msg: local, 
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:28:42 info: AnnounceTakeover(local 1, foreign 1, 
reason 'T_RESOURCES(us)' (1))
heartbeat[9733]: 2008/12/22_07:28:42 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 0
heartbeat[9733]: 2008/12/22_07:28:42 info: Managed go_standby process 18012 
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:28:42 debug: RscMgmtProc 'go_standby' exited 
code 0
heartbeat[9733]: 2008/12/22_07:28:43 info: remote resource transition completed.
heartbeat[9733]: 2008/12/22_07:28:43 debug: Sending hold resources msg: local, 
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:28:43 info: AnnounceTakeover(local 1, foreign 1, 
reason 'T_RESOURCES(us)' (1))
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:28:43 debug: Calling PerformAutoFailback()
heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:29:23 debug: APIregistration_dispatch() {
heartbeat[9733]: 2008/12/22_07:29:23 debug: process_registerevent() {
heartbeat[9733]: 2008/12/22_07:29:23 debug: client->gsource = 0x8bcdb40
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*process_registerevent*/;
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*APIregistration_dispatch*/;
heartbeat[9733]: 2008/12/22_07:29:23 debug: Checking client authorization for 
client 18641 (0:496)
heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no 
missing packets found for node ha03.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no 
missing packets found for node ha04.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing on API client 18641 
('casual')
heartbeat[9733]: 2008/12/22_07:29:23 debug: hb_rsc_isstable: 
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing client 18641 off
heartbeat[9733]: 2008/12/22_07:29:23 debug: G_remove_client(pid=18641, 
reason='signoff' gsource=0x8bcdb40) {
heartbeat[9733]: 2008/12/22_07:29:23 debug: api_remove_client_int: removing pid 
[18641] reason: signoff
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*G_remove_client;*/
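
Since the log above was hard-wrapped by the mail client, the standby sequence 
is hard to follow. A rough Python sketch for pulling out just the 
standby-related messages is below. It assumes records start with 
"name[pid]: YYYY/MM/DD_HH:MM:SS" as they do above; the function names are 
hypothetical and this is not a heartbeat tool.

```python
import re

# Start of a log record, e.g.
# "heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1"
RECORD_START = re.compile(
    r"^(\w+)\[(\d+)\]:\s+(\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2})\s+(.*)$"
)


def join_wrapped(lines):
    """Re-join records that were wrapped across several lines.

    Any line that does not look like the start of a record is appended to
    the previous record. Unprefixed output such as "Killing mysqld with
    pid ..." therefore ends up attached to the record before it.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if RECORD_START.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] += " " + line.strip()
    return records


def standby_timeline(lines):
    """Return (timestamp, message) pairs for standby-related records."""
    timeline = []
    for record in join_wrapped(lines):
        match = RECORD_START.match(record)
        if not match:
            continue
        timestamp, message = match.group(3), match.group(4)
        if "wants to go standby" in message or "New standby state" in message:
            timeline.append((timestamp, message))
    return timeline


if __name__ == "__main__":
    import sys

    # Usage: python standby_timeline.py /var/log/ha-debug
    with open(sys.argv[1]) as log:
        for timestamp, message in standby_timeline(log):
            print(timestamp, message)
```

Run against the excerpt above, the output should include ha03 asking to go 
standby [local] at 07:27:26 and ha04 asking to go standby [foreign] at 
07:28:21, along with the "New standby state" transitions between them.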
 
--Eric
 