We have two nodes running heartbeat 2.1.3:
Node 1 (hostname 'ha03') is primary for resource 'ha_mysql'.
Node 2 (hostname 'ha04') is primary for resource 'ha_ftp'.
For two days, Node 2 was offline while we upgraded its kernel and drbd
versions. It's back up, and now we're trying to upgrade Node 1. When we
force Node 1 to go standby, it succeeds, but a few seconds later 'ha_mysql'
fails back to Node 1. Resource 'ha_ftp', however, did not fail back; Node 2
kept it (perhaps because it is primary for that resource?).
ha.cf from Node 1
-----------------
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
traditional_compression false
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 696
baud 19200
serial /dev/ttyS0
bcast bond0
#bcast eth1
#mcast eth0 225.0.0.1 696 1 0
auto_failback off
#watchdog /dev/watchdog
node ha03.domain-name-censored.local
node ha04.domain-name-censored.local
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.100
debug 1
apiauth ipfail gid=haclient uid=hacluster
#apiauth ccm uid=hacluster
#apiauth ipfail gid=haclient uid=alanr,root
#apiauth default gid=haclient
ha.cf from Node 2
-----------------
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
traditional_compression false
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 696
baud 19200
serial /dev/ttyS0
bcast bond0
#bcast eth1
#mcast eth0 225.0.0.1 696 1 0
auto_failback off
#watchdog /dev/watchdog
node ha03.domain-name-censored.local
node ha04.domain-name-censored.local
respawn hacluster /usr/lib/heartbeat/ipfail
ping 192.168.10.100
debug 1
apiauth ipfail gid=haclient uid=hacluster
#apiauth ccm uid=hacluster
#apiauth ipfail gid=haclient uid=alanr,root
#apiauth default gid=haclient
ha-debug from Node 1
--------------------
heartbeat[9733]: 2008/12/22_07:27:26 debug: StartNextRemoteRscReq() - calling
hook
heartbeat[9733]: 2008/12/22_07:27:26 debug: notify_world: invoking harc: OLD
status: active
heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [hb_takeover] started pid
17604
heartbeat[9733]: 2008/12/22_07:27:26 debug: Starting notify process
[hb_takeover]
heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: setting SIGCHLD
Handler to SIG_DFL
heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: Running harc
hb_takeover
harc[17604]: 2008/12/22_07:27:26 info: Running /etc/ha.d/rc.d/hb_takeover
hb_takeover
hb_standby[17620]: 2008/12/22_07:27:26 Going standby [local].
heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message me from
ha03.domain-name-censored.local in state 0
heartbeat[9733]: 2008/12/22_07:27:26 debug: ask_for_resources: other now
unstable
heartbeat[9733]: 2008/12/22_07:27:26 info: ha03.domain-name-censored.local
wants to go standby [local]
heartbeat[9733]: 2008/12/22_07:27:26 info: i_hold_resources: 1
heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
heartbeat[9733]: 2008/12/22_07:27:26 info: Managed hb_takeover process 17604
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:27:26 debug: RscMgmtProc 'hb_takeover' exited
code 0
heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message other from
ha04.domain-name-censored.local in state 1
heartbeat[9733]: 2008/12/22_07:27:26 info: standby:
ha04.domain-name-censored.local can take our local resources
heartbeat[9733]: 2008/12/22_07:27:26 debug: go_standby: other is unstable
heartbeat[9733]: 2008/12/22_07:27:26 debug: Sending hold resources msg: none,
stable=0 # standby
heartbeat[9733]: 2008/12/22_07:27:26 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 1, standby running(ms): -521182662, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [go_standby] started pid
17634
heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
heartbeat[17634]: 2008/12/22_07:27:26 info: give up local HA resources
(standby).
heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: who: 1 resource set:
local
heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: (query/action):
(ourkeys/givegroup)
ResourceManager[17647]: 2008/12/22_07:27:26 info: Releasing resource group:
ha03.domain-name-censored.local drbddisk::ha_mysql
Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0
mysql_001 mysql_002
ResourceManager[17647]: 2008/12/22_07:27:26 info: Running /etc/init.d/mysql_002
stop
ResourceManager[17647]: 2008/12/22_07:27:26 debug: Starting
/etc/init.d/mysql_002 stop
Killing mysqld with pid 17298
ResourceManager[17647]: 2008/12/22_07:27:27 debug: /etc/init.d/mysql_002 stop
done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:27 info: Running /etc/init.d/mysql_001
stop
ResourceManager[17647]: 2008/12/22_07:27:27 debug: Starting
/etc/init.d/mysql_001 stop
Killing mysqld with pid 17281
ResourceManager[17647]: 2008/12/22_07:27:28 debug: /etc/init.d/mysql_001 stop
done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -f inet addr delete
192.168.10.201/24 dev bond0
IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -o -f inet addr show bond0
IPaddr2[17753]: 2008/12/22_07:27:28 INFO: Success
INFO: Success
ResourceManager[17647]: 2008/12/22_07:27:28 debug: /etc/ha.d/resource.d/IPaddr2
192.168.10.201/24/bond0 stop done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
Filesystem[17863]: 2008/12/22_07:27:28 INFO: Running stop for /dev/drbd0
on /ha_mysql
Filesystem[17863]: 2008/12/22_07:27:28 INFO: Trying to unmount /ha_mysql
Filesystem[17863]: 2008/12/22_07:27:28 INFO: unmounted /ha_mysql
successfully
Filesystem[17852]: 2008/12/22_07:27:28 INFO: Success
INFO: Success
ResourceManager[17647]: 2008/12/22_07:27:28 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop done. RC=0
ResourceManager[17647]: 2008/12/22_07:27:28 info: Running
/etc/ha.d/resource.d/drbddisk ha_mysql stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting
/etc/ha.d/resource.d/drbddisk ha_mysql stop
ResourceManager[17647]: 2008/12/22_07:27:28 debug:
/etc/ha.d/resource.d/drbddisk ha_mysql stop done. RC=0
heartbeat[17634]: 2008/12/22_07:27:28 info: local HA resource release completed
(standby).
heartbeat[17634]: 2008/12/22_07:27:28 debug: Sending standby [done] msg
heartbeat[17634]: 2008/12/22_07:27:28 info: FIFO message [type ask_resources]
written rc=49
heartbeat[9733]: 2008/12/22_07:27:28 debug: Received standby message done from
ha03.domain-name-censored.local in state 1
heartbeat[9733]: 2008/12/22_07:27:28 info: Local standby process completed
[local].
heartbeat[9733]: 2008/12/22_07:27:28 info: New standby state: 3
heartbeat[9733]: 2008/12/22_07:27:28 info: Managed go_standby process 17634
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:27:28 debug: RscMgmtProc 'go_standby' exited
code 0
heartbeat[9733]: 2008/12/22_07:27:50 WARN: 1 lost packet(s) for
[ha04.domain-name-censored.local] [3100:3102]
heartbeat[9733]: 2008/12/22_07:27:50 info: remote resource transition completed.
heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none,
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 3, standby running(ms): -521180402, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:50 debug: Calling PerformAutoFailback()
heartbeat[9733]: 2008/12/22_07:27:50 info: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 3, standby running(ms): -521180402, resourcestate: 4
ipfail[10103]: 2008/12/22_07:27:50 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:27:50 info: No pkts missing from
ha04.domain-name-censored.local!
heartbeat[9733]: 2008/12/22_07:27:50 debug: Received standby message done from
ha04.domain-name-censored.local in state 3
heartbeat[9733]: 2008/12/22_07:27:50 info: Other node completed standby
takeover of local resources.
heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none,
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:27:50 info: New standby state: 0
heartbeat[9733]: 2008/12/22_07:27:51 info: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:27:51 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:27:51 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message me from
ha04.domain-name-censored.local in state 0
heartbeat[9733]: 2008/12/22_07:28:21 debug: ask_for_resources: other now
unstable
heartbeat[9733]: 2008/12/22_07:28:21 info: ha04.domain-name-censored.local
wants to go standby [foreign]
heartbeat[9733]: 2008/12/22_07:28:21 info: standby: other_holds_resources: 3
heartbeat[9733]: 2008/12/22_07:28:21 debug: Sending standby [other] msg
heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message other from
ha03.domain-name-censored.local in state 2
heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
heartbeat[9733]: 2008/12/22_07:28:21 debug: process_resources(2): other now
unstable
heartbeat[9733]: 2008/12/22_07:28:21 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:21 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 2, standby running(ms): -521128082, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:21 debug: Other side is unstable.
heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done from
ha04.domain-name-censored.local in state 2
heartbeat[9733]: 2008/12/22_07:28:42 info: standby: acquire [foreign] resources
from ha04.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:28:42 debug: Process [go_standby] started pid
18012
heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 3
heartbeat[18012]: 2008/12/22_07:28:42 info: acquire local HA resources
(standby).
heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: who: 2 resource set:
local
heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: (query/action):
(ourkeys/takegroup)
ResourceManager[18025]: 2008/12/22_07:28:42 info: Acquiring resource group:
ha03.domain-name-censored.local drbddisk::ha_mysql
Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0
mysql_001 mysql_002
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running
/etc/ha.d/resource.d/drbddisk ha_mysql start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting
/etc/ha.d/resource.d/drbddisk ha_mysql start
ResourceManager[18025]: 2008/12/22_07:28:42 debug:
/etc/ha.d/resource.d/drbddisk ha_mysql start done. RC=0
Filesystem[18093]: 2008/12/22_07:28:42 INFO: Resource is stopped
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
Filesystem[18174]: 2008/12/22_07:28:42 INFO: Running start for /dev/drbd0
on /ha_mysql
Filesystem[18163]: 2008/12/22_07:28:42 INFO: Success
INFO: Success
ResourceManager[18025]: 2008/12/22_07:28:42 debug:
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start done. RC=0
IPaddr2[18248]: 2008/12/22_07:28:42 INFO: Resource is stopped
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting
/etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip -f inet addr add 192.168.10.201/24
brd 192.168.10.255 dev bond0
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip link set bond0 up
IPaddr2[18360]: 2008/12/22_07:28:42 INFO: /usr/lib/heartbeat/send_arp -i 200 -r
5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.10.201 bond0
192.168.10.201 auto not_used not_used
IPaddr2[18331]: 2008/12/22_07:28:42 INFO: Success
INFO: Success
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/ha.d/resource.d/IPaddr2
192.168.10.201/24/bond0 start done. RC=0
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running /etc/init.d/mysql_001
start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting
/etc/init.d/mysql_001 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_001 start
done. RC=0
ResourceManager[18025]: 2008/12/22_07:28:42 info: Running /etc/init.d/mysql_002
start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting
/etc/init.d/mysql_002 start
ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_002 start
done. RC=0
heartbeat[18012]: 2008/12/22_07:28:42 info: local HA resource acquisition
completed (standby).
heartbeat[18012]: 2008/12/22_07:28:42 debug: Sending standby [done] msg
heartbeat[18012]: 2008/12/22_07:28:42 info: FIFO message [type ask_resources]
written rc=51
heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done from
ha03.domain-name-censored.local in state 3
heartbeat[9733]: 2008/12/22_07:28:42 info: Standby resource acquisition done
[foreign].
heartbeat[9733]: 2008/12/22_07:28:42 debug: Sending hold resources msg: local,
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:28:42 info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
heartbeat[9733]: 2008/12/22_07:28:42 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 0
heartbeat[9733]: 2008/12/22_07:28:42 info: Managed go_standby process 18012
exited with return code 0.
heartbeat[9733]: 2008/12/22_07:28:42 debug: RscMgmtProc 'go_standby' exited
code 0
heartbeat[9733]: 2008/12/22_07:28:43 info: remote resource transition completed.
heartbeat[9733]: 2008/12/22_07:28:43 debug: Sending hold resources msg: local,
stable=1 # <none>
heartbeat[9733]: 2008/12/22_07:28:43 info: AnnounceTakeover(local 1, foreign 1,
reason 'T_RESOURCES(us)' (1))
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:28:43 debug: Calling PerformAutoFailback()
heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
heartbeat[9733]: 2008/12/22_07:29:23 debug: APIregistration_dispatch() {
heartbeat[9733]: 2008/12/22_07:29:23 debug: process_registerevent() {
heartbeat[9733]: 2008/12/22_07:29:23 debug: client->gsource = 0x8bcdb40
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*process_registerevent*/;
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*APIregistration_dispatch*/;
heartbeat[9733]: 2008/12/22_07:29:23 debug: Checking client authorization for
client 18641 (0:496)
heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no
missing packets found for node ha03.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no
missing packets found for node ha04.domain-name-censored.local
heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing on API client 18641
('casual')
heartbeat[9733]: 2008/12/22_07:29:23 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing client 18641 off
heartbeat[9733]: 2008/12/22_07:29:23 debug: G_remove_client(pid=18641,
reason='signoff' gsource=0x8bcdb40) {
heartbeat[9733]: 2008/12/22_07:29:23 debug: api_remove_client_int: removing pid
[18641] reason: signoff
heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*G_remove_client;*/
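In case it helps anyone reading the log above, here is a minimal sketch (plain Python, not part of heartbeat) that pulls out only the standby requests and standby state changes, so the sequence is easier to follow. The name `standby_timeline` and the `SAMPLE` list are my own illustrative assumptions; the sample lines are copied from the ha-debug excerpt, re-joined where the mail client wrapped them.

```python
import re

# Prefix of heartbeat's own log lines as they appear in the excerpt:
#   heartbeat[PID]: YYYY/MM/DD_HH:MM:SS level: message
LINE_RE = re.compile(
    r"^heartbeat\[(?P<pid>\d+)\]: "
    r"(?P<ts>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): (?P<msg>.*)$"
)

# Message fragments of interest (taken from the log, not from heartbeat docs).
KEYWORDS = ("wants to go standby", "New standby state")


def standby_timeline(lines):
    """Return (timestamp, message) pairs for standby-related heartbeat lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(key in match.group("msg") for key in KEYWORDS):
            events.append((match.group("ts"), match.group("msg")))
    return events


# A few lines from the excerpt above, unwrapped (illustrative subset only).
SAMPLE = [
    "heartbeat[9733]: 2008/12/22_07:27:26 info: ha03.domain-name-censored.local wants to go standby [local]",
    "heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1",
    "heartbeat[9733]: 2008/12/22_07:27:50 debug: Calling PerformAutoFailback()",
    "heartbeat[9733]: 2008/12/22_07:28:21 info: ha04.domain-name-censored.local wants to go standby [foreign]",
    "heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2",
]

if __name__ == "__main__":
    for ts, msg in standby_timeline(SAMPLE):
        print(ts, msg)
```

To run it against the real file, the lines of /var/log/ha-debug could be passed in instead of `SAMPLE`; that file is not hard-wrapped the way this mail is, so no re-joining should be needed there.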
--Eric
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems