Hi,

On Mon, Dec 22, 2008 at 07:50:20AM -0800, Robinson, Eric wrote:
> We have 2 nodes running heartbeat 2.1.3
>  
> Node 1 (hostname 'ha03') is primary for resource name 'ha_mysql'
>  
> Node 2 (hostname 'ha04') is primary for resource name 'ha_ftp'
>  
> For two days, Node 2 was offline while we upgraded its kernel
> and drbd versions. It's back up and now we're trying to upgrade
> Node 1. When we try to force Node 1 to go standby, it succeeds.
> A few seconds later it fails back.

It? What fails back?

> However, resource 'ha_ftp' did not fail back. Node 2 kept it
> (perhaps because it is primary for that resource?).

I don't understand what's going on. According to your logs,
ha_ftp is not touched.
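
To see which node asked for which resource set, it can help to pull only the
"wants to go standby" records out of ha-debug. Below is a rough sketch, not a
heartbeat tool. It assumes the log was pasted from mail, so records may be
wrapped and prefixed with "> ". The function names (unwrap, standby_requests)
are made up for illustration.

```python
import re

# Start of a log record, e.g. "heartbeat[9733]: 2008/12/22_07:27:26 info: ..."
RECORD_START = re.compile(r"^[\w-]+\[\d+\]:\s+\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}\s")
# A standby request as it appears at info level in the quoted log.
STANDBY_REQUEST = re.compile(
    r"(\d{2}:\d{2}:\d{2}) info: (\S+)\s+wants to go standby \[(\w+)\]"
)


def unwrap(lines):
    """Strip '> ' quoting and rejoin records that the mailer wrapped.

    Lines that do not start a record (wrapped tails, or stray script output
    such as 'INFO:  Success') are appended to the previous record.
    """
    records = []
    for raw in lines:
        line = re.sub(r"^>\s?", "", raw.rstrip("\n"))
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records


def standby_requests(lines):
    """Return (time, node, resource_set) for every standby request found."""
    found = []
    for record in unwrap(lines):
        match = STANDBY_REQUEST.search(record)
        if match:
            found.append(match.groups())
    return found
```

Run over the ha-debug excerpt quoted further down, this should list two
requests: ha03 asking to go standby [local] at 07:27:26, and ha04 asking to go
standby [foreign] at 07:28:21.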

Thanks,

Dejan

>  
> ha.cf from Node 1
> -----------------
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> traditional_compression false
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 696
> baud    19200
> serial  /dev/ttyS0
> bcast   bond0
> #bcast  eth1
> #mcast eth0 225.0.0.1 696 1 0
> auto_failback off
> #watchdog /dev/watchdog
> node    ha03.domain-name-censored.local
> node    ha04.domain-name-censored.local
> respawn hacluster /usr/lib/heartbeat/ipfail
> ping 192.168.10.100
> debug 1
> apiauth ipfail gid=haclient uid=hacluster
> #apiauth ccm uid=hacluster
> #apiauth ipfail gid=haclient uid=alanr,root
> #apiauth default gid=haclient
>  
> ha.cf from Node 2
> -----------------
> debugfile /var/log/ha-debug
> logfile /var/log/ha-log
> logfacility     local0
> traditional_compression false
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport 696
> baud    19200
> serial  /dev/ttyS0
> bcast   bond0
> #bcast  eth1
> #mcast eth0 225.0.0.1 696 1 0
> auto_failback off
> #watchdog /dev/watchdog
> node    ha03.domain-name-censored.local
> node    ha04.domain-name-censored.local
> respawn hacluster /usr/lib/heartbeat/ipfail
> ping 192.168.10.100
> debug 1
> apiauth ipfail gid=haclient uid=hacluster
> #apiauth ccm uid=hacluster
> #apiauth ipfail gid=haclient uid=alanr,root
> #apiauth default gid=haclient
> 
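
The two ha.cf listings above are the same line for line, including
"auto_failback off". A quick way to confirm that, and to check the timer
ordering, is sketched below. It assumes the timers are plain integer seconds,
as they are here (values with an "ms" suffix are not handled), and that only
the directive lines are passed in, without the "ha.cf from Node N" heading.
The "initdead at least twice deadtime" check is the usual rule of thumb as I
remember it from the sample ha.cf, so treat it as an assumption. The names
parse_ha_cf and timing_warnings are hypothetical.

```python
import re

TIMERS = ("keepalive", "warntime", "deadtime", "initdead")


def parse_ha_cf(text):
    """Return (directive, value) pairs in file order, skipping comments.

    A list is used rather than a dict because directives such as 'node'
    legitimately appear more than once.
    """
    pairs = []
    for raw in text.splitlines():
        line = re.sub(r"^>\s?", "", raw).strip()  # drop mail quoting
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        pairs.append((parts[0], parts[1] if len(parts) > 1 else ""))
    return pairs


def timing_warnings(pairs):
    """Check the timer ordering; return a list of human-readable warnings."""
    timers = {key: int(value) for key, value in pairs if key in TIMERS}
    missing = [key for key in TIMERS if key not in timers]
    if missing:
        return ["missing timers: " + ", ".join(missing)]
    warnings = []
    if not timers["keepalive"] < timers["warntime"] < timers["deadtime"]:
        warnings.append("expected keepalive < warntime < deadtime")
    if timers["initdead"] < 2 * timers["deadtime"]:
        warnings.append("initdead is less than twice deadtime")
    return warnings
```

Comparing the nodes is then just parse_ha_cf(node1_text) == parse_ha_cf(node2_text),
and timing_warnings() on either result should come back empty for the values
shown (keepalive 2, warntime 10, deadtime 30, initdead 120).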
> ha-debug from Node 1
> --------------------
> heartbeat[9733]: 2008/12/22_07:27:26 debug: StartNextRemoteRscReq() - calling 
> hook
> heartbeat[9733]: 2008/12/22_07:27:26 debug: notify_world: invoking harc: OLD 
> status: active
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [hb_takeover] started pid 
> 17604
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Starting notify process 
> [hb_takeover]
> heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: setting SIGCHLD 
> Handler to SIG_DFL
> heartbeat[17604]: 2008/12/22_07:27:26 debug: notify_world: Running harc 
> hb_takeover
> harc[17604]:    2008/12/22_07:27:26 info: Running /etc/ha.d/rc.d/hb_takeover 
> hb_takeover
> hb_standby[17620]:      2008/12/22_07:27:26 Going standby [local].
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message me from 
> ha03.domain-name-censored.local in state 0
> heartbeat[9733]: 2008/12/22_07:27:26 debug: ask_for_resources: other now 
> unstable
> heartbeat[9733]: 2008/12/22_07:27:26 info: ha03.domain-name-censored.local 
> wants to go standby [local]
> heartbeat[9733]: 2008/12/22_07:27:26 info: i_hold_resources: 1
> heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
> heartbeat[9733]: 2008/12/22_07:27:26 info: Managed hb_takeover process 17604 
> exited with return code 0.
> heartbeat[9733]: 2008/12/22_07:27:26 debug: RscMgmtProc 'hb_takeover' exited 
> code 0
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Received standby message other 
> from ha04.domain-name-censored.local in state 1
> heartbeat[9733]: 2008/12/22_07:27:26 info: standby: 
> ha04.domain-name-censored.local can take our local resources
> heartbeat[9733]: 2008/12/22_07:27:26 debug: go_standby: other is unstable
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Sending hold resources msg: none, 
> stable=0 # standby
> heartbeat[9733]: 2008/12/22_07:27:26 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, 
> going_standby: 1, standby running(ms): -521182662, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:27:26 debug: Process [go_standby] started pid 
> 17634
> heartbeat[9733]: 2008/12/22_07:27:26 info: New standby state: 1
> heartbeat[17634]: 2008/12/22_07:27:26 info: give up local HA resources 
> (standby).
> heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: who: 1 resource set: 
> local
> heartbeat[17634]: 2008/12/22_07:27:26 info: go_standby: (query/action): 
> (ourkeys/givegroup)
> ResourceManager[17647]: 2008/12/22_07:27:26 info: Releasing resource group: 
> ha03.domain-name-censored.local drbddisk::ha_mysql 
> Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0 
> mysql_001 mysql_002
> ResourceManager[17647]: 2008/12/22_07:27:26 info: Running 
> /etc/init.d/mysql_002  stop
> ResourceManager[17647]: 2008/12/22_07:27:26 debug: Starting 
> /etc/init.d/mysql_002  stop
> Killing mysqld with pid 17298
> ResourceManager[17647]: 2008/12/22_07:27:27 debug: /etc/init.d/mysql_002  
> stop done. RC=0
> ResourceManager[17647]: 2008/12/22_07:27:27 info: Running 
> /etc/init.d/mysql_001  stop
> ResourceManager[17647]: 2008/12/22_07:27:27 debug: Starting 
> /etc/init.d/mysql_001  stop
> Killing mysqld with pid 17281
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: /etc/init.d/mysql_001  
> stop done. RC=0
> ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop
> IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -f inet addr delete 
> 192.168.10.201/24 dev bond0
> IPaddr2[17782]: 2008/12/22_07:27:28 INFO: ip -o -f inet addr show bond0
> IPaddr2[17753]: 2008/12/22_07:27:28 INFO:  Success
> INFO:  Success
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 stop done. RC=0
> ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop
> Filesystem[17863]:      2008/12/22_07:27:28 INFO: Running stop for /dev/drbd0 
> on /ha_mysql
> Filesystem[17863]:      2008/12/22_07:27:28 INFO: Trying to unmount /ha_mysql
> Filesystem[17863]:      2008/12/22_07:27:28 INFO: unmounted /ha_mysql 
> successfully
> Filesystem[17852]:      2008/12/22_07:27:28 INFO:  Success
> INFO:  Success
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 stop done. RC=0
> ResourceManager[17647]: 2008/12/22_07:27:28 info: Running 
> /etc/ha.d/resource.d/drbddisk ha_mysql stop
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: Starting 
> /etc/ha.d/resource.d/drbddisk ha_mysql stop
> ResourceManager[17647]: 2008/12/22_07:27:28 debug: 
> /etc/ha.d/resource.d/drbddisk ha_mysql stop done. RC=0
> heartbeat[17634]: 2008/12/22_07:27:28 info: local HA resource release 
> completed (standby).
> heartbeat[17634]: 2008/12/22_07:27:28 debug: Sending standby [done] msg
> heartbeat[17634]: 2008/12/22_07:27:28 info: FIFO message [type ask_resources] 
> written rc=49
> heartbeat[9733]: 2008/12/22_07:27:28 debug: Received standby message done 
> from ha03.domain-name-censored.local in state 1
> heartbeat[9733]: 2008/12/22_07:27:28 info: Local standby process completed 
> [local].
> heartbeat[9733]: 2008/12/22_07:27:28 info: New standby state: 3
> heartbeat[9733]: 2008/12/22_07:27:28 info: Managed go_standby process 17634 
> exited with return code 0.
> heartbeat[9733]: 2008/12/22_07:27:28 debug: RscMgmtProc 'go_standby' exited 
> code 0
> heartbeat[9733]: 2008/12/22_07:27:50 WARN: 1 lost packet(s) for 
> [ha04.domain-name-censored.local] [3100:3102]
> heartbeat[9733]: 2008/12/22_07:27:50 info: remote resource transition 
> completed.
> heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none, 
> stable=1 # <none>
> heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 3, standby running(ms): -521180402, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:27:50 debug: Calling PerformAutoFailback()
> heartbeat[9733]: 2008/12/22_07:27:50 info: other_holds_resources: 3
> heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 3, standby running(ms): -521180402, resourcestate: 4
> ipfail[10103]: 2008/12/22_07:27:50 debug: Other side is now stable.
> heartbeat[9733]: 2008/12/22_07:27:50 info: No pkts missing from 
> ha04.domain-name-censored.local!
> heartbeat[9733]: 2008/12/22_07:27:50 debug: Received standby message done 
> from ha04.domain-name-censored.local in state 3
> heartbeat[9733]: 2008/12/22_07:27:50 info: Other node completed standby 
> takeover of local resources.
> heartbeat[9733]: 2008/12/22_07:27:50 debug: Sending hold resources msg: none, 
> stable=1 # <none>
> heartbeat[9733]: 2008/12/22_07:27:50 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:27:50 info: New standby state: 0
> heartbeat[9733]: 2008/12/22_07:27:51 info: other_holds_resources: 3
> heartbeat[9733]: 2008/12/22_07:27:51 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> ipfail[10103]: 2008/12/22_07:27:51 debug: Other side is now stable.
> heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message me from 
> ha04.domain-name-censored.local in state 0
> heartbeat[9733]: 2008/12/22_07:28:21 debug: ask_for_resources: other now 
> unstable
> heartbeat[9733]: 2008/12/22_07:28:21 info: ha04.domain-name-censored.local 
> wants to go standby [foreign]
> heartbeat[9733]: 2008/12/22_07:28:21 info: standby: other_holds_resources: 3
> heartbeat[9733]: 2008/12/22_07:28:21 debug: Sending standby [other] msg
> heartbeat[9733]: 2008/12/22_07:28:21 debug: Received standby message other 
> from ha03.domain-name-censored.local in state 2
> heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
> heartbeat[9733]: 2008/12/22_07:28:21 info: New standby state: 2
> heartbeat[9733]: 2008/12/22_07:28:21 debug: process_resources(2):  other now 
> unstable
> heartbeat[9733]: 2008/12/22_07:28:21 info: other_holds_resources: 1
> heartbeat[9733]: 2008/12/22_07:28:21 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0, 
> going_standby: 2, standby running(ms): -521128082, resourcestate: 4
> ipfail[10103]: 2008/12/22_07:28:21 debug: Other side is unstable.
> heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done 
> from ha04.domain-name-censored.local in state 2
> heartbeat[9733]: 2008/12/22_07:28:42 info: standby: acquire [foreign] 
> resources from ha04.domain-name-censored.local
> heartbeat[9733]: 2008/12/22_07:28:42 debug: Process [go_standby] started pid 
> 18012
> heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 3
> heartbeat[18012]: 2008/12/22_07:28:42 info: acquire local HA resources 
> (standby).
> heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: who: 2 resource set: 
> local
> heartbeat[18012]: 2008/12/22_07:28:42 info: go_standby: (query/action): 
> (ourkeys/takegroup)
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Acquiring resource group: 
> ha03.domain-name-censored.local drbddisk::ha_mysql 
> Filesystem::/dev/drbd0::/ha_mysql::ext3 IPaddr2::192.168.10.201/24/bond0 
> mysql_001 mysql_002
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
> /etc/ha.d/resource.d/drbddisk ha_mysql start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
> /etc/ha.d/resource.d/drbddisk ha_mysql start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: 
> /etc/ha.d/resource.d/drbddisk ha_mysql start done. RC=0
> Filesystem[18093]:      2008/12/22_07:28:42 INFO:  Resource is stopped
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start
> Filesystem[18174]:      2008/12/22_07:28:42 INFO: Running start for 
> /dev/drbd0 on /ha_mysql
> Filesystem[18163]:      2008/12/22_07:28:42 INFO:  Success
> INFO:  Success
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: 
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /ha_mysql ext3 start done. RC=0
> IPaddr2[18248]: 2008/12/22_07:28:42 INFO:  Resource is stopped
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start
> IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip -f inet addr add 
> 192.168.10.201/24 brd 192.168.10.255 dev bond0
> IPaddr2[18360]: 2008/12/22_07:28:42 INFO: ip link set bond0 up
> IPaddr2[18360]: 2008/12/22_07:28:42 INFO: /usr/lib/heartbeat/send_arp -i 200 
> -r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.10.201 bond0 
> 192.168.10.201 auto not_used not_used
> IPaddr2[18331]: 2008/12/22_07:28:42 INFO:  Success
> INFO:  Success
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: 
> /etc/ha.d/resource.d/IPaddr2 192.168.10.201/24/bond0 start done. RC=0
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
> /etc/init.d/mysql_001  start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
> /etc/init.d/mysql_001  start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_001  
> start done. RC=0
> ResourceManager[18025]: 2008/12/22_07:28:42 info: Running 
> /etc/init.d/mysql_002  start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: Starting 
> /etc/init.d/mysql_002  start
> ResourceManager[18025]: 2008/12/22_07:28:42 debug: /etc/init.d/mysql_002  
> start done. RC=0
> heartbeat[18012]: 2008/12/22_07:28:42 info: local HA resource acquisition 
> completed (standby).
> heartbeat[18012]: 2008/12/22_07:28:42 debug: Sending standby [done] msg
> heartbeat[18012]: 2008/12/22_07:28:42 info: FIFO message [type ask_resources] 
> written rc=51
> heartbeat[9733]: 2008/12/22_07:28:42 debug: Received standby message done 
> from ha03.domain-name-censored.local in state 3
> heartbeat[9733]: 2008/12/22_07:28:42 info: Standby resource acquisition done 
> [foreign].
> heartbeat[9733]: 2008/12/22_07:28:42 debug: Sending hold resources msg: 
> local, stable=1 # <none>
> heartbeat[9733]: 2008/12/22_07:28:42 info: AnnounceTakeover(local 1, foreign 
> 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[9733]: 2008/12/22_07:28:42 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:28:42 info: New standby state: 0
> heartbeat[9733]: 2008/12/22_07:28:42 info: Managed go_standby process 18012 
> exited with return code 0.
> heartbeat[9733]: 2008/12/22_07:28:42 debug: RscMgmtProc 'go_standby' exited 
> code 0
> heartbeat[9733]: 2008/12/22_07:28:43 info: remote resource transition 
> completed.
> heartbeat[9733]: 2008/12/22_07:28:43 debug: Sending hold resources msg: 
> local, stable=1 # <none>
> heartbeat[9733]: 2008/12/22_07:28:43 info: AnnounceTakeover(local 1, foreign 
> 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:28:43 debug: Calling PerformAutoFailback()
> heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
> heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
> heartbeat[9733]: 2008/12/22_07:28:43 info: other_holds_resources: 1
> heartbeat[9733]: 2008/12/22_07:28:43 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> ipfail[10103]: 2008/12/22_07:28:43 debug: Other side is now stable.
> heartbeat[9733]: 2008/12/22_07:29:23 debug: APIregistration_dispatch() {
> heartbeat[9733]: 2008/12/22_07:29:23 debug: process_registerevent() {
> heartbeat[9733]: 2008/12/22_07:29:23 debug: client->gsource = 0x8bcdb40
> heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*process_registerevent*/;
> heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*APIregistration_dispatch*/;
> heartbeat[9733]: 2008/12/22_07:29:23 debug: Checking client authorization for 
> client 18641 (0:496)
> heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no 
> missing packets found for node ha03.domain-name-censored.local
> heartbeat[9733]: 2008/12/22_07:29:23 debug: create_seq_snapshot_table:no 
> missing packets found for node ha04.domain-name-censored.local
> heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing on API client 18641 
> ('casual')
> heartbeat[9733]: 2008/12/22_07:29:23 debug: hb_rsc_isstable: 
> ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0, 
> going_standby: 0, standby running(ms): 0, resourcestate: 4
> heartbeat[9733]: 2008/12/22_07:29:23 debug: Signing client 18641 off
> heartbeat[9733]: 2008/12/22_07:29:23 debug: G_remove_client(pid=18641, 
> reason='signoff' gsource=0x8bcdb40) {
> heartbeat[9733]: 2008/12/22_07:29:23 debug: api_remove_client_int: removing 
> pid [18641] reason: signoff
> heartbeat[9733]: 2008/12/22_07:29:23 debug: }/*G_remove_client;*/
>  
> --Eric
>  
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
