Hello all,

I have been using Heartbeat for a while now, and I have many systems working
fine.  However, I have one pair of systems that is really giving me headaches.
Here's the problem:

Sys1: (Master) Debian Sarge system, with heartbeat 1.2.5-3, drbd, drbdlinks
and apache
Sys2: (Slave) - Identical.

If I fail Sys1 (by stopping heartbeat), Sys2 picks up and runs everything
fine: drbddisk takes over, the IPs transfer, it creates the links and starts
apache.  No issues.  I then finish the work on Sys1 and bring it back in as
a slave, then kill heartbeat on Sys2 so that Sys1 will become master again.
Everything seems to work fine, but then after 30 seconds I get one single
entry in the log saying apache failed (this is after apache is already up
and running), and all the resources go away.  At this point I can manually
stop heartbeat and start it again, and Sys1 becomes the master again.  I have
put all the relevant logs below.

It also seems to do the same thing in another situation.  If Sys1 is master
and Sys2 is slave, and I shut down Sys2 (# shutdown -h now; remember, the
slave) without manually stopping heartbeat first, Sys1 will do the same thing
as above: throw an apache failed error and give up its resources (even though
everything was running fine).

Like I said, I have other systems with all the same hardware and software
(same versions even) and everything works fine.  I don't see anything
obvious in the logs, and I'm banging my head against the wall at this
point.
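
For anyone who wants to sift through logs like the ones below, here is a
minimal sketch (not part of heartbeat) that lists every resource script
call which finished with a non-zero return code.  It assumes each log entry
starts with "heartbeat: <timestamp> debug: " and that the mail client has
wrapped long entries onto following lines; the function name
failed_resource_calls is made up for illustration.

```python
import re

# Assumed entry prefix, taken from the log excerpts in this post.
ENTRY_START = re.compile(
    r"^heartbeat: (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) debug: "
)
# Matches e.g. "debug: /etc/init.d/apache  start done. RC=1".
RC_DONE = re.compile(r"debug: (.+?) done\. RC=(\d+)")


def failed_resource_calls(log_text):
    """Return (timestamp, command, rc) for every call with rc != 0."""
    entries = []
    for line in log_text.splitlines():
        if ENTRY_START.match(line):
            entries.append(line.strip())
        elif entries:
            # Wrapped continuation (or script output): glue it onto the
            # previous entry so "stop\ndone. RC=0" is seen as one entry.
            entries[-1] += " " + line.strip()

    failures = []
    for entry in entries:
        match = RC_DONE.search(entry)
        if match and int(match.group(2)) != 0:
            timestamp = ENTRY_START.match(entry).group(1)
            command = " ".join(match.group(1).split())  # collapse spaces
            failures.append((timestamp, command, int(match.group(2))))
    return failures


if __name__ == "__main__":
    import sys

    for ts, cmd, rc in failed_resource_calls(sys.stdin.read()):
        print(ts, cmd, "RC=%d" % rc)
```

Saved under any file name and fed the log on standard input, it prints one
line per failed call, which makes a single RC=1 easy to spot among
hundreds of RC=0 entries.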

Thanks in advance.

Logs from Sys1 heartbeat debug level 1
*****************************
*** fail Sys1 to Sys2
***  /etc/init.d/heartbeat stop
*****************************
heartbeat: 2007/12/17_11:55:37 debug: Process 13262 processing SIGTERM
heartbeat: 2007/12/17_11:55:37 debug: hb_initiate_shutdown() called.
heartbeat: 2007/12/17_11:55:37 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_11:55:37 debug: Sending hold resources msg: none,
stable=0 # shutdown
heartbeat: 2007/12/17_11:55:37 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_11:55:37 debug: Process [hb_giveup_resources] started
pid 14194
heartbeat: 2007/12/17_11:55:38 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/init.d/apache  stop
Stopping web server: apache.
heartbeat: 2007/12/17_11:56:06 debug: /etc/init.d/apache  stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting
/etc/ha.d/resource.d/drbdlinks  stop
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/drbdlinks  stop
done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /shared ext3 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/drbddisk
stop
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/drbddisk  stop
done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_11:56:06 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_11:56:06 debug: Sending T_SHUTDONE.
heartbeat: 2007/12/17_11:56:06 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_11:56:06 debug: Received T_SHUTDONE from us.
heartbeat: 2007/12/17_11:56:06 debug: Calling hb_mcp_final_shutdown in a
second.
heartbeat: 2007/12/17_11:56:06 debug: RscMgmtProc 'hb_giveup_resources'
exited code 0
heartbeat: 2007/12/17_11:56:06 debug: hb_mcp_final_shutdown() phase 0
heartbeat: 2007/12/17_11:56:06 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_11:56:07 debug: Signing client 13276 off
heartbeat: 2007/12/17_11:56:07 debug: EOF from client pid 13276
heartbeat: 2007/12/17_11:56:07 debug: G_remove_client(pid=13276,
reason='EOF' gsource=0x80e6ca8) {
heartbeat: 2007/12/17_11:56:07 debug: api_remove_client_int: removing pid
[13276] reason: EOF
heartbeat: 2007/12/17_11:56:07 debug: }/*G_remove_client;*/
heartbeat: 2007/12/17_11:56:07 debug: hb_mcp_final_shutdown() phase 1
heartbeat: 2007/12/17_11:56:07 debug: Process 13269 processing SIGTERM
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13269 [rc=15]
heartbeat: 2007/12/17_11:56:07 debug: Process 13270 processing SIGTERM
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13270 [rc=15]
heartbeat: 2007/12/17_11:56:07 debug: Process 13271 processing SIGTERM
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13271 [rc=15]
heartbeat: 2007/12/17_11:56:07 debug: Process 13272 processing SIGTERM
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13272 [rc=15]
heartbeat: 2007/12/17_11:56:07 debug: Process 13273 processing SIGTERM
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13273 [rc=15]
heartbeat: 2007/12/17_11:56:07 debug: Exiting from pid 13262 [rc=0]
***********************************
*** Sys1 heartbeat stopped
***********************************

***********************************
*** Sys1 heartbeat start
*** Sys1 will be slave
***********************************
heartbeat: 2007/12/17_12:00:10 debug: Adding [ug]id hacluster [101] to
authorization g_hash_table
heartbeat: 2007/12/17_12:00:10 debug: Adding [ug]id hacluster [101] to
authorization g_hash_table
heartbeat: 2007/12/17_12:00:10 debug: Adding [ug]id haclient [104] to
authorization g_hash_table
heartbeat: 2007/12/17_12:00:10 debug: Adding [ug]id haclient [104] to
authorization g_hash_table
heartbeat: 2007/12/17_12:00:10 debug: Adding [ug]id haclient [104] to
authorization g_hash_table
heartbeat: 2007/12/17_12:00:10 debug: Beginning authentication parsing
heartbeat: 2007/12/17_12:00:10 debug: 16 max authentication methods
heartbeat: 2007/12/17_12:00:10 debug: Keyfile opened
heartbeat: 2007/12/17_12:00:10 debug: Keyfile perms OK
heartbeat: 2007/12/17_12:00:10 debug: 16 max authentication methods
heartbeat: 2007/12/17_12:00:10 debug: Found authentication method [sha1]
heartbeat: 2007/12/17_12:00:10 debug: Outbound signing method is 1
heartbeat: 2007/12/17_12:00:10 debug: Authentication parsing complete [1]
heartbeat: 2007/12/17_12:00:10 debug: add_option(hopfudge,1)
heartbeat: 2007/12/17_12:00:10 debug: add_option(baud,19200)
heartbeat: 2007/12/17_12:00:10 debug: add_option(hbgenmethod,file)
heartbeat: 2007/12/17_12:00:10 debug: add_option(realtime,true)
heartbeat: 2007/12/17_12:00:10 debug: add_option(normalpoll,true)
heartbeat: 2007/12/17_12:00:10 debug: add_option(msgfmt,classic)
heartbeat: 2007/12/17_12:00:10 debug: add_option(log_badpack,true)
heartbeat: 2007/12/17_12:00:10 debug: add_option(coredumps,true)
heartbeat: 2007/12/17_12:00:10 debug: HA configuration OK.  Heartbeat
starting.
heartbeat: 2007/12/17_12:00:10 debug: opening bcast eth0 (UDP/IP broadcast)
heartbeat: 2007/12/17_12:00:10 debug: SO_BINDTODEVICE(r) set for device eth0
heartbeat: 2007/12/17_12:00:10 debug: bcast channel eth0 now open...
heartbeat: 2007/12/17_12:00:10 debug: opening ping 192.168.232.1 (ping
membership)
heartbeat: 2007/12/17_12:00:10 debug: ping channel 192.168.232.1 now open...
heartbeat: 2007/12/17_12:00:10 debug: FIFO process pid: 14665
heartbeat: 2007/12/17_12:00:10 debug: write process pid: 14666
heartbeat: 2007/12/17_12:00:10 debug: read child process pid: 14667
heartbeat: 2007/12/17_12:00:10 debug: write process pid: 14668
heartbeat: 2007/12/17_12:00:10 debug: read child process pid: 14669
heartbeat: 2007/12/17_12:00:10 debug: Limiting CPU: 30 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:10 debug: Waiting for child processes to start
heartbeat: 2007/12/17_12:00:10 debug: All your child process are belong to
us
heartbeat: 2007/12/17_12:00:10 debug: Starting local status message @ 1000
ms intervals
heartbeat: 2007/12/17_12:00:11 debug: Limiting CPU: 6 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:11 debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:11 debug: Limiting CPU: 6 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:11 debug: CreateInitialFilter: ip-request-resp
heartbeat: 2007/12/17_12:00:11 debug: CreateInitialFilter: status
heartbeat: 2007/12/17_12:00:11 debug: CreateInitialFilter: ask_resources
heartbeat: 2007/12/17_12:00:11 debug: CreateInitialFilter: hb_takeover
heartbeat: 2007/12/17_12:00:11 debug: CreateInitialFilter: ip-request
heartbeat: 2007/12/17_12:00:11 debug: Status seqno: 439 msgtime: 1197910811
heartbeat: 2007/12/17_12:00:11 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:00:11 debug: Limiting CPU: 24 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:11 debug: notify_world: invoking harc: OLD
status: up
heartbeat: 2007/12/17_12:00:11 debug: Limiting CPU: 6 CPU seconds every
60000 milliseconds
heartbeat: 2007/12/17_12:00:11 debug: Process [status] started pid 14670
heartbeat: 2007/12/17_12:00:11 debug: Starting notify process [status]
heartbeat: 2007/12/17_12:00:11 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:00:11 debug: notify_world: Running harc status
heartbeat: 2007/12/17_12:00:11 debug: RscMgmtProc 'status' exited code 0
heartbeat: 2007/12/17_12:00:30 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:00:30 debug: notify_world: invoking harc: OLD
status: up
heartbeat: 2007/12/17_12:00:30 debug: Process [status] started pid 14680
heartbeat: 2007/12/17_12:00:30 debug: Starting notify process [status]
heartbeat: 2007/12/17_12:00:30 debug: Comm_now_up(): updating status to
active
heartbeat: 2007/12/17_12:00:30 debug: Sending local starting msg:
resourcestate = 0
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 0
heartbeat: 2007/12/17_12:00:30 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:00:30 debug: notify_world: Running harc status
heartbeat: 2007/12/17_12:00:30 debug: Sending hold resources msg: none,
stable=0 # <none>
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
heartbeat: 2007/12/17_12:00:30 debug: Calling PerformAutoFailback()
heartbeat: 2007/12/17_12:00:30 debug: Sending hold resources msg: none,
stable=1 # <none>
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 3
heartbeat: 2007/12/17_12:00:30 debug: Calling PerformAutoFailback()
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: RscMgmtProc 'status' exited code 0
heartbeat: 2007/12/17_12:00:30 debug: APIregistration_dispatch() {
heartbeat: 2007/12/17_12:00:30 debug: process_registerevent() {
heartbeat: 2007/12/17_12:00:30 debug: client->gsource = 0x80dd120
heartbeat: 2007/12/17_12:00:30 debug: }/*process_registerevent*/;
heartbeat: 2007/12/17_12:00:30 debug: }/*APIregistration_dispatch*/;
heartbeat: 2007/12/17_12:00:30 debug: Checking client authorization for
client ipfail (101:104)
heartbeat: 2007/12/17_12:00:30 debug: Signing on API client 14681 (ipfail)
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
******************************
*** Sys1 start complete
*** Sys2 is primary
*** Sys1 is slave
******************************

******************************
*** Sys2 shutdown
*** Sys1 resource assumption
*******************************
heartbeat: 2007/12/17_12:03:05 debug: process_resources(2):  other now
unstable
heartbeat: 2007/12/17_12:03:05 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:03:34 debug: process_resources(4):  other now
stable - T_SHUTDONE
heartbeat: 2007/12/17_12:03:34 debug: Process [go_standby] started pid 14820
heartbeat: 2007/12/17_12:03:34 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:03:34 debug: StartNextRemoteRscReq(): child count 1
heartbeat: 2007/12/17_12:03:34 debug: takeover_from_node: other now stable
heartbeat: 2007/12/17_12:03:34 debug: Process [req_our_resources(ask)]
started pid 14821
heartbeat: 2007/12/17_12:03:34 debug:
req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys Sys1)
heartbeat: 2007/12/17_12:03:34 debug: req_our_resources(): running
[/usr/lib/heartbeat/req_resource 192.168.232.20/24/eth0]
heartbeat: 2007/12/17_12:03:34 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 start
heartbeat: 2007/12/17_12:03:34 debug: StartNextRemoteRscReq(): child count 2
heartbeat: 2007/12/17_12:03:34 debug: Sending hold resources msg: all,
stable=1 # req_our_resources()
heartbeat: 2007/12/17_12:03:34 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 2, other_is_stable: 1, takeover_in_progress: 1,
going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:03:34 debug: RscMgmtProc 'req_our_resources(ask)'
exited code 0
heartbeat: 2007/12/17_12:03:34 debug: StartNextRemoteRscReq(): child count 1
ls: /var/lib/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/drbddisk
start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/drbddisk  start
done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /shared ext3 start done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting
/etc/ha.d/resource.d/drbdlinks  start
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/drbdlinks  start
done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/init.d/apache  start
Starting web server: apache.
heartbeat: 2007/12/17_12:03:36 debug: /etc/init.d/apache  start done. RC=0
heartbeat: 2007/12/17_12:03:36 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start
heartbeat: 2007/12/17_12:03:45 debug: hb_rsc_recover_dead_resources: other
now stable
heartbeat: 2007/12/17_12:04:04 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start done. RC=0
heartbeat: 2007/12/17_12:04:04 debug: Sending standby [done] msg
heartbeat: 2007/12/17_12:04:04 debug: Received standby message done from
Sys1 in state 0 
heartbeat: 2007/12/17_12:04:04 debug: RscMgmtProc 'go_standby' exited code 0
heartbeat: 2007/12/17_12:04:04 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:04:04 debug: notify_world: invoking harc: OLD
status: active
heartbeat: 2007/12/17_12:04:04 debug: Process [status] started pid 15683
heartbeat: 2007/12/17_12:04:04 debug: Starting notify process [status]
heartbeat: 2007/12/17_12:04:04 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:04:04 debug: notify_world: Running harc status
heartbeat: 2007/12/17_12:04:04 debug: process_resources(3):  other now
stable
heartbeat: 2007/12/17_12:04:04 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:04:04 debug: RscMgmtProc 'status' exited code 0
heartbeat: 2007/12/17_12:04:04 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:04:04 debug: notify_world: invoking harc: OLD
status: active
heartbeat: 2007/12/17_12:04:04 debug: Process [ip-request-resp] started pid
15706
heartbeat: 2007/12/17_12:04:04 debug: Starting notify process
[ip-request-resp]
heartbeat: 2007/12/17_12:04:04 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:04:04 debug: notify_world: Running harc
ip-request-resp
heartbeat: 2007/12/17_12:04:04 debug: Starting /etc/ha.d/resource.d/drbddisk
start
heartbeat: 2007/12/17_12:04:04 debug: /etc/ha.d/resource.d/drbddisk  start
done. RC=0
heartbeat: 2007/12/17_12:04:04 debug: Starting /etc/init.d/apache  start
*****************
*****************
** Here's the problem
*****************
Starting web server: apache failed
heartbeat: 2007/12/17_12:04:04 debug: /etc/init.d/apache  start done. RC=1
heartbeat: 2007/12/17_12:04:04 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop
heartbeat: 2007/12/17_12:04:32 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop done. RC=0
heartbeat: 2007/12/17_12:04:32 debug: Starting /etc/init.d/apache  stop
Stopping web server: apache.
heartbeat: 2007/12/17_12:04:32 debug: /etc/init.d/apache  stop done. RC=0
heartbeat: 2007/12/17_12:04:32 debug: Starting
/etc/ha.d/resource.d/drbdlinks  stop
heartbeat: 2007/12/17_12:04:32 debug: /etc/ha.d/resource.d/drbdlinks  stop
done. RC=0
heartbeat: 2007/12/17_12:04:32 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/12/17_12:04:32 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /shared ext3 stop done. RC=0
heartbeat: 2007/12/17_12:04:32 debug: Starting /etc/ha.d/resource.d/drbddisk
stop
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/drbddisk  stop
done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:04:33 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:04:33 debug: RscMgmtProc 'ip-request-resp' exited
code 0
*****************************
*** Sys2 shutdown completed
*** problems with resource take over
*****************************


Logs from Sys2 heartbeat debug level 1
******************************
*** fail Sys1 (primary)
*** to Sys2 (slave)
******************************
heartbeat: 2007/12/17_11:55:38 debug: process_resources(2):  other now
unstable
heartbeat: 2007/12/17_11:55:38 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_11:56:07 debug: process_resources(4):  other now
stable - T_SHUTDONE
heartbeat: 2007/12/17_11:56:07 debug: Process [go_standby] started pid 7661
heartbeat: 2007/12/17_11:56:07 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_11:56:07 debug: StartNextRemoteRscReq(): child count 1
heartbeat: 2007/12/17_11:56:07 debug: takeover_from_node: other now stable
heartbeat: 2007/12/17_11:56:07 debug: Process [req_our_resources(ask)]
started pid 7662
heartbeat: 2007/12/17_11:56:07 debug:
req_our_resources(/usr/lib/heartbeat/ResourceManager listkeys Sys2)
heartbeat: 2007/12/17_11:56:07 debug: Sending hold resources msg: all,
stable=1 # req_our_resources()
heartbeat: 2007/12/17_11:56:07 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 2, other_is_stable: 1, takeover_in_progress: 1,
going_standby: 3, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_11:56:07 debug: RscMgmtProc 'req_our_resources(ask)'
exited code 0
heartbeat: 2007/12/17_11:56:07 debug: StartNextRemoteRscReq(): child count 1
heartbeat: 2007/12/17_11:56:07 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 start
ls: /var/lib/heartbeat/rsctmp/IPaddr/eth0:*: No such file or directory
heartbeat: 2007/12/17_11:56:07 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:07 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 start
heartbeat: 2007/12/17_11:56:07 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:07 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 start
heartbeat: 2007/12/17_11:56:07 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:07 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 start
heartbeat: 2007/12/17_11:56:07 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:07 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 start
heartbeat: 2007/12/17_11:56:07 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:08 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 start
heartbeat: 2007/12/17_11:56:08 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 start done. RC=0
heartbeat: 2007/12/17_11:56:08 debug: Starting /etc/ha.d/resource.d/drbddisk
start
heartbeat: 2007/12/17_11:56:08 debug: /etc/ha.d/resource.d/drbddisk  start
done. RC=0
heartbeat: 2007/12/17_11:56:08 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 start
heartbeat: 2007/12/17_11:56:08 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /shared ext3 start done. RC=0
heartbeat: 2007/12/17_11:56:08 debug: Starting
/etc/ha.d/resource.d/drbdlinks  start
heartbeat: 2007/12/17_11:56:08 debug: /etc/ha.d/resource.d/drbdlinks  start
done. RC=0
heartbeat: 2007/12/17_11:56:08 debug: Starting /etc/init.d/apache  start
Starting apache 1.3 web server....
heartbeat: 2007/12/17_11:56:09 debug: /etc/init.d/apache  start done. RC=0
heartbeat: 2007/12/17_11:56:09 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start
heartbeat: 2007/12/17_11:56:17 debug: hb_rsc_recover_dead_resources: other
now stable
heartbeat: 2007/12/17_11:56:37 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start done. RC=0
heartbeat: 2007/12/17_11:56:37 debug: Sending standby [done] msg
heartbeat: 2007/12/17_11:56:37 debug: Received standby message done from
Sys2 in state 0
heartbeat: 2007/12/17_11:56:37 debug: RscMgmtProc 'go_standby' exited code 0
heartbeat: 2007/12/17_11:56:37 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_11:56:37 debug: notify_world: invoking harc: OLD
status: active
heartbeat: 2007/12/17_11:56:37 debug: Process [status] started pid 8503
heartbeat: 2007/12/17_11:56:37 debug: Starting notify process [status]
heartbeat: 2007/12/17_11:56:37 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_11:56:37 debug: notify_world: Running harc status
heartbeat: 2007/12/17_11:56:37 debug: Starting /etc/ha.d/resource.d/drbddisk
start
heartbeat: 2007/12/17_11:56:37 debug: /etc/ha.d/resource.d/drbddisk  start
done. RC=0
heartbeat: 2007/12/17_11:56:37 debug: Starting /etc/init.d/apache  start
Starting apache 1.3 web server....
heartbeat: 2007/12/17_11:56:37 debug: /etc/init.d/apache  start done. RC=0
heartbeat: 2007/12/17_11:56:37 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start
heartbeat: 2007/12/17_11:57:05 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS start done. RC=0
heartbeat: 2007/12/17_11:57:05 debug: process_resources(3):  other now
stable
heartbeat: 2007/12/17_11:57:05 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_11:57:05 debug: RscMgmtProc 'status' exited code 0
******************************
*** Sys1 failure completed
*** Sys2 standint complete
******************************

******************************
*** Sys1 heartbeat start
*** Sys1 will be slave
******************************
heartbeat: 2007/12/17_12:00:11 debug: Status seqno: 1 msgtime: 1197910810
heartbeat: 2007/12/17_12:00:11 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:00:11 debug: notify_world: invoking harc: OLD
status: active
heartbeat: 2007/12/17_12:00:11 debug: Process [status] started pid 8802
heartbeat: 2007/12/17_12:00:11 debug: Starting notify process [status]
heartbeat: 2007/12/17_12:00:11 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:00:11 debug: notify_world: Running harc status
heartbeat: 2007/12/17_12:00:11 debug: RscMgmtProc 'status' exited code 0
heartbeat: 2007/12/17_12:00:30 debug: Status seqno: 22 msgtime: 1197910830
heartbeat: 2007/12/17_12:00:30 debug: StartNextRemoteRscReq() - calling hook
heartbeat: 2007/12/17_12:00:30 debug: notify_world: invoking harc: OLD
status: active
heartbeat: 2007/12/17_12:00:30 debug: Process [status] started pid 8806
heartbeat: 2007/12/17_12:00:30 debug: Starting notify process [status]
heartbeat: 2007/12/17_12:00:30 debug: process_resources: other now unstable
heartbeat: 2007/12/17_12:00:30 debug: Sending hold resources msg: all,
stable=1 # <none>
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: process_resources(2):  other now
unstable
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 0, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: Sending hold resources msg: all,
stable=1 # <none>
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: Calling PerformAutoFailback()
heartbeat: 2007/12/17_12:00:30 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:00:30 debug: notify_world: setting SIGCHLD Handler
to SIG_DFL
heartbeat: 2007/12/17_12:00:30 debug: notify_world: Running harc status
heartbeat: 2007/12/17_12:00:30 debug: RscMgmtProc 'status' exited code 0
***************************
*** Sys1 heartbeat complete
*** Sys1 is slave
***************************

***************************
*** Sys2 shutdown
*** Sys1 assume resources
****************************
heartbeat: 2007/12/17_12:03:06 debug: Process 7625 processing SIGTERM
heartbeat: 2007/12/17_12:03:06 debug: hb_initiate_shutdown() called.
heartbeat: 2007/12/17_12:03:06 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 4
heartbeat: 2007/12/17_12:03:06 debug: Sending hold resources msg: none,
stable=0 # shutdown
heartbeat: 2007/12/17_12:03:06 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_12:03:06 debug: Process [hb_giveup_resources] started
pid 8814
heartbeat: 2007/12/17_12:03:06 debug: Starting /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop
heartbeat: 2007/12/17_12:03:34 debug: /etc/ha.d/resource.d/MailTo
[EMAIL PROTECTED] SYS stop done. RC=0
heartbeat: 2007/12/17_12:03:34 debug: Starting /etc/init.d/apache  stop
Stopping apache 1.3 web server....
heartbeat: 2007/12/17_12:03:34 debug: /etc/init.d/apache  stop done. RC=0
heartbeat: 2007/12/17_12:03:34 debug: Starting
/etc/ha.d/resource.d/drbdlinks  stop
heartbeat: 2007/12/17_12:03:34 debug: /etc/ha.d/resource.d/drbdlinks  stop
done. RC=0
heartbeat: 2007/12/17_12:03:34 debug: Starting
/etc/ha.d/resource.d/Filesystem /dev/drbd0 /shared ext3 stop
heartbeat: 2007/12/17_12:03:34 debug: /etc/ha.d/resource.d/Filesystem
/dev/drbd0 /shared ext3 stop done. RC=0
heartbeat: 2007/12/17_12:03:34 debug: Starting /etc/ha.d/resource.d/drbddisk
stop
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/drbddisk  stop
done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.25/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.24/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.23/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.22/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.21/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Starting /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop
SIOCDELRT: No such process
heartbeat: 2007/12/17_12:03:35 debug: /etc/ha.d/resource.d/IPaddr
192.168.232.20/24/eth0 stop done. RC=0
heartbeat: 2007/12/17_12:03:35 debug: Sending T_SHUTDONE.
heartbeat: 2007/12/17_12:03:35 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 1, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_12:03:35 debug: Received T_SHUTDONE from us.
heartbeat: 2007/12/17_12:03:35 debug: Calling hb_mcp_final_shutdown in a
second.
heartbeat: 2007/12/17_12:03:35 debug: RscMgmtProc 'hb_giveup_resources'
exited code 0
heartbeat: 2007/12/17_12:03:35 debug: hb_mcp_final_shutdown() phase 0
heartbeat: 2007/12/17_12:03:35 debug: hb_rsc_isstable:
ResourceMgmt_child_count: 0, other_is_stable: 1, takeover_in_progress: 0,
going_standby: 0, standby running(ms): 0, resourcestate: 5
heartbeat: 2007/12/17_12:03:36 debug: Signing client 7644 off
heartbeat: 2007/12/17_12:03:36 debug: EOF from client pid 7644
heartbeat: 2007/12/17_12:03:36 debug: G_remove_client(pid=7644, reason='EOF'
gsource=0x80e5d40) {
heartbeat: 2007/12/17_12:03:36 debug: api_remove_client_int: removing pid
[7644] reason: EOF
heartbeat: 2007/12/17_12:03:36 debug: }/*G_remove_client;*/
heartbeat: 2007/12/17_12:03:36 debug: hb_mcp_final_shutdown() phase 1
heartbeat: 2007/12/17_12:03:36 debug: Process 7632 processing SIGTERM
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7632 [rc=15]
heartbeat: 2007/12/17_12:03:36 debug: Process 7633 processing SIGTERM
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7633 [rc=15]
heartbeat: 2007/12/17_12:03:36 debug: Process 7634 processing SIGTERM
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7634 [rc=15]
heartbeat: 2007/12/17_12:03:36 debug: Process 7635 processing SIGTERM
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7635 [rc=15]
heartbeat: 2007/12/17_12:03:36 debug: Process 7636 processing SIGTERM
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7636 [rc=15]
heartbeat: 2007/12/17_12:03:36 debug: Exiting from pid 7625 [rc=0]
****************************
*** Sys2 shutdown complete
*****************************


Apache Server Log (loglevel debug)
[Mon Dec 17 12:03:35 2007] [info] created shared memory segment #1048577
[Mon Dec 17 12:03:35 2007] [notice] Apache configured -- resuming normal
operations
[Mon Dec 17 12:03:35 2007] [info] Server built: Aug 27 2006 16:34:48
[Mon Dec 17 12:03:36 2007] [notice] Accept mutex: sysvsem (Default: sysvsem)
[Mon Dec 17 12:04:04 2007] [info] removed PID file /var/run/apache.pid
(pid=20655)
[Mon Dec 17 12:04:04 2007] [notice] caught SIGTERM, shutting down
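
As a reading aid for the Apache log above, here is a minimal sketch that measures how long Apache stayed up between "resuming normal operations" and "caught SIGTERM". The helper name apache_time and the format string are assumptions made for illustration; they are not part of heartbeat or Apache. The two timestamps are copied from the log lines above.

```python
from datetime import datetime

# Assumed timestamp layout of the Apache 1.3 error-log lines shown above.
APACHE_FMT = "%a %b %d %H:%M:%S %Y"

def apache_time(line):
    """Parse the leading [timestamp] of an Apache error-log line (hypothetical helper)."""
    stamp = line[1:line.index("]")]
    return datetime.strptime(stamp, APACHE_FMT)

# Lines copied from the Apache log in this post.
started = apache_time("[Mon Dec 17 12:03:35 2007] [notice] Apache configured -- resuming normal")
stopped = apache_time("[Mon Dec 17 12:04:04 2007] [notice] caught SIGTERM, shutting down")

# Seconds between Apache coming up and being told to stop.
gap_seconds = (stopped - started).total_seconds()
print(gap_seconds)
```

The gap comes out just under the "after 30 seconds" delay described at the top of this post, so whatever stops Apache appears to act on a fixed delay after the takeover rather than at a random moment.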

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
