Pushkar, here are logs from both servers.

Both nodes seem to conclude that the other one holds the resources
(each log ends with "other_holds_resources: 1"); that is my read of
the situation.

Any help will be appreciated.

Thank you

Igor

========================================================================

r...@pfs-srv3:~# tail -40  /var/log/ha-log
Jul 27 12:03:38 pfs-srv3 heartbeat: [1430]: info: heartbeat: version 3.0.2
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: Heartbeat
generation: 1279723736
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: glib: UDP Broadcast
heartbeat started on port 12694 (12694) interface eth1
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: glib: UDP Broadcast
heartbeat closed on port 12694 interface eth1 - Status: 1
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: Local status now set to: 'up'
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: Link pfs-srv3:eth1 up.
Jul 27 12:03:39 pfs-srv3 heartbeat: [1430]: info: Managed
write_hostcachedata process 1483 exited with return code 0.
Jul 27 12:03:41 pfs-srv3 heartbeat: [1430]: info: Link pfs-srv4:eth1 up.
Jul 27 12:03:41 pfs-srv3 heartbeat: [1430]: info: Status update for
node pfs-srv4: status up
Jul 27 12:03:41 pfs-srv3 heartbeat: [1430]: info: Managed
write_hostcachedata process 1486 exited with return code 0.
Jul 27 12:03:42 pfs-srv3 harc[1485]: [1492]: info: Running
/etc/ha.d//rc.d/status status
Jul 27 12:03:42 pfs-srv3 heartbeat: [1430]: info: Managed status
process 1485 exited with return code 0.
Jul 27 12:03:42 pfs-srv3 heartbeat: [1430]: info: Comm_now_up():
updating status to active
Jul 27 12:03:42 pfs-srv3 heartbeat: [1430]: info: Local status now set
to: 'active'
Jul 27 12:03:42 pfs-srv3 heartbeat: [1430]: info: Managed
write_hostcachedata process 1498 exited with return code 0.
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: Managed
write_delcachedata process 1499 exited with return code 0.
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: Status update for
node pfs-srv4: status active
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info:
AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: STATE 1 => 3
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: STATE 3 => 2
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 0
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: STATE 2 => 3
Jul 27 12:03:43 pfs-srv3 harc[1500]: [1506]: info: Running
/etc/ha.d//rc.d/status status
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: Managed status
process 1500 exited with return code 0.
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: local resource
transition completed.
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Jul 27 12:03:53 pfs-srv3 heartbeat: [1512]: info: 1 local resources
from [/usr/share/heartbeat/ResourceManager listkeys pfs-srv3]
Jul 27 12:03:53 pfs-srv3 heartbeat: [1512]: info: Local Resource
acquisition completed.
Jul 27 12:03:53 pfs-srv3 heartbeat: [1512]: info: FIFO message [type
resource] written rc=81
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 0
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: remote resource
transition completed.
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 1
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 1
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: Managed
req_our_resources(ask) process 1512 exited with return code 0.

==================================================================================
r...@pfs-srv4:~# tail -40  /var/log/ha-log
Jul 27 12:03:34 pfs-srv4 heartbeat: [1249]: info: heartbeat: version 3.0.2
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info: Heartbeat
generation: 1279723741
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info: glib: UDP Broadcast
heartbeat started on port 12694 (12694) interface eth1
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info: glib: UDP Broadcast
heartbeat closed on port 12694 interface eth1 - Status: 1
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info: Local status now set to: 'up'
Jul 27 12:03:35 pfs-srv4 heartbeat: [1249]: info: Managed
write_hostcachedata process 1284 exited with return code 0.
Jul 27 12:03:36 pfs-srv4 heartbeat: [1249]: info: Link pfs-srv4:eth1 up.
Jul 27 12:03:36 pfs-srv4 heartbeat: [1249]: info: Link pfs-srv3:eth1 up.
Jul 27 12:03:36 pfs-srv4 heartbeat: [1249]: info: Status update for
node pfs-srv3: status up
Jul 27 12:03:36 pfs-srv4 heartbeat: [1249]: info: Managed
write_hostcachedata process 1286 exited with return code 0.
Jul 27 12:03:36 pfs-srv4 harc[1285]: [1292]: info: Running
/etc/ha.d//rc.d/status status
Jul 27 12:03:36 pfs-srv4 heartbeat: [1249]: info: Managed status
process 1285 exited with return code 0.
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Comm_now_up():
updating status to active
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Local status now set
to: 'active'
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Status update for
node pfs-srv3: status active
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info:
AnnounceTakeover(local 0, foreign 1, reason 'HB_R_BOTHSTARTING' (0))
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: STATE 1 => 3
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: STATE 3 => 2
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Managed
write_hostcachedata process 1298 exited with return code 0.
Jul 27 12:03:37 pfs-srv4 harc[1297]: [1305]: info: Running
/etc/ha.d//rc.d/status status
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Managed status
process 1297 exited with return code 0.
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: other_holds_resources: 0
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: STATE 2 => 3
Jul 27 12:03:37 pfs-srv4 heartbeat: [1249]: info: Managed
write_delcachedata process 1299 exited with return code 0.
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: remote resource
transition completed.
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: other_holds_resources: 1
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: remote resource
transition completed.
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (0))
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: Initial resource
acquisition complete (T_RESOURCES(us))
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(them)' (1))
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: STATE 3 => 4
Jul 27 12:03:53 pfs-srv4 heartbeat: [1315]: info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys pfs-srv4] to acquire.
Jul 27 12:03:53 pfs-srv4 heartbeat: [1315]: info: FIFO message [type
resource] written rc=81
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info:
AnnounceTakeover(local 1, foreign 1, reason 'T_RESOURCES(us)' (1))
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: Managed
req_our_resources(ask) process 1315 exited with return code 0.
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: other_holds_resources: 1
Jul 27 12:03:53 pfs-srv4 heartbeat: [1249]: info: other_holds_resources: 1
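
To check that read without eyeballing the logs, here is a minimal Python sketch that pulls every "other_holds_resources" line out of a pasted ha-log excerpt and reports the last value each node logged. The function names (other_holds_timeline, final_belief) and the sample text are illustrative assumptions, not part of Heartbeat; the regular expression only assumes the line layout visible in the logs above.

```python
import re

# Matches lines shaped like the ones in the ha-log excerpts above, e.g.
# "Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 1"
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+heartbeat:.*"
    r"other_holds_resources:\s*(?P<value>\d+)\s*$"
)


def other_holds_timeline(log_text):
    """Return (timestamp, host, value) for each other_holds_resources line."""
    timeline = []
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            timeline.append(
                (match["stamp"], match["host"], int(match["value"]))
            )
    return timeline


def final_belief(log_text):
    """Return the last logged other_holds_resources value, or None if absent."""
    timeline = other_holds_timeline(log_text)
    return timeline[-1][2] if timeline else None


# Hypothetical two-line sample in the same layout as the real logs.
SAMPLE = """\
Jul 27 12:03:43 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 0
Jul 27 12:03:53 pfs-srv3 heartbeat: [1430]: info: other_holds_resources: 1
"""

if __name__ == "__main__":
    for row in other_holds_timeline(SAMPLE):
        print(row)
    print("final value:", final_belief(SAMPLE))
```

Running final_belief() on each node's log separately and getting 1 from both would match the situation described above, where each node ends up reporting that the other holds the resources.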

On Mon, Jul 26, 2010 at 1:04 PM, Pushkar Pradhan <[email protected]> wrote:
>
> ________________________________
>
> From: [email protected] on behalf of Igor Chudov
> Sent: Mon 7/26/2010 7:16 AM
> To: [email protected]
> Subject: [Linux-HA] Big progress with Heartbeat,but simultaneous reboot 
> leaves services unprovided
>
>
>
> I am setting up a two node cluster, using drbd and Heartbeat. I use
> standard packages on Ubuntu Hardy.
>
> The services provided externally are an NFS and Samba share that
> sits on top of the DRBD filesystem, plus the service IP address.
>
> I am not using corosync at the moment.
>
> At this point, most things work great: the shared services and IP
> address are passed around when servers reboot or are unplugged, etc.
>
> However, I HAVE ONE PROBLEM: if I simultaneously reboot both servers
> by typing reboot in both sessions, and then hitting ENTER in both at
> about the same time, neither of the servers acquires shared services,
> so they remain unprovided.
>
> If, after that, I reboot one of the servers again, then the unrebooted
> one acquires services. What exactly am I doing wrong?
>
> Here is my ha.cf and haresources:
>
> ==>cat ha.cf
> use_logd on
> udpport 12694
> keepalive 1
> warntime 15
> deadtime 20
> debug 1
> initdead 60
> bcast eth1
> node pfs-srv3
> node pfs-srv4
> auto_failback on
> crm off
>
>
> ==>cat haresources
> pfs-srv3 drbddisk::r0 Filesystem::/dev/drbd0::/pfs::ext3 10.1.8.45/24
> nfs-kernel-server smbd
>
>
>
>
> Did you see the logs? Does HA try to start the resources on the preferred 
> node? Can you check what status the Heartbeat init script reports 
> (/etc/init.d/heartbeat status)?
> Also, can you run cl_status with various arguments, e.g. nodestatus and
> hbstatus?
> You can also run the individual resource scripts with the status argument to 
> check what they report (started/stopped).
> pushkar
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
