Dejan,
Here are the logs.
Note that the settings may differ somewhat from what I described, because I am
still investigating.
Also, the time is not synced up between the 2 servers. The real sequence is:
1. Start heartbeat on server f1 (whose drbd is inconsistent, with the peer drbd
not online) --> messages.f1.hb_start
2. After f1 failed to go standby, start drbd on f2 --> log omitted
3. Start heartbeat on f2 (f2 does nothing instead of taking over
resources) --> messages.f3.hb_start
4. Run /usr/lib/heartbeat/hb_takeover manually on f2; all resources start
up on f2 --> messages.f3.hb_takeover
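Since the clocks differ, the two logs can be lined up by an event that both
sides recorded. Below is a rough sketch, assuming the heartbeat timestamp
format seen in the logs (YYYY/MM/DD_HH:MM:SS) and assuming that f1 logging
"Link f3:eth1 up" and f3 logging "Link f1:eth1 up" happened at nearly the same
real time. The function names are only for illustration and are not part of
heartbeat.

```python
from datetime import datetime, timedelta

# Timestamp format used by the heartbeat log lines (an assumption read
# off the logs, not taken from heartbeat documentation).
HB_FORMAT = "%Y/%m/%d_%H:%M:%S"


def hb_time(stamp: str) -> datetime:
    """Parse a heartbeat timestamp such as '2008/01/15_19:33:34'."""
    return datetime.strptime(stamp, HB_FORMAT)


def clock_offset(local_stamp: str, peer_stamp: str) -> timedelta:
    """Estimate (peer clock - local clock) from two stamps of one event."""
    return hb_time(peer_stamp) - hb_time(local_stamp)


# Assumed matching events: f1 sees the link to f3 come up at 19:33:34,
# f3 sees the link to f1 come up at 23:14:53 (by its own clock).
offset = clock_offset("2008/01/15_19:33:34", "2008/01/15_23:14:53")
print(offset)  # 3:41:19
```

Adding this offset to the f1 timestamps gives approximate f3-clock times, which
is enough to put the events of both logs into one sequence.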
Thanks
On Jan 16, 2008 1:12 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Tue, Jan 15, 2008 at 10:51:48PM +0800, Tonglu Yi wrote:
> > Hi Thomas
> > Thanks.
> >
> > But I am using heartbeat 2.1.2 with release 1 style configuration, and it is
> > impossible to change to use release 2 configuration.
> >
> > Actually what I expect heartbeat to do is that at startup time, heartbeat
> > goes to standby if any resource fails to start, and lets the other
> > node take over resources when the other server starts up.
> >
> > Currently the problem is that heartbeat does give up all resources, but
> > after giving up all resources, it still thinks it is holding all resources,
> > thus the peer is unable to take over resources when starting up.
>
> Could you provide the logs?
>
> Thanks,
>
> Dejan
> >
> >
> > On Jan 15, 2008 10:39 PM, Thomas Glanzmann <[EMAIL PROTECTED]> wrote:
> >
> > > Hello Tonglu,
> > > the following setup works with heartbeat-2.1.3 and drbd-8.1.3:
> > >
> > > cibadmin -U -X '
> > > <configuration>
> > > <resources>
> > > <master_slave id="ms-drbd0">
> > > <meta_attributes id="ma-ms-drbd0">
> > > <attributes>
> > > <nvpair id="ma-ms-drbd0-1"
> > > name="clone_max" value="2"/>
> > > <nvpair id="ma-ms-drbd0-2"
> > > name="clone_node_max" value="1"/>
> > > <nvpair id="ma-ms-drbd0-3"
> > > name="master_max" value="1"/>
> > > <nvpair id="ma-ms-drbd0-4"
> > > name="master_node_max" value="1"/>
> > > <nvpair id="ma-ms-drbd0-5"
> > > name="notify" value="yes"/>
> > > <nvpair id="ma-ms-drbd0-6"
> > > name="globally_unique" value="false"/>
> > > </attributes>
> > > </meta_attributes>
> > > <primitive id="drbd0" class="ocf"
> > > provider="heartbeat" type="drbd">
> > > <instance_attributes id="ia-drbd0">
> > > <attributes>
> > > <nvpair id="ia-drbd0-1"
> > > name="drbd_resource" value="postgres"/>
> > > </attributes>
> > > </instance_attributes>
> > > <operations>
> > > <op id="op-ms-drbd2-1"
> > > name="monitor" interval="60s" timeout="60s" start_delay="30s"
> > > role="Master"/>
> > > <op id="op-ms-drbd2-2"
> > > name="monitor" interval="61s" timeout="60s" start_delay="30s"
> > > role="Slave"/>
> > > </operations>
> > > </primitive>
> > > </master_slave>
> > >
> > > <group id="nfs-cluster">
> > > <primitive class="ocf" provider="heartbeat"
> > > type="Filesystem" id="fs0">
> > > <instance_attributes id="ia-fs0">
> > > <attributes>
> > > <nvpair id="ia-fs0-1"
> > > name="fstype" value="ext3"/>
> > > <nvpair name="directory"
> > > id="ia-fs0-2" value="/srv/gcl"/>
> > > <nvpair id="ia-fs0-3"
> > > name="device" value="/dev/drbd0"/>
> > > </attributes>
> > > </instance_attributes>
> > > <operations>
> > > <op id="fs0-monitor0"
> > > name="monitor" interval="60s" timeout="120s" start_delay="1m"/>
> > > </operations>
> > > </primitive>
> > > </group>
> > > </resources>
> > >
> > > <constraints>
> > > <rsc_location id="drbd0-placement-1" rsc="ms-drbd0">
> > > <rule id="drbd0-rule-1" score="-INFINITY">
> > > <expression id="exp-01" value="ha-1"
> > > attribute="#uname" operation="ne"/>
> > > <expression id="exp-02" value="ha-2"
> > > attribute="#uname" operation="ne"/>
> > > </rule>
> > > </rsc_location>
> > >
> > > <rsc_order id="nfs_promotes_ms-drbd0" from="nfs-cluster"
> > > action="start" to="ms-drbd0" to_action="promote"/>
> > > <rsc_colocation id="nfs_on_drbd0" to="ms-drbd0"
> > > to_role="master" from="nfs-cluster" score="infinity"/>
> > > </constraints>
> > > </configuration>
> > > '
> > >
> > > Thomas
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > >
Jan 15 19:29:40 f1 kernel: drbd0: receiver terminated
Jan 15 19:29:40 f1 kernel: drbd0: receiver (re)started
Jan 15 19:29:40 f1 kernel: drbd0: conn( Unconnected -> WFConnection )
Jan 15 19:30:05 f1 logd: [28325]: info: logd started with default configuration.
Jan 15 19:30:05 f1 logd: [28325]: info: G_main_add_SignalHandler: Added signal
handler for signal 15
Jan 15 19:30:05 f1 logd: [28326]: info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[28366]: 2008/01/15_19:30:05 info: Version 2 support: false
heartbeat[28366]: 2008/01/15_19:30:05 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[28366]: 2008/01/15_19:30:05 info: **************************
heartbeat[28366]: 2008/01/15_19:30:05 info: Configuration validated. Starting
heartbeat 2.1.3
heartbeat[28367]: 2008/01/15_19:30:05 info: heartbeat: version 2.1.3
heartbeat[28367]: 2008/01/15_19:30:05 info: Heartbeat generation: 1189000674
heartbeat[28367]: 2008/01/15_19:30:05 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth1
heartbeat[28367]: 2008/01/15_19:30:05 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth1 - Status: 1
heartbeat[28367]: 2008/01/15_19:30:05 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth2
heartbeat[28367]: 2008/01/15_19:30:05 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth2 - Status: 1
heartbeat[28367]: 2008/01/15_19:30:05 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[28367]: 2008/01/15_19:30:05 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[28367]: 2008/01/15_19:30:05 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[28367]: 2008/01/15_19:30:05 info: Local status now set to: 'up'
heartbeat[28367]: 2008/01/15_19:30:06 info: Link f1:eth1 up.
heartbeat[28367]: 2008/01/15_19:30:06 info: Link f1:eth2 up.
heartbeat[28367]: 2008/01/15_19:30:52 WARN: node f3: is dead
heartbeat[28367]: 2008/01/15_19:30:52 info: Comm_now_up(): updating status to
active
heartbeat[28367]: 2008/01/15_19:30:52 info: Local status now set to: 'active'
heartbeat[28367]: 2008/01/15_19:30:52 WARN: No STONITH device configured.
heartbeat[28367]: 2008/01/15_19:30:52 WARN: Shared disks are not protected.
heartbeat[28367]: 2008/01/15_19:30:52 info: Resources being acquired from f3.
harc[28377]: 2008/01/15_19:30:52 info: Running /etc/ha.d/rc.d/status status
heartbeat[28378]: 2008/01/15_19:30:52 info: No local resources
[/usr/share/heartbeat/ResourceManager listkeys f1] to acquire.
heartbeat[28367]: 2008/01/15_19:30:52 info: Initial resource acquisition
complete (T_RESOURCES(us))
mach_down[28405]: 2008/01/15_19:30:52 info: Taking over resource group
drbddisk
ResourceManager[28430]: 2008/01/15_19:30:52 info: Acquiring resource group: f3
drbddisk IPaddr::10.80.1.2/16
ResourceManager[28430]: 2008/01/15_19:30:52 info: Running
/etc/ha.d/resource.d/drbddisk start
Jan 15 19:30:53 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:53 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:53 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:54 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:54 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:54 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:55 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:55 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:55 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:56 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:56 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:56 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:57 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:57 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:57 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:58 f1 kernel: drbd0: State change failed: Refusing to be Primary
without at least one UpToDate disk
Jan 15 19:30:58 f1 kernel: drbd0: state = { cs:WFConnection
st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }
Jan 15 19:30:58 f1 kernel: drbd0: wanted = { cs:WFConnection
st:Primary/Unknown ds:Inconsistent/DUnknown r--u }
ResourceManager[28430]: 2008/01/15_19:30:58 ERROR: Return code 20 from
/etc/ha.d/resource.d/drbddisk
ResourceManager[28430]: 2008/01/15_19:30:58 CRIT: Giving up resources due to
failure of drbddisk
ResourceManager[28430]: 2008/01/15_19:30:58 info: Releasing resource group: f3
drbddisk IPaddr::10.80.1.2/16
ResourceManager[28430]: 2008/01/15_19:30:58 info: Running
/etc/ha.d/resource.d/IPaddr 10.80.1.2/16 stop
IPaddr[28518]: 2008/01/15_19:30:58 INFO: Success
ResourceManager[28430]: 2008/01/15_19:30:58 info: Running
/etc/ha.d/resource.d/drbddisk stop
mach_down[28405]: 2008/01/15_19:30:58 info:
/usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[28405]: 2008/01/15_19:30:58 info: mach_down takeover complete
for node f3.
heartbeat[28367]: 2008/01/15_19:30:58 info: mach_down takeover complete.
heartbeat[28367]: 2008/01/15_19:31:02 info: Local Resource acquisition
completed. (none)
heartbeat[28367]: 2008/01/15_19:31:02 info: local resource transition completed.
hb_standby[28605]: 2008/01/15_19:31:28 Going standby [foreign].
heartbeat[28367]: 2008/01/15_19:31:28 info: f1 wants to go standby [foreign]
heartbeat[28367]: 2008/01/15_19:31:38 WARN: No reply to standby request.
Standby request cancelled.
Jan 15 19:31:46 f1 kernel: drbd0: conn( WFConnection -> WFReportParams )
Jan 15 19:31:46 f1 kernel: drbd0: Handshake successful: DRBD Network Protocol
version 86
Jan 15 19:31:46 f1 kernel: drbd0: Becoming sync target due to disk states.
Jan 15 19:31:46 f1 kernel: drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
Jan 15 19:31:46 f1 kernel: drbd0: Writing meta data super block now.
Jan 15 19:31:46 f1 kernel: drbd0: conn( WFBitMapT -> WFSyncUUID )
Jan 15 19:31:46 f1 kernel: drbd0: conn( WFSyncUUID -> PausedSyncT )
Jan 15 19:31:46 f1 kernel: drbd0: Began resync as PausedSyncT (will sync 64188
KB [16047 bits set]).
Jan 15 19:31:46 f1 kernel: drbd0: Writing meta data super block now.
Jan 15 19:32:33 f1 kernel: drbd0: conn( PausedSyncT -> SyncTarget ) user_isp( 1
-> 0 )
Jan 15 19:32:33 f1 kernel: drbd0: Syncer continues.
Jan 15 19:32:34 f1 kernel: drbd0: Resync done (total 48 sec; paused 47 sec;
64188 K/sec)
Jan 15 19:32:34 f1 kernel: drbd0: conn( SyncTarget -> Connected ) disk(
Inconsistent -> UpToDate )
Jan 15 19:32:34 f1 kernel: drbd0: Writing meta data super block now.
heartbeat[28367]: 2008/01/15_19:33:34 info: Link f3:eth1 up.
heartbeat[28367]: 2008/01/15_19:33:34 info: Status update for node f3: status
init
heartbeat[28367]: 2008/01/15_19:33:34 info: Link f3:eth2 up.
heartbeat[28367]: 2008/01/15_19:33:34 info: Status update for node f3: status up
harc[28626]: 2008/01/15_19:33:34 info: Running /etc/ha.d/rc.d/status status
harc[28642]: 2008/01/15_19:33:34 info: Running /etc/ha.d/rc.d/status status
heartbeat[28367]: 2008/01/15_19:33:34 info: Status update for node f3: status
active
harc[28657]: 2008/01/15_19:33:34 info: Running /etc/ha.d/rc.d/status status
heartbeat[28367]: 2008/01/15_19:33:34 info: remote resource transition
completed.
Jan 15 23:14:51 f3 logd: [26586]: info: logd started with default configuration.
Jan 15 23:14:51 f3 logd: [26586]: info: G_main_add_SignalHandler: Added signal
handler for signal 15
Jan 15 23:14:51 f3 logd: [26587]: info: G_main_add_SignalHandler: Added signal
handler for signal 15
heartbeat[26627]: 2008/01/15_23:14:52 info: Version 2 support: false
heartbeat[26627]: 2008/01/15_23:14:52 WARN: Logging daemon is disabled
--enabling logging daemon is recommended
heartbeat[26627]: 2008/01/15_23:14:52 info: **************************
heartbeat[26627]: 2008/01/15_23:14:52 info: Configuration validated. Starting
heartbeat 2.1.3
heartbeat[26628]: 2008/01/15_23:14:52 info: heartbeat: version 2.1.3
heartbeat[26628]: 2008/01/15_23:14:52 info: Heartbeat generation: 1189000113
heartbeat[26628]: 2008/01/15_23:14:52 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth1
heartbeat[26628]: 2008/01/15_23:14:52 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth1 - Status: 1
heartbeat[26628]: 2008/01/15_23:14:52 info: glib: UDP Broadcast heartbeat
started on port 694 (694) interface eth2
heartbeat[26628]: 2008/01/15_23:14:52 info: glib: UDP Broadcast heartbeat
closed on port 694 interface eth2 - Status: 1
heartbeat[26628]: 2008/01/15_23:14:52 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[26628]: 2008/01/15_23:14:52 info: G_main_add_TriggerHandler: Added
signal manual handler
heartbeat[26628]: 2008/01/15_23:14:52 info: G_main_add_SignalHandler: Added
signal handler for signal 17
heartbeat[26628]: 2008/01/15_23:14:52 info: Local status now set to: 'up'
heartbeat[26628]: 2008/01/15_23:14:53 info: Link f1:eth1 up.
heartbeat[26628]: 2008/01/15_23:14:53 info: Status update for node f1: status
active
heartbeat[26628]: 2008/01/15_23:14:53 info: Link f1:eth2 up.
heartbeat[26628]: 2008/01/15_23:14:53 info: Link f3:eth1 up.
heartbeat[26628]: 2008/01/15_23:14:53 info: Link f3:eth2 up.
harc[26637]: 2008/01/15_23:14:53 info: Running /etc/ha.d/rc.d/status status
heartbeat[26628]: 2008/01/15_23:14:53 info: Comm_now_up(): updating status to
active
heartbeat[26628]: 2008/01/15_23:14:53 info: Local status now set to: 'active'
heartbeat[26628]: 2008/01/15_23:14:53 info: remote resource transition
completed.
heartbeat[26628]: 2008/01/15_23:14:53 info: remote resource transition
completed.
heartbeat[26628]: 2008/01/15_23:14:53 info: Local Resource acquisition
completed. (none)
heartbeat[26628]: 2008/01/15_23:14:53 info: Initial resource acquisition
complete (T_RESOURCES(them))
harc[26669]: 2008/01/15_23:15:50 info: Running /etc/ha.d/rc.d/hb_takeover
hb_takeover
heartbeat[26628]: 2008/01/15_23:15:51 info: f1 wants to go standby [all]
heartbeat[26628]: 2008/01/15_23:15:52 info: standby: acquire [all] resources
from f1
heartbeat[26684]: 2008/01/15_23:15:52 info: acquire all HA resources (standby).
ResourceManager[26697]: 2008/01/15_23:15:52 info: Acquiring resource group: f3
drbddisk IPaddr::10.80.1.2/16
ResourceManager[26697]: 2008/01/15_23:15:52 info: Running
/etc/ha.d/resource.d/drbddisk start
Jan 15 23:15:52 f3 kernel: drbd0: role( Secondary -> Primary )
Jan 15 23:15:52 f3 kernel: drbd0: Writing meta data super block now.
Jan 15 23:15:52 f3 kernel: kjournald starting. Commit interval 5 seconds
Jan 15 23:15:52 f3 kernel: EXT3 FS on drbd0, internal journal
Jan 15 23:15:52 f3 kernel: EXT3-fs: mounted filesystem with ordered data mode.
IPaddr[26752]: 2008/01/15_23:15:52 INFO: Resource is stopped
ResourceManager[26697]: 2008/01/15_23:15:52 info: Running
/etc/ha.d/resource.d/IPaddr 10.80.1.2/16 start
IPaddr[26841]: 2008/01/15_23:15:53 INFO: Using calculated nic for 10.80.1.2:
eth3
IPaddr[26841]: 2008/01/15_23:15:53 INFO: Using calculated netmask for
10.80.1.2: 255.255.0.0
IPaddr[26841]: 2008/01/15_23:15:53 INFO: eval ifconfig eth3:0 10.80.1.2
netmask 255.255.0.0 broadcast 10.80.255.255
IPaddr[26815]: 2008/01/15_23:15:53 INFO: Success
heartbeat[26684]: 2008/01/15_23:15:53 info: all HA resource acquisition
completed (standby).
heartbeat[26628]: 2008/01/15_23:15:53 info: Standby resource acquisition done
[all].
heartbeat[26628]: 2008/01/15_23:15:54 info: remote resource transition
completed.
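
The drbd "state = { ... }" lines in the f1 log can also be checked
mechanically. Below is a small sketch, assuming the field layout seen in those
lines (cs:, st: and ds:, with the local and peer values separated by "/") and
that a wrapped log line has been joined back into one line first. The helper
names are hypothetical, and can_be_primary() only mirrors the wording of the
kernel message; it is a simplification, not drbd's actual rule.

```python
import re

# Matches the cs/st/ds fields of a drbd "state = { ... }" log line.
STATE_RE = re.compile(r"cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ds:(?P<ds>\S+)")


def parse_drbd_state(line):
    """Return the connection, role and disk states, or None if absent."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    role, peer_role = match.group("st").split("/")
    disk, peer_disk = match.group("ds").split("/")
    return {
        "cs": match.group("cs"),
        "role": role,
        "peer_role": peer_role,
        "disk": disk,
        "peer_disk": peer_disk,
    }


def can_be_primary(state):
    """Simplified check: at least one of the two disks must be UpToDate."""
    return "UpToDate" in (state["disk"], state["peer_disk"])


line = ("Jan 15 19:30:53 f1 kernel: drbd0: state = { cs:WFConnection "
        "st:Secondary/Unknown ds:Inconsistent/DUnknown r--u }")
state = parse_drbd_state(line)
print(state["disk"], state["peer_disk"], can_be_primary(state))
```

For the line above, neither disk is UpToDate, which matches the "Refusing to be
Primary" message and the return code 20 from drbddisk.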