Hi,

On Fri, May 09, 2008 at 11:46:29AM +0200, Stallmann, Andreas wrote:
> Hi there,
> 
> we're still in deep sh** with heartbeat and drbd in a split-brain
> scenario.
> 
> We have the following set up:
> 
> - A two node active/passive cluster (heartbeat 2.1.3 without crm)
> - Dopd with drbd-peer-outdater (the newest ones, patched).
> - Ipfail
> 
> Still, if we disconnect one host from the network, both nodes become
> primary. However, we see different behavior depending on whether the
> primary or the secondary gets disconnected:
> 
> - If we disconnect the primary (conetapp01 in our case) from the
> network, the hosts stay in "WFConnection" and reconnect after the
> primary's interfaces on the switch are up again. No sync happens, so
> changes made on conetapp02 (which was temporarily primary during the
> split brain) are lost.
> 
> - If we disconnect the secondary from the network, the hosts run into
> a "StandAlone" state. They stay there even after the secondary has
> been reattached to the network and have to be reconnected manually
> with "drbdadm connect all".
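(For the record: manual split-brain recovery with drbd 8.x usually means
discarding the split-brain changes on one side. A sketch, assuming you
want to keep the data on conetapp01:

    # on conetapp02, the node whose split-brain changes get thrown away:
    drbdadm secondary r0
    drbdadm -- --discard-my-data connect r0

    # on conetapp01, only if it has dropped to StandAlone as well:
    drbdadm connect r0

The "victim" node then resynchronizes from the survivor.)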
> 
> Dopd seems to be of no obvious use here: both nodes outdate their
> peer. What I would like to see is that the disconnected peer outdates
> itself! Isn't that the intended behavior?
> 
> Another thing that annoys me quite a lot is that heartbeat (well,
> ipfail) notices that a disconnected host is dead, yet heartbeat still
> triggers a failover!
> 
> If anyone has a working configuration, or suggestions on how we can
> get ours working (even if it involves spending money on, let's say, a
> stonith device), please let me know.

Stonith is indispensable if you care about your data.
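For example, in ha.cf on both nodes (a sketch; the parameters after the
device type depend on the stonith plugin you use, see the stonith(8) man
page and "stonith -L" for the list of supported types):

    # let any node fence any other node through a fencing device;
    # '*' means the rule applies to all nodes
    stonith_host * <device-type> <device-specific-parameters>

There is an ssh-based plugin for testing, but for real data protection
you want an external power switch or an IPMI/management-board device,
i.e. something that still works when the node itself is unreachable.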

> Attached you will find the messages file from conetapp01 (our active
> primary) and from conetapp02 (our standby secondary), and further
> down the heartbeat and drbd configuration. We disconnected conetapp01
> from the network.
> 
> Thanks for your help!

You would probably be better off asking this in the drbd forums.

Thanks,

Dejan

> ~~~~~~~~~~~~~~~~~~~~~~~~~~/var/log/messages from
> conetapp01~~~~~~~~~~~~~~~~~~~~~~~~
> 
> May  9 10:23:44 conetapp01 kernel: bnx2: eth0 NIC Link is Down
> May  9 10:23:44 conetapp01 kernel: bnx2: eth1 NIC Link is Down
> May  9 10:23:51 conetapp01 kernel: drbd0: PingAck did not arrive in
> time.
> May  9 10:23:51 conetapp01 kernel: drbd0: peer( Secondary -> Unknown )
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> May  9 10:23:51 conetapp01 kernel: drbd0: asender terminated
> May  9 10:23:51 conetapp01 kernel: drbd0: Terminating asender thread
> May  9 10:23:51 conetapp01 kernel: drbd0: short read expecting header on
> sock: r=-512
> May  9 10:23:51 conetapp01 kernel: drbd0: Creating new current UUID
> May  9 10:23:51 conetapp01 kernel: drbd0: Writing meta data super block
> now.
> May  9 10:23:51 conetapp01 kernel: drbd0: tl_clear()
> May  9 10:23:51 conetapp01 kernel: drbd0: Connection closed
> May  9 10:23:51 conetapp01 kernel: drbd0: helper command: /sbin/drbdadm
> outdate-peer
> May  9 10:23:51 conetapp01 drbd-peer-outdater: [21355]: debug: drbd
> peer: conetapp02
> May  9 10:23:51 conetapp01 drbd-peer-outdater: [21355]: debug: drbd
> resource: r0
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Connecting channel
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Client outdater (0x80522b0) connected
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> invoked: outdater
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Processing msg from outdater
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug: Got
> message from (drbd-peer-outdater). (peer: conetapp02, res :r0)
> May  9 10:23:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Starting node walk
> May  9 10:23:52 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug: node
> conetapp02 found
> May  9 10:23:52 conetapp01 /usr/lib/heartbeat/dopd: [19616]: info:
> sending start_outdate message to the other node conetapp01 -> conetapp02
> May  9 10:23:52 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> sending [start_outdate res: r0] to node: conetapp02
> May  9 10:23:52 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Processed 1 messages
> May  9 10:24:04 conetapp01 heartbeat: [19538]: WARN: node 192.168.111.1:
> is dead
> May  9 10:24:04 conetapp01 heartbeat: [19538]: WARN: node 192.168.111.8:
> is dead
> May  9 10:24:04 conetapp01 heartbeat: [19538]: debug:
> StartNextRemoteRscReq(): child count 1
> May  9 10:24:04 conetapp01 heartbeat: [19538]: info: Link
> 192.168.111.1:192.168.111.1 dead.
> May  9 10:24:04 conetapp01 heartbeat: [19538]: info: Link
> 192.168.111.8:192.168.111.8 dead.
> May  9 10:24:04 conetapp01 ipfail: [19615]: info: Status update: Node
> 192.168.111.1 now has status dead
> May  9 10:24:04 conetapp01 heartbeat: [21429]: debug: notify_world:
> setting SIGCHLD Handler to SIG_DFL
> May  9 10:24:04 conetapp01 harc[21429]: info: Running
> /etc/ha.d/rc.d/status status
> May  9 10:24:05 conetapp01 heartbeat: [21454]: debug: notify_world:
> setting SIGCHLD Handler to SIG_DFL
> May  9 10:24:05 conetapp01 harc[21454]: info: Running
> /etc/ha.d/rc.d/status status
> May  9 10:24:05 conetapp01 heartbeat: [19538]: WARN: node conetapp02: is
> dead
> May  9 10:24:05 conetapp01 heartbeat: [19538]: WARN: No STONITH device
> configured.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: WARN: Shared disks are
> not protected.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: info: Resources being
> acquired from conetapp02.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: info: Link
> conetapp02:eth0 dead.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: info: Link
> conetapp02:eth1 dead.
> May  9 10:24:05 conetapp01 heartbeat: [21485]: debug: notify_world:
> setting SIGCHLD Handler to SIG_DFL
> May  9 10:24:05 conetapp01 harc[21485]: info: Running
> /etc/ha.d/rc.d/status status
> May  9 10:24:05 conetapp01 mach_down[21514]: info:
> /usr/share/heartbeat/mach_down: nice_failback: foreign resources
> acquired
> May  9 10:24:05 conetapp01 mach_down[21514]: info: mach_down takeover
> complete for node conetapp02.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: info: mach_down takeover
> complete.
> May  9 10:24:05 conetapp01 heartbeat: [19538]: debug:
> StartNextRemoteRscReq(): child count 1
> May  9 10:24:05 conetapp01 IPaddr[21563]: INFO:  Running OK
> May  9 10:24:05 conetapp01 heartbeat: [21486]: info: Local Resource
> acquisition completed.
> May  9 10:24:05 conetapp01 ipfail: [19615]: info: NS: We are dead. :<
> May  9 10:24:05 conetapp01 ipfail: [19615]: info: Status update: Node
> 192.168.111.8 now has status dead
> May  9 10:24:07 conetapp01 ipfail: [19615]: info: NS: We are dead. :<
> May  9 10:24:07 conetapp01 ipfail: [19615]: info: Link Status update:
> Link 192.168.111.1/192.168.111.1 now has status dead
> May  9 10:24:09 conetapp01 ipfail: [19615]: info: We are dead. :<
> May  9 10:24:09 conetapp01 ipfail: [19615]: info: Asking other side for
> ping node count.
> May  9 10:24:09 conetapp01 ipfail: [19615]: debug: Message [num_ping]
> sent.
> May  9 10:24:09 conetapp01 ipfail: [19615]: info: Link Status update:
> Link 192.168.111.8/192.168.111.8 now has status dead
> May  9 10:24:11 conetapp01 ipfail: [19615]: info: We are dead. :<
> May  9 10:24:11 conetapp01 ipfail: [19615]: info: Asking other side for
> ping node count.
> May  9 10:24:11 conetapp01 ipfail: [19615]: debug: Message [num_ping]
> sent.
> May  9 10:24:11 conetapp01 ipfail: [19615]: info: Status update: Node
> conetapp02 now has status dead
> May  9 10:24:13 conetapp01 ipfail: [19615]: info: NS: We are dead. :<
> May  9 10:24:13 conetapp01 ipfail: [19615]: info: Link Status update:
> Link conetapp02/eth0 now has status dead
> May  9 10:24:15 conetapp01 ipfail: [19615]: info: We are dead. :<
> May  9 10:24:15 conetapp01 ipfail: [19615]: info: Asking other side for
> ping node count.
> May  9 10:24:15 conetapp01 ipfail: [19615]: debug: Message [num_ping]
> sent.
> May  9 10:24:15 conetapp01 ipfail: [19615]: info: Link Status update:
> Link conetapp02/eth1 now has status dead
> May  9 10:24:17 conetapp01 ipfail: [19615]: info: We are dead. :<
> May  9 10:24:17 conetapp01 ipfail: [19615]: info: Asking other side for
> ping node count.
> May  9 10:24:17 conetapp01 ipfail: [19615]: debug: Message [num_ping]
> sent.
> May  9 10:24:51 conetapp01 drbd-peer-outdater: [21355]: WARN: error:
> could not connect to dopd after 60 seconds: timeout reached
> May  9 10:24:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> invoked: outdater
> May  9 10:24:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Processed 0 messages
> May  9 10:24:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> destroying connection: r0
> May  9 10:24:51 conetapp01 /usr/lib/heartbeat/dopd: [19616]: debug:
> Deleting outdater (0x80522b0) from mainloop
> May  9 10:24:51 conetapp01 kernel: drbd0: outdate-peer helper returned 5
> May  9 10:24:51 conetapp01 kernel: drbd0: pdsk( DUnknown -> Outdated )
> May  9 10:24:51 conetapp01 kernel: drbd0: conn( NetworkFailure ->
> Unconnected )
> May  9 10:24:51 conetapp01 kernel: drbd0: receiver terminated
> May  9 10:24:51 conetapp01 kernel: drbd0: receiver (re)started
> May  9 10:24:51 conetapp01 kernel: drbd0: conn( Unconnected ->
> WFConnection )
> May  9 10:24:51 conetapp01 kernel: drbd0: Writing meta data super block
> now.
>  
> ~~~~~~~~~~~~~~~~~~~~~~~~~/var/log/messages from conetapp02
> ~~~~~~~~~~~~~~~~~~~~~
> 
> May  9 10:23:52 conetapp02 kernel: drbd0: PingAck did not arrive in
> time.
> May  9 10:23:52 conetapp02 kernel: drbd0: peer( Primary -> Unknown )
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> May  9 10:23:52 conetapp02 kernel: drbd0: asender terminated
> May  9 10:23:52 conetapp02 kernel: drbd0: Terminating asender thread
> May  9 10:23:52 conetapp02 kernel: drbd0: short read expecting header on
> sock: r=-512
> May  9 10:23:52 conetapp02 kernel: drbd0: Writing meta data super block
> now.
> May  9 10:23:52 conetapp02 kernel: drbd0: tl_clear()
> May  9 10:23:52 conetapp02 kernel: drbd0: Connection closed
> May  9 10:23:52 conetapp02 kernel: drbd0: conn( NetworkFailure ->
> Unconnected )
> May  9 10:23:52 conetapp02 kernel: drbd0: receiver terminated
> May  9 10:23:52 conetapp02 kernel: drbd0: receiver (re)started
> May  9 10:23:52 conetapp02 kernel: drbd0: conn( Unconnected ->
> WFConnection )
> May  9 10:24:00 conetapp02 sshd[15249]: Accepted publickey for root from
> 192.168.111.38 port 46930 ssh2
> May  9 10:24:00 conetapp02 sshd[15251]: Accepted publickey for root from
> 192.168.111.38 port 46932 ssh2
> May  9 10:24:00 conetapp02 sshd[15250]: Accepted publickey for root from
> 192.168.111.38 port 46931 ssh2
> May  9 10:24:05 conetapp02 heartbeat: [14098]: WARN: node conetapp01: is
> dead
> May  9 10:24:05 conetapp02 heartbeat: [14098]: WARN: No STONITH device
> configured.
> May  9 10:24:05 conetapp02 heartbeat: [14098]: WARN: Shared disks are
> not protected.
> May  9 10:24:05 conetapp02 heartbeat: [14098]: info: Resources being
> acquired from conetapp01.
> May  9 10:24:05 conetapp02 heartbeat: [14098]: info: Link
> conetapp01:eth0 dead.
> May  9 10:24:05 conetapp02 heartbeat: [14098]: info: Link
> conetapp01:eth1 dead.
> May  9 10:24:05 conetapp02 ipfail: [14130]: info: Status update: Node
> conetapp01 now has status dead
> May  9 10:24:05 conetapp02 heartbeat: [15371]: debug: notify_world:
> setting SIGCHLD Handler to SIG_DFL
> May  9 10:24:05 conetapp02 harc[15371]: info: Running
> /etc/ha.d/rc.d/status status
> May  9 10:24:05 conetapp02 heartbeat: [15372]: info: No local resources
> [/usr/share/heartbeat/ResourceManager listkeys conetapp02] to acquire.
> May  9 10:24:05 conetapp02 heartbeat: [14098]: debug:
> StartNextRemoteRscReq(): child count 1
> May  9 10:24:05 conetapp02 mach_down[15400]: info: Taking over resource
> group 192.168.111.34
> May  9 10:24:05 conetapp02 ResourceManager[15426]: info: Acquiring
> resource group: conetapp01 192.168.111.34 drbddisk::r0
> Filesystem::/dev/drbd0::/drbd::ext3 tomcat
> May  9 10:24:05 conetapp02 IPaddr[15453]: INFO:  Resource is stopped
> May  9 10:24:05 conetapp02 ResourceManager[15426]: info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.111.34 start
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug: Starting
> /etc/ha.d/resource.d/IPaddr 192.168.111.34 start
> May  9 10:24:05 conetapp02 IPaddr[15529]: INFO: Using calculated nic for
> 192.168.111.34: eth0
> May  9 10:24:05 conetapp02 IPaddr[15529]: INFO: Using calculated netmask
> for 192.168.111.34: 255.255.255.0
> May  9 10:24:05 conetapp02 IPaddr[15529]: DEBUG: Using calculated
> broadcast for 192.168.111.34: 192.168.111.255
> May  9 10:24:05 conetapp02 IPaddr[15529]: INFO: eval ifconfig eth0:0
> 192.168.111.34 netmask 255.255.255.0 broadcast 192.168.111.255
> May  9 10:24:05 conetapp02 IPaddr[15529]: DEBUG: Sending Gratuitous Arp
> for 192.168.111.34 on eth0:0 [eth0]
> May  9 10:24:05 conetapp02 IPaddr[15512]: INFO:  Success
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug:
> /etc/ha.d/resource.d/IPaddr 192.168.111.34 start done. RC=0
> May  9 10:24:05 conetapp02 ResourceManager[15426]: info: Running
> /etc/ha.d/resource.d/drbddisk r0 start
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug: Starting
> /etc/ha.d/resource.d/drbddisk r0 start
> May  9 10:24:05 conetapp02 kernel: drbd0: helper command: /sbin/drbdadm
> outdate-peer
> May  9 10:24:05 conetapp02 drbd-peer-outdater: [15670]: debug: drbd
> peer: conetapp01
> May  9 10:24:05 conetapp02 drbd-peer-outdater: [15670]: debug: drbd
> resource: r0
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Connecting channel
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Client outdater (0x80522b0) connected
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> invoked: outdater
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Processing msg from outdater
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug: Got
> message from (drbd-peer-outdater). (peer: conetapp01, res :r0)
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Starting node walk
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: WARN:
> Cluster node: conetapp01: status: dead
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Processed 1 messages
> May  9 10:24:05 conetapp02 drbd-peer-outdater: [15670]: debug: message:
> outdater_rc, conetapp02
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> invoked: outdater
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Processed 0 messages
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> destroying connection: (null)
> May  9 10:24:05 conetapp02 /usr/lib/heartbeat/dopd: [14131]: debug:
> Deleting outdater (0x80522b0) from mainloop
> May  9 10:24:05 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.8!
> May  9 10:24:05 conetapp02 kernel: drbd0: outdate-peer helper returned 5
> May  9 10:24:05 conetapp02 kernel: drbd0: role( Secondary -> Primary )
> pdsk( DUnknown -> Outdated )
> May  9 10:24:05 conetapp02 kernel: drbd0: Creating new current UUID
> May  9 10:24:05 conetapp02 kernel: drbd0: Writing meta data super block
> now.
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug:
> /etc/ha.d/resource.d/drbddisk r0 start done. RC=0
> May  9 10:24:05 conetapp02 Filesystem[15685]: INFO:  Resource is stopped
> May  9 10:24:05 conetapp02 ResourceManager[15426]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 start
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug: Starting
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 start
> May  9 10:24:05 conetapp02 Filesystem[15766]: INFO: Running start for
> /dev/drbd0 on /drbd
> May  9 10:24:05 conetapp02 kernel: (fs/jbd/recovery.c, 255):
> journal_recover: JBD: recovery, exit status 0, recovered transactions
> 156112 to 156124
> May  9 10:24:05 conetapp02 kernel: (fs/jbd/recovery.c, 257):
> journal_recover: JBD: Replayed 107 and revoked 0/0 blocks
> May  9 10:24:05 conetapp02 kernel: kjournald starting.  Commit interval
> 5 seconds
> May  9 10:24:05 conetapp02 kernel: EXT3 FS on drbd0, internal journal
> May  9 10:24:05 conetapp02 kernel: EXT3-fs: recovery complete.
> May  9 10:24:05 conetapp02 kernel: EXT3-fs: mounted filesystem with
> ordered data mode.
> May  9 10:24:05 conetapp02 Filesystem[15755]: INFO:  Success
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug:
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext3 start done. RC=0
> May  9 10:24:05 conetapp02 ResourceManager[15426]: info: Running
> /etc/ha.d/resource.d/tomcat  start
> May  9 10:24:05 conetapp02 ResourceManager[15426]: debug: Starting
> /etc/ha.d/resource.d/tomcat  start
> May  9 10:24:06 conetapp02 ResourceManager[15426]: debug:
> /etc/ha.d/resource.d/tomcat  start done. RC=0
> May  9 10:24:06 conetapp02 mach_down[15400]: info:
> /usr/share/heartbeat/mach_down: nice_failback: foreign resources
> acquired
> May  9 10:24:06 conetapp02 mach_down[15400]: info: mach_down takeover
> complete for node conetapp01.
> May  9 10:24:06 conetapp02 heartbeat: [14098]: info: mach_down takeover
> complete.
> May  9 10:24:06 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.1!
> May  9 10:24:06 conetapp02 ipfail: [14130]: info: NS: We are still
> alive!
> May  9 10:24:06 conetapp02 ipfail: [14130]: info: Link Status update:
> Link conetapp01/eth0 now has status dead
> May  9 10:24:07 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.8!
> May  9 10:24:08 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.1!
> May  9 10:24:08 conetapp02 sshd[15913]: Accepted publickey for root from
> 192.168.111.38 port 46948 ssh2
> May  9 10:24:09 conetapp02 ipfail: [14130]: info: Asking other side for
> ping node count.
> May  9 10:24:09 conetapp02 ipfail: [14130]: debug: Message [num_ping]
> sent.
> May  9 10:24:09 conetapp02 ipfail: [14130]: info: Checking remote count
> of ping nodes.
> May  9 10:24:09 conetapp02 ipfail: [14130]: info: Link Status update:
> Link conetapp01/eth1 now has status dead
> May  9 10:24:09 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.8!
> May  9 10:24:10 conetapp02 ipfail: [14130]: debug: Found ping node
> 192.168.111.1!
> May  9 10:24:11 conetapp02 kernel: JBD: barrier-based sync failed on
> drbd0 - disabling barriers
> May  9 10:24:11 conetapp02 ipfail: [14130]: info: Asking other side for
> ping node count.
> May  9 10:24:11 conetapp02 ipfail: [14130]: debug: Message [num_ping]
> sent.
> May  9 10:24:11 conetapp02 ipfail: [14130]: info: Checking remote count
> of ping nodes.
> 
> 
> Here's our configuration:
> 
> ~~~~~~~~~~~~~~~~~~~/etc/ha.d/ha.cf on conetapp01
> (active)~~~~~~~~~~~~~~~~~~~~~~
> udpport 694
> ucast eth0 192.168.111.36
> ucast eth1 192.168.1.22
> keepalive 2
> warntime 5
> deadtime 20
> initdead 60
> node conetapp01
> node conetapp02
> auto_failback on
> ping 192.168.111.1 192.168.111.8
> respawn hacluster /usr/lib/heartbeat/ipfail
> respawn hacluster /usr/lib/heartbeat/dopd
> apiauth dopd gid=haclient uid=hacluster
> logfacility daemon
> 
> ~~~~~~~~~~~~~~~~~~~/etc/ha.d/ha.cf on conetapp02
> (standby)~~~~~~~~~~~~~~~~~~~~~~
> udpport 694
> ucast eth0 192.168.111.35
> ucast eth1 192.168.1.21
> keepalive 2
> warntime 5
> deadtime 20
> initdead 60
> node conetapp01
> node conetapp02
> auto_failback on
> ping 192.168.111.1 192.168.111.8
> respawn hacluster /usr/lib/heartbeat/ipfail
> respawn hacluster /usr/lib/heartbeat/dopd
> apiauth dopd gid=haclient uid=hacluster
> logfacility daemon
> 
> ~~~~~~~~~~~~~~~~/etc/ha.d/haresources on both
> nodes~~~~~~~~~~~~~~~~~~~~~~~~~
> conetapp01 192.168.111.34 drbddisk::r0
> Filesystem::/dev/drbd0::/drbd::ext3 tomcat
> 
> ~~~~~~~~~~~~~~~~  /etc/drbd.conf on both nodes
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> global {
>         usage-count no;
> }
> common {
>         syncer { rate 100M; }
> }
> resource r0 {
>         protocol C;
>         handlers {
>                 outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
>         }
>         startup {
>                 wfc-timeout  0;
>                 degr-wfc-timeout 120;    # 2 minutes.
>         }
>         disk {
>                 on-io-error   detach;
>                 fencing resource-only;
>         }
>         net {
>                 timeout         50;
>                 connect-int     10;
>                 ping-int        10;
>                 max-buffers     2048;
>                 max-epoch-size  2048;
>                 ko-count        0;
>                 after-sb-0pri discard-least-changes;
>                 after-sb-1pri consensus;
>                 after-sb-2pri disconnect;
>                 rr-conflict disconnect;
>         }
>         syncer {
>                 rate 100M;
>         }
>         on conetapp01 {
>                 device     /dev/drbd0;
>                 disk      /dev/sda3;
>                 address   192.168.1.21:7789;
>                 meta-disk internal;
>         }
>         on conetapp02 {
>                 device    /dev/drbd0;
>                 disk      /dev/sda3;
>                 address   192.168.1.22:7789;
>                 meta-disk internal;
>       }
> }     
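(A side note on the config: drbd can also run a handler when it detects
a split brain, which makes scenarios like this much easier to catch. A
sketch; the notification script path is an assumption, check what your
drbd package ships, or put a simple mail command there instead:

    resource r0 {
            handlers {
                    # run when drbd detects a split brain on reconnect
                    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
            }
    }
)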
> 
> -- 
> CONET Solutions GmbH
> Andreas Stallmann, Senior Berater
> 
> 
> -----------------------------------
> CONET Solutions GmbH, Theodor-Heuss-Allee 19, 53773 Hennef
> Registergericht/Registration Court: Amtsgericht Siegburg (HRB Nr. 9136)
> Geschäftsführer/Managing Directors: Dipl.-Inform. Rüdiger Zeyen 
> (Sprecher/Chairman), 
> Dipl.-Betriebsw. Wilfried Pütz und Dipl.-Inform. Jürgen Zender 
> Vorsitzender des Aufsichtsrates/Chairman of the Supervisory Board: 
> Dipl.-Math. Hans-Jürgen Niemeier
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems