I was following a few tutorials and had everything ready to test. The
tutorial said to yank the power cord, but my sysadmin did not want me
to do that, so I simply pulled out the ethernet cables (one went to a
switched network, the other was a crossover between the two nodes for
heartbeats). Server 2 did not take over (I had an init script error
that I forgot to fix). I plugged the cables back in and restarted
both machines. A cat of /proc/drbd displayed this:
GIT-hash: bd3e2c922f95c4fa0dca57a4f8c24bf8b249cc02 build by [EMAIL PROTECTED], 2008-02-01 07:33:35
0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---
ns:0 nr:0 dw:4 dr:249 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
act_log: used:0/257 hits:1 misses:0 starving:0 dirty:0 changed:0
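
To make the status line above easier to read, here is a minimal Python sketch that splits it into its key:value fields. The function names (parse_drbd_status, is_standalone) are made up for illustration, and the parsing assumes nothing more than the DRBD 8.0-style layout shown in the output above; I am reading cs, st and ds as connection state, role and disk state, which matches the conn(), role() and disk() messages in the kernel log.

```python
import re


def parse_drbd_status(line):
    """Split one device status line from /proc/drbd into a dict.

    Assumes the DRBD 8.0-style layout "<minor>: key:value key:value ...".
    Tokens without a colon (such as the trailing "r---" flags) are skipped.
    """
    match = re.match(r"\s*(\d+):\s+(.*)", line)
    if match is None:
        raise ValueError(f"not a device status line: {line!r}")
    fields = {"minor": int(match.group(1))}
    for token in match.group(2).split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def is_standalone(fields):
    """True when the connection state (cs) is StandAlone."""
    return fields.get("cs") == "StandAlone"


status = parse_drbd_status(
    "0: cs:StandAlone st:Primary/Unknown ds:UpToDate/DUnknown r---"
)
local_role, peer_role = status["st"].split("/")
print(status["cs"], local_role, peer_role, is_standalone(status))
```

A check like this could be run on each node after a test to see at a glance whether the two sides are still connected.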
I looked in dmesg and saw this:
-----------------------------------------------------------------------------------------------------
drbd: initialised. Version: 8.0.8 (api:86/proto:86)
drbd: GIT-hash: bd3e2c922f95c4fa0dca57a4f8c24bf8b249cc02 build by [EMAIL PROTECTED], 2008-02-01 07:33:35
drbd: registered as block device major 147
drbd: minor_table @ 0xffff810073c19680
drbd0: disk( Diskless -> Attaching )
drbd0: Found 6 transactions (276 active extents) in activity log.
drbd0: max_segment_size ( = BIO size ) = 32768
drbd0: drbd_bm_resize called with capacity == 1953042632
drbd0: resync bitmap: bits=244130329 words=3814537
drbd0: size = 931 GB (976521316 KB)
drbd0: reading of bitmap took 580 jiffies
drbd0: recounting of set bits took additional 28 jiffies
drbd0: 4 KB (1 bits) marked out-of-sync by on disk bit-map.
drbd0: disk( Attaching -> UpToDate )
drbd0: Writing meta data super block now.
drbd0: conn( StandAlone -> Unconnected )
drbd0: receiver (re)started
drbd0: conn( Unconnected -> WFConnection )
drbd0: Handshake successful: DRBD Network Protocol version 86
drbd0: conn( WFConnection -> WFReportParams )
drbd0: Split-Brain detected, dropping connection!
drbd0: self 6A01F5A6BB510A26:19D4D8C195E805A7:483E5FB7A5527AAD:0000000000000004
drbd0: peer 9C679A93F13525CE:19D4D8C195E805A6:483E5FB7A5527AAC:0000000000000004
drbd0: conn( WFReportParams -> Disconnecting )
drbd0: helper command: /sbin/drbdadm split-brain
drbd0: error receiving ReportState, l: 4!
drbd0: asender terminated
drbd0: tl_clear()
drbd0: Connection closed
drbd0: conn( Disconnecting -> StandAlone )
drbd0: receiver terminated
drbd0: role( Secondary -> Primary )
drbd0: Writing meta data super block now.
kjournald starting. Commit interval 5 seconds
EXT3 FS on drbd0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
-------------------------------------------------------------------------
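
The "self" and "peer" lines in the log above are easier to compare side by side in code. The following is only a rough Python sketch for eyeballing the difference; parse_ids is a name I made up, and I have not confirmed what each of the four colon-separated fields means, so the sketch only reports which positions differ and by which bits.

```python
def parse_ids(log_line):
    """Return the colon-separated hex fields of a 'self'/'peer' log line as ints."""
    hex_part = log_line.split()[-1]
    return [int(field, 16) for field in hex_part.split(":")]


self_ids = parse_ids(
    "drbd0: self 6A01F5A6BB510A26:19D4D8C195E805A7:483E5FB7A5527AAD:0000000000000004"
)
peer_ids = parse_ids(
    "drbd0: peer 9C679A93F13525CE:19D4D8C195E805A6:483E5FB7A5527AAC:0000000000000004"
)

# Positions (0-based) where the two nodes disagree.
differing = [i for i, (a, b) in enumerate(zip(self_ids, peer_ids)) if a != b]

for i in differing:
    # XOR shows exactly which bits differ in that field.
    print(i, format(self_ids[i] ^ peer_ids[i], "016X"))
```

The first field is completely different on the two nodes, while the second and third differ only in their lowest bit.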
So I searched the mailing list for split-brain, and many of the posts
I found say that doing what I did (pulling both cables) will cause a
split-brain. WTF?
I am using DRBD 8.0.8 and Heartbeat 2.1.3_3 with a version 1
(haresources-style) configuration.
I am really confused. I am following a tutorial and I go right into a
split-brain. I can't see how it would have been any different if I
had yanked the power cord instead of the cables. I thought this was
what Heartbeat was supposed to handle?
How do I recover? No data was lost as I was just doing an initial test.
What is it I need to do to prevent a split-brain from happening again?
Is there a good place to go to read about avoiding this situation?
Much of the info I have found jumps right in as if you are already a
master of this stuff.
In case you need them, my configs follow:
regards,
Douglas Lochart
haresources:
capestor1 IPaddr::10.3.120.140/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/capestor::ext3 capestor-server
drbd.conf
-------------
global {
    usage-count yes;
}
common {
    syncer { rate 10M; }
}
resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        # these were commented out in the examples
        # outdate-peer "/usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        #pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
        # Notify someone in case DRBD split brained.
        #split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' root";
    }
    startup {
        degr-wfc-timeout 120; # 2 minutes.
        # become-primary-on both;
    }
    disk {
        on-io-error detach;
        # fencing resource-only;
        # size 10G;
    }
    net {
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 10M;
        # should this be 263168 for one terabyte?
        al-extents 257;
    }
    ##################################################################
    # Setup capestor1
    ##################################################################
    on capestor1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.3.120.134:7788;
        flexible-meta-disk internal;
    }
    on capestor2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.3.120.135:7788;
        meta-disk internal;
    }
}