Re: [DRBD-user] Default Split Brain Behaviour

Lew Wed, 26 Jan 2011 17:33:35 -0800

Thanks again Felix,

> > common {
> >     protocol A;
> >
> >     handlers {
> >             pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh";
> >             pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh";
> >             echo o > /proc/sysrq-trigger ; halt -f";
> 
> The above looks..."funny" to me. What's wrong here? Copy/Paste error?
> 
> Did you modify any notify-* scripts?
Ah, I see what you mean; that was a cut-and-paste error I missed (apologies, a 
stupid mistake). It should have read:
common {
        protocol A;


        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh";
                local-io-error "/usr/lib/drbd/notify-io-error.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        }
From memory, I think I commented out the original line in the default global 
config:
#local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
and replaced it with the line seen above:

local-io-error "/usr/lib/drbd/notify-io-error.sh";

I didn't want an I/O error induced by extreme load to trigger an emergency 
shutdown, as this is a virtualization server.
I did want to be notified, though.

The date stamps on the notify-* scripts are all uniform (predating the system 
build), and I don't recall modifying them at all.
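
A check like the one described can be sketched as follows. This is only a 
minimal sketch: check_handlers is a hypothetical helper name, the path is the 
one used in the handlers section, and stat -c assumes GNU coreutils. It reports 
whether each script exists and is executable, along with its modification time.

```shell
#!/usr/bin/env bash
# Report, for each handler script, whether it exists and is executable,
# plus its modification time. check_handlers is a hypothetical helper name.
check_handlers() {
    local f
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "MISSING $f"
        elif [ ! -x "$f" ]; then
            echo "NOT-EXECUTABLE $f"
        else
            # %y prints the last modification time (GNU stat)
            echo "OK $(stat -c '%y' "$f") $f"
        fi
    done
}

# If the glob matches nothing, the literal pattern is reported as MISSING.
check_handlers /usr/lib/drbd/notify-*.sh
```

Running it on both nodes and comparing the output would show whether the two 
installations differ.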

From the logs, I'm curious about these lines:
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044910] block drbd9: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
Jan 23 15:07:16 emlsurit-v4 kernel: [   15.044929] block drbd9: Marked additional 508 MB as out-of-sync based on AL.
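
One possible reading of the 508 MB figure, offered as a sketch rather than a 
diagnosis: if al-extents is at its 8.3 default of 127 and each activity-log 
extent covers 4 MiB (both are assumptions, neither appears in the log), then 
the whole activity log spans exactly that amount. al_span_mib is a 
hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumptions (not stated in the log): al-extents is at its 8.3 default
# of 127, and each activity-log extent covers 4 MiB.
al_extents=127
extent_mib=4

# Multiply the number of extents by the extent size in MiB.
al_span_mib() { echo $(( $1 * $2 )); }

al_span_mib "$al_extents" "$extent_mib"   # prints 508
```

If that reading holds, the figure would be the area marked for resync after an 
unclean shutdown of a primary, not necessarily the amount of data that changed.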

...then, a little further down:
Jan 23 15:53:01 emlsurit-v4 kernel: [ 2756.121108] block drbd9: role( Secondary -> Primary )

These entries were seen only on the rebooted node, which had been primary. 
Other than this, the log entries for the two nodes were ostensibly the same.

This node was always primary, and the KVM virtual machine running off it does 
not even exist on the other node; yet the resource roles (primary vs secondary) 
have been reversed.

The nodes of the resource were in a disconnected state prior to the reboot of 
the primary node.
The secondary (disconnected) node remained on, and there is no HA setup 
associated with either node on any resource.

I did note a clock skew of 3 minutes between the nodes, due to an incorrect ntp 
source.

On both nodes, I also noticed:
block drbd9: helper command: /sbin/drbdadm split-brain minor-9 exit code 127 (0x7f00)
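
The two numbers in that helper line can be related with a little arithmetic. 
The sketch below assumes the hex value in parentheses is a standard wait 
status, with the exit code in bits 8-15 and any terminating signal in bits 
0-6; decode_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Decode the hex value printed in parentheses after "exit code".
# Assumption: it is a standard wait status, with the exit code in
# bits 8-15 and the terminating signal (if any) in bits 0-6.
decode_status() {
    local status=$(( $1 ))            # accepts hex such as 0x7f00
    echo "exit=$(( (status >> 8) & 0xff )) signal=$(( status & 0x7f ))"
}

decode_status 0x7f00   # prints: exit=127 signal=0
```

By shell convention, an exit status of 127 means a command could not be found, 
so it may be worth checking that the split-brain handler path resolves on both 
nodes.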

Somehow the data (508 MB?) has rolled back, and while I'm sad that I've likely 
lost it, I can't afford to release this system to production until I'm 
confident it won't happen again.

The userland tools are ...

drbdadm --version
DRBDADM_BUILDTAG=GIT-hash:\ ea9e28dbff98e331a62bcbcc63a6135808fe2917\ build\ by\ buildd@yellow\,\ 2010-06-01\ 11:06:12
DRBDADM_API_VERSION=88
DRBD_KERNEL_VERSION_CODE=0x080307
DRBDADM_VERSION_CODE=0x080307
DRBDADM_VERSION=8.3.7

Any assistance in digging a little deeper here will be greatly appreciated.

Cheers,

Lew

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
