On Fri, Dec 16, 2011 at 01:31:32PM +0100, Ulrich Windl wrote:
> Hi!
>
> I have some trouble with OCFS on top of DRBD that seems to be timing-related:
> OCFS is working on the DRBD device when DRBD itself wants to change something,
> it seems:
>
> ...
> Dec 16 11:39:58 h06 kernel: [ 122.426174] block drbd0: role( Secondary -> Primary )
> Dec 16 11:39:58 h06 multipathd: drbd0: update path write_protect to '0' (uevent)
> Dec 16 11:40:29 h06 ocfs2_controld: start_mount: uuid "FD32E504527742CEA7DA6DB272D5D7B2", device "/dev/drbd_r0", service "ocfs2"
> ...
> Dec 16 11:40:29 h06 kernel: [ 152.837615] block drbd0: peer( Secondary -> Primary )
> Dec 16 11:40:29 h06 ocfs2_hb_ctl[19177]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -P -d /dev/drbd_r0
> Dec 16 11:43:50 h06 kernel: [ 354.559240] block drbd0: State change failed: Device is held open by someone
> Dec 16 11:43:50 h06 kernel: [ 354.559244] block drbd0: state = { cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate r----- }
> Dec 16 11:43:50 h06 kernel: [ 354.559246] block drbd0: wanted = { cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate r----- }
> Dec 16 11:43:50 h06 drbd[28754]: [28786]: ERROR: r0: Called drbdadm -c /etc/drbd.conf secondary r0
The resource agent was told to demote.
That fails because the DRBD device is still (or already) held open,
by OCFS2 or by something else.
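
One way to see whether a mounted file system is what holds the device open is
to look for it in /proc/mounts before the demote is attempted. The following is
only a minimal sketch: the function name holders_in_mounts and the mount point
used in the usage note are hypothetical, and a device can also be held open by
something that never shows up as a mount, so an empty result does not prove the
device is free.

```python
import os


def holders_in_mounts(device, mounts_text, resolve=os.path.realpath):
    """Return (mount_point, fstype) for every mounts entry backed by `device`.

    `mounts_text` is the content of /proc/mounts (or /etc/mtab).
    `resolve` maps a path to its canonical form, so that a symlink such as
    /dev/drbd_r0 and its target /dev/drbd0 compare as equal.
    """
    target = resolve(device)
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a well-formed mounts line
        source, mount_point, fstype = fields[:3]
        # Only real device paths are resolved; pseudo sources such as
        # "proc" or "sysfs" are skipped.
        if source.startswith("/") and resolve(source) == target:
            found.append((mount_point, fstype))
    return found
```

Calling holders_in_mounts("/dev/drbd_r0", open("/proc/mounts").read()) on the
node would list something like ("/srv/shared", "ocfs2") if the OCFS2 file
system is still mounted there (the mount point is made up for illustration).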
> Dec 16 11:43:50 h06 drbd[28754]: [28789]: ERROR: r0: Exit code 11
>
> A little bit later DRBD did its own fencing (the machine rebooted)
I very much doubt that. At least, from the above log excerpt,
I cannot imagine a scenario in which any of the handlers cited below
would trigger, unless you throw multiple failures into the mix.
But you apparently get a "demote failure", and possibly then a "stop
failure" as well, which may trigger a stonith event.
Or maybe IO is blocked for "too long", so OCFS2 decides to self-fence.
I guess you have to improve your logging to find out which.
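
Better logging aside, it can help to reduce the syslog that is already there to
a timeline of DRBD role changes and failed state changes, so it is easier to
see what preceded the reboot. The sketch below is only an illustration: the
function name drbd_events is hypothetical, the patterns are derived solely from
the kernel messages quoted above, and it assumes each log entry is on one line
(not wrapped as in a mail quote) and starts with the classic 15-character
syslog timestamp.

```python
import re

# Patterns taken from the kernel messages quoted in this thread.
ROLE_CHANGE = re.compile(r"block (drbd\d+): (role|peer)\( (\w+) -> (\w+) \)")
FAILURE = re.compile(r"block (drbd\d+): State change failed: (.+)")


def drbd_events(lines):
    """Return (timestamp, device, description) tuples for DRBD events."""
    events = []
    for line in lines:
        stamp = line[:15]  # e.g. "Dec 16 11:39:58"
        match = ROLE_CHANGE.search(line)
        if match:
            device, who, old, new = match.groups()
            events.append((stamp, device, f"{who}: {old} -> {new}"))
            continue
        match = FAILURE.search(line)
        if match:
            reason = match.group(2).strip()
            events.append((stamp, match.group(1), f"FAILED: {reason}"))
    return events
```

Fed with the excerpt above, this would yield the local promotion at 11:39:58,
the peer promotion at 11:40:29, and the failed demote at 11:43:50. Whatever
caused the reboot should show up in the messages after that point, if they
made it to disk or to a remote syslog host.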
> Is there a way to let the cluster do the fencing instead of writing to
> sysctl? Those handlers are used:
> handlers {
>   pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>   pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>   local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems