Re: [DRBD-user] Failover Behavior in Server-Crash Scenario

Robinson, Eric Thu, 06 Dec 2012 15:54:22 -0800

> >> Any concurrent log entries in your kernel log, from the
> >> drbd0 device?
> >>
> > 
> > 
> > In fact, there are...
> > 
> > Dec  6 13:51:17 ha09a kernel: d-con ha02_mysql: conn( Unconnected -> WFConnection )
> > Dec  6 13:51:19 ha09a root: drbd SA notify
> > Dec  6 13:51:19 ha09a crm_node[25546]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
> > Dec  6 13:51:19 ha09a crm_attribute[25547]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
> > Dec  6 13:51:20 ha09a root: drbd SA notify
> > Dec  6 13:51:20 ha09a crm_node[25577]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
> > Dec  6 13:51:20 ha09a crm_attribute[25578]:   notice: crm_add_logfile: Additional logging available in /var/log/corosync.log
> > Dec  6 13:51:21 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd0_notify_0 (call=500, rc=0, cib-update=0, confirmed=true) ok
> > Dec  6 13:51:21 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd1_notify_0 (call=502, rc=0, cib-update=0, confirmed=true) ok
> > Dec  6 13:51:22 ha09a root: drbd SA notify
> > Dec  6 13:51:23 ha09a root: drbd SA notify
> > Dec  6 13:51:24 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd0_notify_0 (call=506, rc=0, cib-update=0, confirmed=true) ok
> > Dec  6 13:51:24 ha09a crmd[3066]:   notice: process_lrm_event: LRM operation p_drbd1_notify_0 (call=508, rc=0, cib-update=0, confirmed=true) ok
> > Dec  6 13:51:25 ha09a root: drbd SA promote
> > Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: helper command: /sbin/drbdadm fence-peer ha01_mysql
> > Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: helper command: /sbin/drbdadm fence-peer ha01_mysql exit code 127 (0x7f00)
> > Dec  6 13:51:25 ha09a kernel: d-con ha01_mysql: fence-peer helper broken, returned 127
> 
> Your DRBD refuses to promote because it's unable to get a 
> meaningful response from the fence-peer handler. That in turn 
> is because it's failing with a "command not found" error. 
> (Try typing "foobarblatch; echo $?" in a shell.) Check your 
> "fence-peer" setting in the handlers section of your DRBD 
> config, and see whether it points to a non-existing script. 
> If that script does exist, examine whether it _invokes_ 
> something that doesn't.
> 
> Cheers,
> Florian
>



It turns out that the fence-peer handler script does not exist. This is
certainly because I copied the drbd.conf file from a previous cluster running
drbd 8.3.12.
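For anyone puzzled by the numbers in the log: 127 is the exit status a POSIX
shell uses for "command not found", and 0x7f00 is the same value as a raw
wait(2) status, with the exit code held in bits 8 to 15. The Python sketch
below illustrates both points and adds a simple "does this handler exist and
is it executable" check. The helper names are hypothetical, and the check
assumes the handler is a plain script file.

```python
import os
import subprocess


def decode_wait_status(status: int) -> int:
    """Return the exit code stored in a raw wait(2) status such as 0x7f00."""
    # For a normal exit, the exit code sits in bits 8-15 of the status word.
    return os.WEXITSTATUS(status)


def shell_exit_code(command: str) -> int:
    """Run a command through /bin/sh and return its exit code."""
    # sh exits with 127 when it cannot find the command to run.
    return subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    ).returncode


def handler_is_runnable(path: str) -> bool:
    """Hypothetical helper: True if path is an existing, executable file."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

On a typical system, decode_wait_status(0x7f00) gives 127, the same value
shell_exit_code("foobarblatch") returns, and
handler_is_runnable("/usr/lib/drbd/crm-fence-peer.sh") is False on a node
where that script is missing.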

I am now sure that there are other problems in the config file waiting to bite
me. Here is what my drbd.conf file looks like. Please tell me if you see
anywhere ELSE that I have shot myself in the foot.



# drbd.conf

global {
    usage-count no;
}

common {
  syncer {
    verify-alg sha1;
    rate 30M;
    al-extents 3389;
  }
}

resource ha01_mysql {
  protocol C;
  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    # pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    # pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    # local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    # outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    #pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    # split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' [email protected]";
    #out-of-sync "echo out-of-sync. drbdadm down $DRBD_RESOURCE. drbdadm ::::0 set-gi $DRBD_RESOURCE. drbdadm up $DRBD_RESOURCE. | mail -s 'DRBD Alert' root";
  }

  startup {
    wfc-timeout  0;          # infinite
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
    fencing resource-only;
  }
  net {
    cram-hmac-alg "sha1";
    shared-secret "removed";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  on ha09a {
    device     /dev/drbd0;
    disk       /dev/vg00/lv00;
    address    198.51.100.58:7788;
    meta-disk  internal;
  }
  on ha09b {
    device     /dev/drbd0;
    disk       /dev/vg00/lv00;
    address   198.51.100.59:7788;
    meta-disk  internal;
  }
}

resource ha02_mysql {
  protocol C;
  handlers {
    pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
    local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    # pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    # pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    # local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    # outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    # after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    #pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    # split-brain "echo split-brain. drbdadm -- --discard-my-data connect $DRBD_RESOURCE ? | mail -s 'DRBD Alert' [email protected]";
    #out-of-sync "echo out-of-sync. drbdadm down $DRBD_RESOURCE. drbdadm ::::0 set-gi $DRBD_RESOURCE. drbdadm up $DRBD_RESOURCE. | mail -s 'DRBD Alert' root";
  }

  startup {
    wfc-timeout  0;          # infinite
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error   detach;
    fencing resource-only;
  }
  net {
    cram-hmac-alg "sha1";
    shared-secret "removed";
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  on ha09a {
    device     /dev/drbd1;
    disk       /dev/vg00/lv01;
    address    198.51.100.58:7789;
    meta-disk  internal;
  }
  on ha09b {
    device     /dev/drbd1;
    disk       /dev/vg00/lv01;
    address   198.51.100.59:7789;
    meta-disk  internal;
  }
}
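Since the immediate failure came from a handler that points at a missing
script, one quick way to look for more of the same in a config copied from an
older cluster is to pull out every handler command and check the program each
one starts. The Python sketch below is a rough check, not a DRBD config
parser. It assumes the one-handler-per-line layout used in this file (with
mail-wrapped lines re-joined), skips commented-out handlers, and splits chained
commands on ';', '|' and '&'. The function names are hypothetical. For how
drbdadm itself reads the file, "drbdadm dump" is the more reliable tool.

```python
import os
import re
import shutil
from typing import Callable, Iterator, List, Tuple

# One handler per line, e.g.:  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
HANDLER_RE = re.compile(r'^([a-z-]+)\s+"([^"]*)"\s*;')


def iter_handler_commands(conf_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (handler_name, command_string) from every handlers { } block."""
    in_handlers = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out handlers
        if not in_handlers:
            # Enter a block only on a line like "handlers {".
            in_handlers = line.startswith("handlers") and line.endswith("{")
            continue
        if line.startswith("}"):
            in_handlers = False  # end of the handlers block
            continue
        match = HANDLER_RE.match(line)
        if match:
            yield match.group(1), match.group(2)


def program_exists(program: str) -> bool:
    """True if an absolute path is an executable file, or a bare name is on PATH."""
    if program.startswith("/"):
        return os.path.isfile(program) and os.access(program, os.X_OK)
    return shutil.which(program) is not None


def find_missing_programs(
    conf_text: str, exists: Callable[[str], bool] = program_exists
) -> List[Tuple[str, str]]:
    """Return (handler, program) pairs whose program cannot be found."""
    missing = []
    for name, command in iter_handler_commands(conf_text):
        # A handler string may chain several commands with ';', '|' or '&&'.
        for part in re.split(r"[;|&]+", command):
            words = part.split()
            if words and not exists(words[0]):
                missing.append((name, words[0]))
    return missing
```

Assuming the file lives at /etc/drbd.conf, calling
find_missing_programs(open("/etc/drbd.conf").read()) on these nodes should at
least report the fence-peer entry of both resources, since that script is
missing. The exists parameter is there so the check can be tested without
touching the real filesystem.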



Disclaimer - December 6, 2012 
This email and any files transmitted with it are confidential and intended 
solely for Florian Haas,[email protected]. If you are not the named 
addressee you should not disseminate, distribute, copy or alter this email. Any 
views or opinions presented in this email are solely those of the author and 
might not represent those of Physicians' Managed Care or Physician Select 
Management. Warning: Although Physicians' Managed Care or Physician Select 
Management has taken reasonable precautions to ensure no viruses are present in 
this email, the company cannot accept responsibility for any loss or damage 
arising from the use of this email or attachments. 
This disclaimer was added by Policy Patrol: http://www.policypatrol.com/
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
