Hi!

IMHO: The correct time to wait lies in an interval bounded by these two values:
1: The longest I/O delay that may occur during normal operation and must never trigger fencing
2: The maximum time you are willing to wait for fencing to occur
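To make the two bounds concrete, here is a minimal sketch in shell. The function name sbd_timeouts and the example numbers are hypothetical. It assumes the watchdog timeout has to be slightly longer than the tolerated I/O delay (bound 1), and it uses the commonly cited guideline, not stated in this thread, that msgwait is twice the watchdog timeout. It then checks the result against bound 2.

```shell
#!/bin/sh
# Hypothetical helper: derive SBD timeouts from the two bounds.
#   $1 = longest I/O delay in seconds that must never trigger fencing (bound 1)
#   $2 = longest time in seconds you accept to wait for fencing (bound 2)
sbd_timeouts() {
    io_stall=$1
    max_wait=$2

    # Assumption: the watchdog timeout must exceed the tolerated I/O delay.
    watchdog=$((io_stall + 1))

    # Assumption (common guideline, not from this thread): msgwait = 2 * watchdog.
    msgwait=$((watchdog * 2))

    # Refuse values that violate bound 2.
    if [ "$msgwait" -gt "$max_wait" ]; then
        echo "error: msgwait ${msgwait}s exceeds accepted maximum ${max_wait}s" >&2
        return 1
    fi

    echo "watchdog=${watchdog} msgwait=${msgwait}"
}

# Example values only: tolerate 60s of stalled I/O, accept at most 180s for fencing.
sbd_timeouts 60 180
```

The resulting values would then be written to the device with something like "sbd -d <device> -1 <watchdog> -4 <msgwait> create". Note that create re-initializes the SBD header, so the sketch only prints the numbers and does not run it.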
Many people think that making 1 close to zero and 2 as small as possible is the best solution. But imagine one of your SBD disks has a read problem and the operation has to be retried a few times. Or think about upgrading your disk firmware "online": usually I/O is stopped for a short time (typically less than one minute).

So once you have determined the timeout value for your environment, you can configure SBD. We have a rather long timeout, so SBD fencing can take some time. That is, fencing usually takes place within a few seconds, but the cluster waits the longer time to make sure the node must have processed the SBD fencing command. Fencing is not confirmed at the SBD level: you send the fencing command via SBD, then you expect every node to read the command after some delay (and thus perform it).

Unfortunately the SBD syntax is a real mess, and there is no manual page (AFAIK) for SBD. You can change the SBD parameters (on disk) online, but for them to take effect, the SBD daemon has to be restarted.

I hope this helps.

Regards,
Ulrich

>>> Muhammad Sharfuddin <m.sharfud...@nds.com.pk> wrote on 15.01.2015 at 16:33 in message <54b7ddd2.3000...@nds.com.pk>:
> I have to put this 2 node active/passive cluster in production very soon
> and I have tested the resource migration
> works perfectly in case of the node running the resource goes
> down(abruptly/forcefully).
>
> I have always read and heard to increase msgwait and watchdog timeout
> when sbd is a multipath disk, but in my case
> I have just created the disk via
> sbd -d /dev/mapper/mpathe create
>
> and I have following resource for sbd
> primitive sbd_stonith stonith:external/sbd \
> op monitor interval="3000" timeout="120" start-delay="21" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120" \
> params sbd_device="/dev/mapper/mpathe"
>
> as of now I am quite satisfied, but should I increase the msgwait and
> watchdog timeouts ?
>
> also I am using the start-delay=21 for "op monitor interval" should I
> also use the start-delay=11 for "op start interval"
>
> Please recommend
>
> --
> Regards,
>
> Muhammad Sharfuddin
> Cell: +92-3332144823 | UAN: +92(21) 111-111-142 ext: 113 | NDS.COM.PK
> <http://www.nds.com.pk>
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems