Re: [Linux-HA] Cluster node hanging upon access to ocfs2 fs when second cluster node dies ?

Lars Marowsky-Bree Tue, 03 Apr 2012 08:10:59 -0700

On 2012-04-03T15:59:00, Rainer Krienke <[email protected]> wrote:


> Hi Lars,
> 
> this was something I detected already. And I changed the timeout in the
> cluster configuration to 200sec. So the log I posted was the result of
> the configuration below (200sec). Is this still too small?
> 
> $ crm configure show
> ...
> primitive stonith_sbd stonith:external/sbd \
>         op monitor interval="200" timeout="200" start-delay="200" \
>         params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"

This is not what I meant. I meant to change the setting stonith-timeout,
not the settings on the primitive ;-) In fact, monitoring sbd is quite
unnecessary, and you actually don't need to specify sbd_device anymore,
you can just do:

primitive stonith_sbd stonith:external/sbd

and leave it at this. But, back to your timeout! Run this:

crm configure property stonith-timeout=240s

(And yes, it needs to be over 10% higher than the msgwait timeout,
because of how stonith-ng internally allocates the stonith-timeout value
to various stages in the stonith process. Sorry about that, that's a
pacemaker issue.)
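
To make the 10% rule concrete, here is a minimal shell sketch. It assumes a
hypothetical msgwait of 200 seconds; the actual value is not stated in this
thread and can be read from the device header with "sbd -d <device> dump".
The function name min_stonith_timeout is made up for illustration only.

```shell
#!/bin/sh
# Smallest whole-second stonith-timeout that is more than 10% above msgwait.
min_stonith_timeout() {
    # $1 = msgwait in seconds. Integer division rounds down, so adding 1
    # always lands strictly above 110% of msgwait.
    echo $(( $1 * 110 / 100 + 1 ))
}

msgwait=200   # hypothetical value, not taken from the thread
echo "stonith-timeout must be at least $(min_stonith_timeout "$msgwait")s"
```

With that assumed msgwait, the 240s suggested above clears the minimum with
some margin.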

You will still see IO freeze for approx. 3 minutes until the fence
completes. That's a side-effect of the sbd values you have configured,
in particular watchdog and msgwait.


Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
