On 03.04.2012 17:06, Lars Marowsky-Bree wrote:
> On 2012-04-03T15:59:00, Rainer Krienke <[email protected]> wrote:
>
>> Hi Lars,
>>
>> this was something I detected already. And I changed the timeout in the
>> cluster configuration to 200sec. So the log I posted was the result of
>> the configuration below (200sec). Is this still too small?
>>
>> $ crm configure show
>> ...
>> primitive stonith_sbd stonith:external/sbd \
>>   op monitor interval="200" timeout="200" start-delay="200" \
>>   params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
>
> This is not what I meant. I meant to change the setting stonith-timeout,
> not the settings on the primitive ;-) In fact, monitoring sbd is quite
> unnecessary, and you actually don't need to specify sbd_device anymore;
> you can just do:
>
> primitive stonith_sbd stonith:external/sbd
>
> and leave it at that. But, back to your timeout! Run this:
>
> crm configure property stonith-timeout=240s
>
> (And yes, it needs to be over 10% higher than the msgwait timeout,
> because of how stonith-ng internally allocates the stonith-timeout value
> to the various stages in the stonith process. Sorry about that, that's a
> pacemaker issue.)
>
> You will still see I/O freeze for approx. 3 minutes until the fence
> completes. That's a side-effect of the sbd values you have configured,
> in particular watchdog and msgwait.
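As a side note for anyone following along: the "over 10% higher than msgwait" rule above can be sanity-checked with a quick shell calculation. The msgwait value of 120 seconds below is only an assumption for illustration; read the real values from your own device with `sbd -d <device> dump`:

```shell
#!/bin/sh
# Hypothetical msgwait value in seconds -- check your own device with:
#   sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
msgwait=120

# stonith-timeout must be more than 10% above msgwait, so compute
# the minimum acceptable value (integer arithmetic is fine here):
min_timeout=$(( msgwait * 110 / 100 ))
echo "stonith-timeout must exceed ${min_timeout}s; e.g. 240s leaves headroom"
```

With msgwait=120 this prints a minimum of 132s, which is why the suggested 240s is comfortably safe.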
Hi Lars,

thanks a lot for finding the problem. The wrongly set timeout value really was what was causing the trouble; it works now. I lowered the timeout values to avoid freezing the clustered filesystem for too long, and it works fine.

There is one basic thing, however, that I do not understand: my setup involves only a clustered filesystem. Why is a stonith resource needed at all in this case, when it causes freezes of the cluster filesystem depending on the timeout values? Basically, with a cluster filesystem it should not matter if a node dies. It is the nature of a cluster filesystem that many nodes can access it; if one dies, that is of no consequence to the other nodes, which can still access the filesystem.

So my question comes down to this: why do I have to fence a node (in case it fails) in a cluster that runs nothing but a cluster filesystem? What could go wrong without fencing in this case?

Thanks a lot
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
