On 03.04.2012 17:06, Lars Marowsky-Bree wrote:
> On 2012-04-03T15:59:00, Rainer Krienke <[email protected]> wrote:
>
>> Hi Lars,
>>
>> this was something I detected already. And I changed the timeout in the
>> cluster configuration to 200sec. So the log I posted was the result of
>> the configuration below (200sec). Is this still too small?
>>
>> $ crm configure show
>> ...
>> primitive stonith_sbd stonith:external/sbd \
>>   op monitor interval="200" timeout="200" start-delay="200" \
>>   params sbd_device="/dev/disk/by-id/scsi-259316a7265713551-part1"
>
> This is not what I meant. I meant to change the setting stonith-timeout,
> not the settings on the primitive ;-) In fact, monitoring sbd is quite
> unnecessary, and you actually don't need to specify sbd_device anymore;
> you can just do:
>
> primitive stonith_sbd stonith:external/sbd
>
> and leave it at that. But, back to your timeout! Run this:
>
> crm configure property stonith-timeout=240s
>
> (And yes, it needs to be over 10% higher than the msgwait timeout,
> because of how stonith-ng internally allocates the stonith-timeout value
> to the various stages in the stonith process. Sorry about that, that's a
> pacemaker issue.)
>
> You will still see I/O freeze for approx. 3 minutes until the fence
> completes. That's a side-effect of the sbd values you have configured,
> in particular watchdog and msgwait.
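As a side note for anyone following along: the "over 10% higher than msgwait" rule above can be sanity-checked with a quick shell calculation. The msgwait value of 120 seconds below is only an assumption for illustration; read the real values from your own device with `sbd -d <device> dump`:

```shell
#!/bin/sh
# Hypothetical msgwait value in seconds -- check your own device with:
#   sbd -d /dev/disk/by-id/scsi-259316a7265713551-part1 dump
msgwait=120

# stonith-timeout must be more than 10% above msgwait, so compute
# the minimum acceptable value (integer arithmetic is fine here):
min_timeout=$(( msgwait * 110 / 100 ))
echo "stonith-timeout must exceed ${min_timeout}s; e.g. 240s leaves headroom"
```

With msgwait=120 this prints a minimum of 132s, which is why the suggested 240s is comfortably safe.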
Hi Lars,

thanks a lot for finding the problem. The wrongly set timeout value really was what was causing the trouble; it works now. I lowered the timeout values to avoid freezing the clustered filesystem for too long, and it works fine.

There is one basic thing, however, that I do not understand: my setup involves only a clustered filesystem. Why is a stonith resource needed at all in this case, when it causes freezes of the cluster filesystem depending on the timeout values? Basically, with a cluster filesystem it should not matter if a node dies. It is the nature of a cluster filesystem that many nodes can access it; if one dies, that is of no consequence to the other nodes, which can still access the filesystem.

So my question comes down to this: why do I have to fence a node (in case it fails) in a cluster that runs nothing but a cluster filesystem? What could go wrong without fencing in this case?

Thanks a lot
Rainer

--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
