Hi,
Dejan Muhamedagic wrote:
Hi,
On Wed, Mar 26, 2008 at 02:38:00PM +0100, Danny Sternkopf wrote:
Hi,
we have a simple config: (hav2)
- 2 nodes <active-active>
- 6 FILESYSTEM resources as one group + 1 Stonith resource on each node
(Use constraints to score them)
- Default Resource stickiness is 0
- Default Resource failover stickiness is 0
What we have seen in the past is that if a filesystem fails, the whole
group is moved to the other node and the failing node is stonith'ed
because the filesystem could not be unmounted properly.
But this filesystem could not be mounted on either node anymore. So the
group was moved from one node to the other and the nodes got reset all
the time.
If it was not possible to mount the filesystem then how/why did
the cluster try to unmount it? Also, if the filesystem's not
mounted then the stop operation should've succeeded. Or did you
see different behaviour?
The filesystem was mounted fine. But then a problem occurred; let's
assume the device is gone due to a HW issue. The monitor will detect
it and initiate a move to the other node. The umount fails on the
current node and the mount fails on the new node, so to speak.
In our case the device was still there, but the mount was hanging. The
operation timed out. The Filesystem stop always failed, even if the
filesystem was mounted, maybe due to the hanging mount command.
I have to check the Filesystem script next time.
Start and Stop of the Filesystem always ended up with a Timeout
(of 120 s).
How can we handle that issue within HA? What happens to a resource
that cannot run anymore? When will HA give up trying to run it?
Depends on the start-failure-is-fatal crm_config parameter. If
it's set to true, the CRM should give up after the first failed
start operation. Of course, in case the machine is rebooted it
will try again.
Yes, exactly, and that is why it doesn't help.
Meanwhile we implemented a function in our stonith script that checks
when the last stonith operation took place. If it was no longer than 15
minutes ago, we set the failing node to standby and then perform the
stonith reset.
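
For anyone who wants to try something similar, here is a rough Python
sketch of that check. It is not our actual stonith script. The state
file and the set_standby/reset_node callbacks are made-up placeholders
for whatever your stonith plugin and cluster tools really do; only the
15 minute window is taken from the description above.

```python
import time
from pathlib import Path

# From the description above: resets within 15 minutes count as a loop.
STONITH_WINDOW_SECONDS = 15 * 60


def read_last_stonith(state_file):
    """Return the timestamp of the previous reset, or None if unknown."""
    try:
        return float(Path(state_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def needs_standby(last, now, window=STONITH_WINDOW_SECONDS):
    """True if the previous reset happened no longer than `window` ago."""
    return last is not None and (now - last) <= window


def handle_reset(node, state_file, set_standby, reset_node, now=None):
    """Reset `node`; put it in standby first if it was reset recently.

    set_standby and reset_node are hypothetical callbacks supplied by
    the caller. Returns True if the node was put in standby.
    """
    now = time.time() if now is None else now
    standby = needs_standby(read_last_stonith(state_file), now)
    if standby:
        set_standby(node)  # keep resources off the node after reboot
    reset_node(node)
    # Remember this reset so the next call can detect a reset loop.
    Path(state_file).write_text(f"{now}\n")
    return standby
```

The state file has to live somewhere that survives the reset of the
failing node, for example on the node that executes the stonith
operation.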
Best regards,
Danny
--
Danny Sternkopf http://www.nec.de/hpc [EMAIL PROTECTED]
HPCE Division Germany phone: +49-711-68770-35 fax: +49-711-6877145
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NEC Deutschland GmbH, Hansaallee 101, 40549 Düsseldorf
Geschäftsführer Makoto Tsukakoshi
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743