Hi,

On Fri, Feb 22, 2008 at 01:58:54PM +0100, Johan Hoeke wrote:
> Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Thu, Feb 21, 2008 at 09:00:35PM +0100, Johan Hoeke wrote:
> >> Dejan Muhamedagic wrote:
> >>> Hi,
> >>>
> >>> On Thu, Feb 21, 2008 at 04:09:19PM +0100, Johan Hoeke wrote:
> >>>> Dejan Muhamedagic wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Thu, Feb 21, 2008 at 01:26:12PM +0100, Johan Hoeke wrote:
> >>>>>> Dejan Muhamedagic wrote:
> 
> <snip>
> 
> >> OK, I understand. I'll change from monitor on_fail=fence to stop
> >> on_fail=fence and test, test, test.
> > 
> > on_fail=fence is the default for stop operations, as those
> > failures are dangerous.
> 
> OK, good to know. Is this in the DTD? I looked for it just now but
> didn't find it.

I guess not. But I'm sure that it is the case.

> 
> > 
> >> I have to be super careful that the
> >> SAN filesystem doesn't get corrupted again. That happened the other day
> >> by accident when a wrong ipfilter config was pushed by mistake. The
> >> heartbeat interface was filtered out, a split brain situation occurred
> >> and the SAN filesystem was corrupted. Stonith didn't save us for
> >> whatever reason.
> > 
> > You have to have a reliable stonith device. Do you think that
> > on_fail=fence in the monitor op would have made the situation
> > better?
> 
> No, probably not, just ignorance on my part.
> 
> > 
> >> The application managers don't have much confidence in
> >> heartbeat since then. :(
> > 
> > That's a shame.
> 
> I was overreacting. Tests have gone well since then. Confidence of the
> application managers is back on the rise. I'm due to test the cluster
> again this afternoon. We're going to pull out the heartbeat cable to
> test and make sure the data doesn't get corrupted. Stonith / riloe has
> worked well, except that one time apparently.

But the stonith resource is monitored? How did it fail?

Thanks,

Dejan

> I'll be sure to keep logs
> and run hb_report if anything strange happens this time.
> 
> 
> regards,
> Johan
> 



> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
