On 2007-11-08T01:25:05, Yan Fitterer <[EMAIL PROTECTED]> wrote:

> > If your software cannot withstand a crash, then it cannot be made
> > highly-available - end of story.  Crashes will happen.  Be prepared.
> This is a fine argument from an engineering perspective, but not much
> use from a sysadmin POV.

It is. It's called "vendor liability" and "not my fault". Honestly: if
eDirectory indeed corrupts its database on crashes, our engineering
group needs to get going.

(If you can send me the relevant bugzilla numbers privately, I'd be
happy to give it a good push.)

> Heartbeat should (can and does!) help on any kind of software. I'm
> simply pointing out that (for less perfect software, amongst other
> reasons) the less STONITH (hard reset) potential cases we have, the
> better. :) Anything to avoid STONITH (in particular when a node isn't
> quite dead from the workload perspective).

That is true, but it can still eat your data: node suicide looks to the
application exactly like an external STONITH operation. So it does not
help in the way you appear to expect.

> > My suggestion for this would be to implement a full communications
> > plugin module that sends packets through disk areas.  If you do this
> > right, then the communications will remain fully up for all purposes.
> > We've had people start this effort in the past, but it's never been
> > finished and all the bugs driven out AFAIK.
> 
> Agreed. Since I can't make much headway with my other approach(es)...

Xinwei has been working on this. He posted some code to me by mail, and
it looked good. I hope he posts it to the -dev list very soon. And I'm
sure he'll appreciate help.
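
To make the quoted "packets through disk areas" idea concrete, here is a
rough sketch. It is not Xinwei's code and not an existing Heartbeat
plugin. Everything in it is an assumption made for illustration: one
512-byte slot per node on a shared device or file, the "HBDK" marker,
the header layout, and the write_slot/read_slot names. It relies on
os.pread/os.pwrite, so it is Unix-only.

```python
import os
import struct
import time
import zlib

SLOT_SIZE = 512                      # one slot per node (assumed layout)
HEADER = struct.Struct("<4sIQdH")    # magic, node id, sequence, timestamp, payload length
MAGIC = b"HBDK"                      # hypothetical marker, not a Heartbeat constant
MAX_PAYLOAD = SLOT_SIZE - HEADER.size - 4   # 4 bytes reserved for the CRC32


def write_slot(path, node_id, seq, payload):
    """Write this node's message into its own slot and flush it to disk."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for one slot")
    body = HEADER.pack(MAGIC, node_id, seq, time.time(), len(payload)) + payload
    record = body + struct.pack("<I", zlib.crc32(body))
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Each node only ever writes at its own offset, so writers never overlap.
        os.pwrite(fd, record.ljust(SLOT_SIZE, b"\0"), node_id * SLOT_SIZE)
        os.fsync(fd)
    finally:
        os.close(fd)


def read_slot(path, node_id):
    """Return (seq, timestamp, payload) for a node, or None if the slot is empty or torn."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, SLOT_SIZE, node_id * SLOT_SIZE)
    finally:
        os.close(fd)
    if len(raw) < SLOT_SIZE:
        return None                  # slot was never written
    magic, owner, seq, stamp, length = HEADER.unpack_from(raw)
    if magic != MAGIC or owner != node_id or length > MAX_PAYLOAD:
        return None
    end = HEADER.size + length
    (crc,) = struct.unpack_from("<I", raw, end)
    if zlib.crc32(raw[:end]) != crc:
        return None                  # torn or corrupted write, ignore it
    return seq, stamp, raw[HEADER.size:end]
```

A peer would poll read_slot() for every other node and treat a sequence
number that stops advancing for longer than the deadtime as a lost
link. A real plugin would also have to bypass the page cache (for
example with O_DIRECT) so that readers see what is actually on the
shared disk; this sketch leaves that out.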


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
