On 2006-04-24T13:29:10, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> >>The recovery seemed a bit harsh, however we already move all healthy
> >>resources away before fencing a node so it isn't so horrible.
> >Ah, this would be impossible in Alan's proposal. The node would be
> >blocked at the CCM level, and thus we couldn't communicate with said
> >node to move any resources away.
> oh... missed that connection :-(
Well, it would still be "correct", just "harsh".
> >Handling it at the CRM level would be better for this exact reason.
> could we perhaps use attrd and a rsc_location rule to stop the "bad"
> side?
> i haven't thought this through at all, but you'd get dampening, it
> would be crm-driven, and you avoid fencing (unless the stop fails).
>
> the hard bit is deciding who should send what values to attrd.
That's an interesting idea.
Actually, I think it's fairly easy in general. Everybody should send the
count of nodes it can talk to (successfully connected nodes etc. as
viewed by OCFS2); this should automatically make sure the resource runs
on the largest group and is stopped everywhere else.
(This is assuming that attrd interacts with clones.)
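
To make the counting idea concrete, here is a rough sketch in Python. It
is only an illustration: the function name and the dict of reported
values are made up for this example and are not attrd or CRM interfaces.
It assumes every node counts itself and that all reports are visible in
one place.

```python
def nodes_allowed_to_run(reported_counts):
    """Return the nodes that reported the highest peer count.

    reported_counts maps a node name to the number of nodes it says it
    can talk to, counting itself. This dict is a hypothetical stand-in
    for the values each node would send to attrd.
    """
    if not reported_counts:
        return set()
    best = max(reported_counts.values())
    return {node for node, count in reported_counts.items() if count == best}


# Five nodes split 3+2: a, b, c see each other; d and e see each other.
counts = {"a": 3, "b": 3, "c": 3, "d": 2, "e": 2}
print(sorted(nodes_allowed_to_run(counts)))  # ['a', 'b', 'c']
```

Everything outside the returned set would be where the resource gets
stopped.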
Now, there's one hard part, as always: a tie. I.e., if we have 1+1
nodes, each will send "I can talk to 1 node (that is, myself)" (or 0,
depending on how you count). [This could happen with drbd, or even with
OCFS2, where N+N could happen too.] And we/attrd wouldn't know that
these are in fact two partitions.
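
A small Python illustration of why the tie is awkward, under the same
assumption that each node reports only a count (including itself). The
helper name and node names are invented for the example.

```python
def top_reporters(reported_counts):
    """Return the sorted names of all nodes sharing the highest count."""
    best = max(reported_counts.values())
    return sorted(n for n, c in reported_counts.items() if c == best)


# 1+1 split: each node can only talk to itself.
print(top_reporters({"node1": 1, "node2": 1}))  # ['node1', 'node2']

# 2+2 split: both halves report the same count.
print(top_reporters({"a": 2, "b": 2, "c": 2, "d": 2}))  # ['a', 'b', 'c', 'd']
```

In both cases a plain "highest count wins" rule selects every node, so
both sides would keep running.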
Hrm. Needs a bit more thought, but shows promise.
Sincerely,
Lars Marowsky-Brée
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
"Ignorance more frequently begets confidence than does knowledge"
 -- Charles Darwin