Lars Marowsky-Bree wrote:
On 2006-05-19T13:28:08, Alan Robertson <[EMAIL PROTECTED]> wrote:

Just a quick note because I don't want to appear unresponsive, but I don't think I'll have much time for this thread within the next 1-2 weeks at least, or none which can do the discussion justice, because of our release schedules.

(Which also causes my gut to feel that we should still be mostly in a consolidation phase, polishing/finishing 2.0.x and cleaning up the code base instead of already starting to develop major new features - but that's just me, and may be caused by release stress.)
Probably. But, we have development schedules to meet. We can't sit on our hands until SUSE gets their act together ;-)
With that disclaimer, just some real quick notes:

You mean a cluster of clusters, I think? My first guess is that this will likely have to wait until release 3.x. One could design a new, separate system for this on top of our current code if one had the resources. We don't. I believe it is more appropriate to treat these clusters as stacked/layered clusters.

I'm not entirely sure I agree here, for two reasons. The first is that I don't think transforming our stack to support layered clusters will take 5 staff years. The initial phases don't require much more than that the top-level cluster treat each site or virtualized cluster member as a resource, and that the lower-level cluster have some way of tying into the top level for fencing and some membership computation. Actually, we already provide some of these mechanisms (i.e., for a top-level cluster to treat us as a resource, say using CIM to control us).
OK. I _know_ it will cost about 3 staff months just to update the core heartbeat code. I've looked at it several times in the past. Similarly, I'd expect a comparable cost for each of our dozen or so other components - for the same reason. Now we're up to roughly 3 staff years just to get started.
And probably another 2 staff months to make all the pieces fit back together again, and a couple more to fix the problems that show up in testing.
Now you get to start writing the cluster-of-clusters code itself, and 40 staff months have already been spent. And this is a HUGE instability risk.
(The one thing which is somewhat more difficult right now is if we want to run instances from several layers within the same OS instance, because our various communication channels/FIFOs etc. don't support that. But that doesn't occur for Xen/VMware-style layered clusters, and could also be addressed quick-and-dirty for DR clusters by having the "top level" run on different nodes. But that's mostly an implementation detail and not a major design consideration.)
AFAIK, this wouldn't work. It's my understanding that the top layer has to be a member of the lower layer(s). In fact, it probably has to be a "resource" of the lower layer...
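To make the layering relationship concrete, here is a minimal sketch of the model being debated: the top-level cluster treats each lower-level cluster (a site, or a virtualized cluster member) as a single resource, and "fencing" at the top layer means stopping that whole lower cluster. All class and method names here are invented for illustration - this is not Heartbeat's actual API.

```python
# Hypothetical sketch (not Heartbeat's real API): a top-level cluster treating
# each lower-level cluster as one resource, as discussed above.

class LowerCluster:
    """A whole lower-level cluster, seen by the top layer as one resource."""
    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)
        self.running = False

    # The top layer drives it like any other resource: start/stop/monitor.
    def start(self):
        self.running = True

    def stop(self):
        # Stopping the "resource" == taking down the whole site.
        self.running = False

    def monitor(self):
        return "running" if self.running else "stopped"


class TopCluster:
    """Top-level cluster whose 'members' are entire lower-level clusters."""
    def __init__(self, members):
        self.members = {m.name: m for m in members}

    def fence(self, name):
        # Fencing escalates: the top layer stops the entire lower cluster
        # rather than STONITHing individual nodes inside it.
        self.members[name].stop()

    def membership(self):
        return [n for n, m in self.members.items() if m.monitor() == "running"]


site_a = LowerCluster("site-a", ["a1", "a2"])
site_b = LowerCluster("site-b", ["b1", "b2"])
top = TopCluster([site_a, site_b])
site_a.start(); site_b.start()
top.fence("site-b")           # site-b is fenced as a unit
print(top.membership())       # -> ['site-a']
```

The point of the sketch is the one made in the thread: the lower cluster appears in the top layer's model as a resource, which is why the top layer arguably has to be tied into (or be a member of) the layers below it.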
The second argument is more one of priorities: if we don't have the resources to do it properly, we might declare it "out of scope" for the time being (say, for 2.0.x) and instead focus our resources on furthering what we're really good at, namely being a robust cluster for a single site. (But, say, encourage others to continue to develop software to integrate with us as such.) whose name I forgot.

Stretch clusters are, to the best of my knowledge, somewhat limited because they essentially pretend that the structure is flat, but it isn't in practice...

What kinds of limitations did you have in mind here?

Basically, most of them stem from treating a discontinuous stretched cluster as a single one - sort of like treating an NC-NUMA architecture as CC-SMP. It begins with the overhead of heartbeating over the WAN link and its different latencies compared to the LAN heartbeat, continues with the differences in STONITH, the fact that sites may differ in their configuration (rarely are they 100% identical - though we can provide for this better than other stretched clusters via rule-affected instance attributes) and fault isolation between the sites, and ends somewhere with the fact that you want them to be independent for anything local.
STONITH doesn't work in a stretch cluster. Or at least not reliably. Other techniques have to be used. It is worth noting that having a STONITH failure be considered a non-blocker for going on is still a good option to consider for local clusters (it has been requested before).
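One hedged way to model the "STONITH failure as a non-blocker" option mentioned above: attempt fencing, and when it fails (e.g. the remote site's STONITH device is unreachable over the WAN), fall back to a configured policy instead of blocking recovery forever. The function names and policy strings below are invented for illustration and are not Heartbeat's real configuration interface.

```python
# Hypothetical sketch of "STONITH failure as a non-blocker": try to fence,
# and if fencing fails, consult a configured fallback policy rather than
# blocking recovery indefinitely. Names/policies are illustrative only.

def stonith(node, device_reachable):
    """Pretend fencing primitive: succeeds only if the device is reachable."""
    return device_reachable

def recover(node, device_reachable, on_fence_failure="block"):
    if stonith(node, device_reachable):
        return "fenced; safe to take over resources"
    if on_fence_failure == "proceed":    # the "non-blocker" option
        return "fencing failed; proceeding anyway (accepting the risk)"
    if on_fence_failure == "manual":     # human-override variant
        return "fencing failed; waiting for operator confirmation"
    return "fencing failed; recovery blocked"   # conservative default

# In a stretch cluster the remote STONITH device is often unreachable:
print(recover("node2", device_reachable=False, on_fence_failure="proceed"))
```

The "manual" branch corresponds to the human-override idea that comes up later in the thread; the trade-off is that "proceed" accepts a split-brain risk that a local cluster with working STONITH never has to take.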
Actually, your initial steps plus some of the optional ones (human override, among others) I consider useful regardless of their application to stretched clusters.
Me too :-) Thanks!
[Resource-level quorums (quora?) would be an even nicer extension; that would be awesome for stretch clusters, but I'm not yet ready for that (nor ready to ask Andrew for it) either.]

Resource-level quorum - or better put, a quorum hierarchy affecting different parts of the system - actually _is_ a sort of layered cluster, just all globbed into a single flat domain (for everything but quorum) ;-) This is already quite nice for a single site, though - where you don't want the fact that some nodes went down to affect resources which wouldn't have touched those nodes anyway. So yes, I guess that's actually a quite reasonable thing to ask.

If it really takes 5-10 staff years to implement clusters of clusters, achieving the same functionality by extending the various parts so that a stretched cluster works better will likely not be all that much cheaper.
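The resource-level quorum idea above can be sketched in a few lines: compute quorum not over the whole cluster membership, but per resource, over the set of nodes that resource is allowed to run on. Then a partition that only loses nodes irrelevant to a resource does not take that resource down. This is a toy illustration of the concept, not an existing Heartbeat feature; all names are made up.

```python
# Hypothetical sketch of per-resource quorum: a resource has quorum iff a
# strict majority of the nodes it is *allowed* to run on are in the current
# partition - so losing nodes it would never touch doesn't stop it.

def has_quorum(allowed_nodes, partition):
    alive = len(set(allowed_nodes) & set(partition))
    return alive > len(allowed_nodes) / 2   # majority of *its* nodes, not all

partition = {"n1", "n2"}          # n3 and n4 are unreachable
web = ["n1", "n2", "n3"]          # web may run on n1-n3
db  = ["n3", "n4"]                # db is pinned to n3/n4

print(has_quorum(web, partition))  # True:  2 of web's 3 nodes are alive
print(has_quorum(db, partition))   # False: 0 of db's 2 nodes are alive
```

This is exactly the "quorum hierarchy" framing: the flat cluster keeps one membership, but quorum decisions are scoped to the sub-domain each resource actually lives in.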
It is cheaper. It is _much_ cheaper. IBM does a lot of stretch clusters for customers. I've asked some of the deployment experts I know of (and they're REALLY good at this) to look over the proposal.
And, pointing to prior art - not just in how other products implement this (and I know "Blueprints for High Availability" has something very useful to say on that topic too, but I have given away my copy to a colleague right now :-/), but in how other areas have implemented this (think Internet routing architecture, grid architecture, or CPU schedulers/memory allocators for NC-NUMA) - you'll find that _very_ few scenarios have opted to pretend that a discontinuous/hierarchical structure is flat.
I'll look for this in "Blueprints" - I still have my copy. That's a great suggestion. Thanks!
(Even the catholic church eventually conceded that the earth is round ;-)
s/catholic/Roman Catholic/ ;-)

<pedantic-note>catholic (lower case) just means "universal". Roman Catholic is the proper name for your intent.</pedantic-note>
But, of course, I'm fine with these enhancements being done anyway, as long as they don't negatively impact what I still consider our primary focus (local-area clusters), neither in code nor in the resource staffing. As I said, I don't really have time for this discussion in the depth it deserves right now, and for that I apologize. But, I _do_ have opinions I'd like to bring in ;-) I hope it is not too urgent.
This is a topic I've brought up in the past. It has been on our radar screen for a year or more. You think the things we're going to do will be useful anyway. I agree.
I'm going to have stretch cluster experts evaluate our proposal anyway - because they speak from extensive personal experience. Their opinion is worth much more than mine - and probably even worth a little more than yours (if you'll forgive me for saying so).
Let's see what they have to say.
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
