On 2006-05-19T13:28:08, Alan Robertson <[EMAIL PROTECTED]> wrote:

Just a quick note because I don't want to appear unresponsive: I don't
think I'll have much time for this thread for at least the next 1-2
weeks, or at least none that would do the discussion justice, because of
our release schedules.

(Which also makes my gut feel that we should still be mostly in a
consolidation phase, polishing/finishing 2.0.x and cleaning up the code
base instead of already starting to develop major new features, but
that's just me and may be caused by release stress.)

With that disclaimer, just some real quick notes:

> >I believe it is more appropriate to treat these clusters as
> >stacked/layered clusters.
> You mean a cluster of clusters, I think?  My first guess is that this 
> will likely have to wait until release 3.x.  One could design a new 
> separate system for this on top of our current code if one had the 
> resources.  We don't.

I'm not entirely sure I agree here, for two reasons.

The first is that I don't think transforming our stack to support
layered clusters will take 5 staff years. The initial phases don't
require much more than that the top-level cluster treats each site or
virtualized cluster member as a resource, and that the lower-level
cluster has some way of tying into the top level for fencing and some
membership computation.  Actually, we already provide some of these
mechanisms (i.e., a top-level cluster can treat us as a resource, say
using CIM to control us).
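To make that concrete, here is a toy Python sketch of the stacked
model. All names and classes here are invented for illustration; this
is not Heartbeat's actual API. The point is that the top-level cluster
only needs the usual resource verbs (start/stop/monitor) per site, while
each site keeps its internal membership to itself:

```python
# Hypothetical sketch of a two-level ("stacked") cluster model.
# Names and structure are illustrative, not Heartbeat's actual API.

class SiteCluster:
    """A lower-level cluster that the top level treats as one resource."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = set(nodes)   # members managed internally by this site
        self.running = False

    # The only operations the top level needs: the usual resource verbs.
    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def monitor(self):
        # The site reports a single health bit upward; its internal
        # membership computation stays hidden from the top level.
        return "running" if self.running and self.nodes else "stopped"


class TopLevelCluster:
    """Treats each site (or virtualized cluster member) as a resource."""

    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}

    def fence(self, site_name):
        # Fencing a whole site from above: stop it, like any resource.
        self.sites[site_name].stop()


top = TopLevelCluster([SiteCluster("siteA", ["a1", "a2"]),
                       SiteCluster("siteB", ["b1", "b2"])])
for site in top.sites.values():
    site.start()
print(top.sites["siteA"].monitor())   # the top level sees only "running"
top.fence("siteB")
print(top.sites["siteB"].monitor())   # and "stopped" after fencing
```

The design choice this illustrates is exactly the layering argument:
the top level never enumerates a1/a2/b1/b2, only siteA and siteB.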

(The one thing which is somewhat more difficult right now is running
instances from several layers within the same OS instance, because our
various communication channels/fifos etc. don't support that. But that
doesn't occur for Xen/VMware-style layered clusters, and it could also
be addressed quick&dirty for DR clusters by having the "top level" run
on different nodes. Either way, that's mostly an implementation detail
and not a major design consideration.)

The second argument is more one of priorities: if we don't have the
resources to do it properly, we might declare it "out of scope" for the
time being (say, 2.0.x) and instead focus our resources on furthering
what we're really good at, namely being a robust cluster for a single
site. (But, say, encourage others to continue to develop software to
integrate with us as such.)

> >whose name I forgot. Stretch clusters are, to the best of my knowledge,
> >somewhat limited because they essentially pretend that it is a flat
> >structure, but it isn't in practice...
> What kinds of limitations did you have in mind here?

Basically, most of them stem from treating a discontinuous stretched
cluster as a single one - sort of like treating an NC-NUMA architecture
as CC-SMP. 

It begins with the overhead and different latencies of heartbeating over
the WAN link compared to the LAN heartbeat, continues with the
differences in STONITH, with the fact that sites may differ in their
configuration (rarely are they 100% identical, though we can provide for
this better than other stretched clusters by rule-affected instance
attributes), and with fault isolation between the sites, and ends
somewhere with the fact that you want them to be independent for
anything local.
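The latency point can be made concrete with a toy calculation (the
numbers below are invented, not measured): a single flat heartbeat
deadline tuned for the WAN makes LAN failure detection needlessly slow,
while one tuned for the LAN causes false positives over the WAN.

```python
# Toy illustration of why one flat heartbeat deadline fits a stretched
# cluster badly. All numbers are invented for illustration only.

LAN_RTT_MS = 1       # typical round-trip on the local LAN
WAN_RTT_MS = 80      # typical round-trip on the inter-site WAN link

def deadtime(rtt_ms, jitter_factor=10):
    """A naive per-link deadline: allow jitter_factor times the RTT."""
    return rtt_ms * jitter_factor

flat = deadtime(WAN_RTT_MS)                # one deadline for all: 800 ms
per_link = {"lan": deadtime(LAN_RTT_MS),   # 10 ms
            "wan": deadtime(WAN_RTT_MS)}   # 800 ms

# With the flat deadline, a dead LAN peer takes 80x longer to detect
# than a per-link deadline would allow.
print(flat // per_link["lan"])
```

That factor is exactly the cost of pretending the discontinuous
structure is flat.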

Actually, I consider your initial steps, plus some of the optional ones
(human override, among others), useful regardless of their application
to stretched clusters.

> [Resource level quorums (quora?) would be an even nicer extension, that 
> would be awesome for stretch clusters but I'm not yet ready for that 
> (nor ready to ask Andrew for it) either]

Resource level quorum, or better put: a quorum hierarchy affecting
different parts of the system, actually _is_ a sort-of layered cluster,
just all globbed into a single flat domain (for everything but quorum)
;-)

This is already quite nice for a single site, though - where you don't
want the fact that some nodes went down to affect resources which
wouldn't have touched those nodes anyway. So yes, I guess that's
actually quite a reasonable thing to ask.
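A minimal sketch of that idea (hypothetical - nothing like this exists
in our code, and the placement rules are invented): quorum is computed
per resource, over only the set of nodes the resource is allowed to run
on, so failures elsewhere in the cluster don't touch it.

```python
# Hypothetical sketch of per-resource quorum. Placement rules and names
# are invented; this illustrates the idea, not our CRM.

def has_resource_quorum(eligible, up):
    """Majority quorum computed only over the nodes a resource may use."""
    alive = set(eligible) & set(up)
    return len(alive) * 2 > len(eligible)

# Six-node cluster; resource "db" is confined to nodes a, b, c and
# resource "web" to nodes d, e, f.
eligible = {"db": {"a", "b", "c"}, "web": {"d", "e", "f"}}
up = {"a", "b", "c", "f"}          # nodes d and e went down

print(has_resource_quorum(eligible["db"], up))    # unaffected: True
print(has_resource_quorum(eligible["web"], up))   # lost quorum: False
```

With cluster-wide quorum, the d/e failure would be visible to "db" as
well; per-resource quorum keeps it local, which is the globbed-together
layering mentioned above.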

If it really takes 5-10 staff years to implement clusters of clusters,
achieving the same functionality by extending the various parts so that
a stretched cluster will work better will likely not be all that much
cheaper.

And, pointing to prior art - not just how other products implement this
(and I know "Blueprints for High Availability" has something very useful
to say on that topic too, but I have given away my copy to a colleague
right now :-/), but how other areas have implemented it (think Internet
routing architecture, grid architecture, or CPU schedulers/memory
allocators for NC-NUMA) - you'll find that _very_ few scenarios have
opted to pretend that a discontinuous/hierarchical structure is flat.

(Even the Catholic church eventually conceded that the earth is round
;-)

But, of course, I'm fine with these enhancements being done anyway, as
long as they don't negatively impact what I still consider our primary
focus (local-area clusters), neither in code nor in resource staffing.

As I said, I don't really have time for this discussion in the depth it
deserves right now, and for that I apologize. But, I _do_ have opinions
I'd like to bring in ;-) I hope it is not too urgent.


Sincerely,
    Lars Marowsky-Brée

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

"Ignorance more frequently begets confidence than does knowledge"
    -- Charles Darwin

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
