Re: [Linux-ha-dev] Re: [Linux-ha-cvs] Linux-HA CVS: heartbeat by andrew from

Alan Robertson Fri, 27 Jan 2006 12:12:51 -0800

Andrew Beekhof wrote:

On Jan 27, 2006, at 3:23 PM, Alan Robertson wrote:
Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
CTS testers please note this commit.

In order to run the same tests as you used to, you need to specify:
  enable_config_writes off
in ha.cf
Why is this an ha.cf option?  It's clearly a CIB option - so I would
think it belongs in the CIB. It makes no sense there... We discussed
some things, but I don't remember this one.
Four reasons:
 - the CIB is intended to be policy free (and at the moment is, IIRC)
BUT this is a CIB policy - hence it must be enforced and carried out by
the CIB.
 - correct interpretation of options in the CIB requires linking against the PE
   (or worse, duplicating slabs of its code)
I don't follow this at all. It's the CIB that writes the CIB, isn't it?
But it doesn't know what it's writing.  Same way the LRM doesn't know
what it's starting.
But, the LRM does have to make special cases which make it somewhat conceptually impure.
Remember all options can be time, host and phase of the moon dependent.
In order to understand what the option is actually set to, it needs to
be able to evaluate all those expressions and rule sets - a fair chunk
of the PE.
Plus it's a waste to do this every time the CIB is updated.
This sure looks like a combination of the false dichotomy and strawman logical fallacies.
But, perhaps I'm missing something - because you are in fact the expert on the CIB.
So, why wouldn't calling get_xml_attr_nested() and friends return the data you want?
you would know if you paid attention for even half a second:
Remember all options can be time, host and phase of the moon dependent.
If you say because the XML section you'd choose to put it inside of has complicated semantics, then don't do that. If you added a <cibopts/> section, that would certainly solve any potential problem of complexity - and it would be readily extensible to new things as they come up.
The environment variables can't create a complicated policy
do or do not write to disk... gee that's complicated

So, in this you agree with me. It would be helpful if you read what I wrote, and not the opposite of what I wrote.

- so saying you _have_ to have a complicated policy
if it's a cluster option then it has the same properties as all the others, including resource stickiness, which you seemed to be rather fond of being able to set differently depending on the time.

But, again, you didn't read what I wrote. Sigh... If making it a "cluster option" doesn't work, then don't make it a cluster option. It's the false dichotomy peeking its head in again. Making it a part of the cluster options section isn't the only answer of where to put it.

I suggested a <cibopts> section rooted immediately below <cib>. Make your own name if you don't like mine - but _whatever_ you do - don't put it in the existing cluster options section. There's no point in you arguing against your straw man proposal - because I don't care if some irrelevant straw-man proposal doesn't work. So what?

If you invent a new section, it doesn't have to have all the complexity you want to get away from. You can make it as simple as you like.
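
To make the suggestion concrete, here is a minimal sketch of how a policy-free reader could consult such a section. Everything in it is an assumption for illustration: the attribute layout of <cibopts>, the reuse of the ha.cf option name enable_config_writes as an attribute, and the helper name config_writes_enabled are all hypothetical, and this is Python rather than heartbeat's C. The point is only that a flat section directly below <cib> needs no rule or expression evaluation.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB layout: a flat <cibopts> element directly below <cib>.
# The attribute name mirrors the ha.cf option; it is an assumption.
SAMPLE_CIB = """
<cib>
  <cibopts enable_config_writes="off"/>
  <configuration/>
  <status/>
</cib>
""".strip()


def config_writes_enabled(cib_xml, default=True):
    """Return whether on-disk writes are enabled, without evaluating any rules."""
    root = ET.fromstring(cib_xml)
    opts = root.find("cibopts")  # direct child of <cib> only
    if opts is None:
        return default
    value = opts.get("enable_config_writes")
    if value is None:
        return default
    return value.strip().lower() in ("on", "true", "yes", "1")


if __name__ == "__main__":
    print(config_writes_enabled(SAMPLE_CIB))
```

Because the lookup is a plain attribute read, it cannot depend on time, host, or any other rule set.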

now that you move it into the CIB doesn't obviously follow. If you didn't need it before, you don't need it now.
we did need it before... it was broken and I just didn't know it yet.

I don't think you read what I wrote. These words were in response to something I didn't say. I'm not sure exactly what, but I can't see any relationship to anything I said. So, it's impossible to respond to this.

There may be, in fact, really convincing arguments you haven't presented so far. But, you're going to have to do something better than wave your hands and say "trust me I'm the expert here".
i didn't do that. i tried to explain it and you threatened to back out the changes.


For those not on IRC earlier today:  The conversation went like this:
        Alan:   it's a bug
        Andrew: it's not a bug
        repeat above 2 lines ad nauseam

There wasn't much give and take going on on either side.

But, in any case, I wasn't referring to IRC, I was referring to the email chain - since it doesn't make sense to drag in IRC without any kind of references to what was going on.

The lines in your original email offered no explanations other than "it would be hard". Your lines above offer nothing new.

And, since the lack of these options appears to affect STONITH behavior in an undocumented way, there's also a lot more here than you've talked about. I'd be very interested to hear more on that subject as well.
ooo here's a node we don't know about... what should we do?
we should know what we don't know and shoot the thing so that we do know.

I'm not sure how one can "know what we don't know". If I don't know it, I don't know it. How can I know that thing which I know that I don't know? I suspect I'm parsing your sentence wrong.

if we don't write to disk... then we have no record of any other nodes, do we (unless the admin included them in the on-disk version), so there is no-one to shoot.


So... Let me see if I understand this...

(1) If you have an empty node list in the on-disk CIB, then when the first DC comes up, it looks to see who it can't contact that is in the node list. It doesn't see any nodes (besides itself), so it can't find anyone to STONITH.

(2) If you enable disk writes of the CIB, the status section is suppressed, but the automatically-generated node list _is_ written to disk. So, when the cluster starts up from scratch it now has a non-empty list of nodes, which allows the STONITH code to STONITH everyone else who hasn't come up yet - which it couldn't do in the other case.
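
The two cases above can be sketched in a few lines. This is only an illustration of the reasoning, not heartbeat's code: the function name stonith_candidates, its parameters, and the node names are hypothetical, and it assumes the first DC simply shoots every node listed on disk that it cannot reach.

```python
def stonith_candidates(on_disk_nodes, local_node, reachable_nodes):
    """Return the nodes a freshly elected DC would consider shooting.

    Assumption: a node is a candidate if it appears in the on-disk node
    list, is not the DC itself, and cannot currently be contacted.
    """
    known = set(on_disk_nodes)
    unreachable = known - set(reachable_nodes) - {local_node}
    return sorted(unreachable)


if __name__ == "__main__":
    # Case (1): config writes off, so the on-disk node list stays empty.
    print(stonith_candidates([], "node1", ["node1"]))

    # Case (2): config writes on, so earlier runs left a node list on disk.
    print(stonith_candidates(["node1", "node2", "node3"], "node1", ["node1"]))
```

In case (1) the result is empty, so there is nobody to STONITH; in case (2) the nodes that have not come up yet become candidates.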

So, I think I understand now. Answering this question was very helpful, and was much appreciated. So, it's clear that it's not a bug. It's weird - but it's not a bug in heartbeat. Sounds like a bug in the StartOneByOne test. It also sounds like a bug in the membership layer - that it doesn't report those other nodes as missing immediately (it knows they're missing even in this case).

However, announcing the change in an obscurely-titled email sent to a mailing list probably isn't the best way to document a major change to the test facilities in the middle of a code freeze that your boss is complaining to me is late. This has delayed the release at least two days.

Have you updated the /CTS page on the web site to explain these new procedures?

Did you file a bugzilla against the StartOneByOne test? [This is actually not too hard to fix - but only if it's documented as a bug]


Did you file a bugzilla against the membership layer?


--
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
