Andrew Beekhof wrote:

On Jan 27, 2006, at 3:23 PM, Alan Robertson wrote:

Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
CTS testers please note this commit.

In order to run the same tests as you used to, you need to specify:
  enable_config_writes off
in ha.cf
Why is this an ha.cf option? It's clearly a CIB option - so I would
think it belongs in the CIB. It makes no sense there... We discussed
some things, but I don't remember this one.
Four reasons:
 - the CIB is intended to be policy free (and at the moment is, IIRC)
BUT this is a CIB policy - hence it must be enforced and carried out by
the CIB.

- correct interpretation of options in the CIB requires linking against the PE
   (or worse, duplicating slabs of its code)
I don't follow this at all. It's the CIB that writes the CIB, isn't it?
But it doesn't know what it's writing. Same way the LRM doesn't know
what it's starting.

But, the LRM does have to make special cases which make it somewhat conceptually impure.

Remember all options can be time, host and phase of the moon dependent.
In order to understand what the option is actually set to, it needs to
be able to evaluate all those expressions and rule sets - a fair chunk
of the PE.
Plus it's a waste to do this every time the CIB is updated.

This sure looks like a combination of the false dichotomy and straw man logical fallacies.

But, perhaps I'm missing something - because you are in fact the expert on the CIB.

So, why wouldn't calling get_xml_attr_nested() and friends return the data you want?

you would know if you paid attention for even half a second:

Remember all options can be time, host and phase of the moon dependent.


If you say because the XML section you'd choose to put it inside of has complicated semantics, then don't do that. If you added a <cibopts/> section, that would certainly solve any potential problem of complexity - and it would be readily extensible to new things as they come up.

The environment variables can't create a complicated policy

do or do not write to disk... gee that's complicated

So, in this you agree with me. It would be helpful if you read what I wrote, and not the opposite of what I wrote.

- so saying you _have_ to have a complicated policy

if it's a cluster option then it has the same properties as all the others, including resource stickiness, which you seemed to be rather fond of being able to set differently depending on the time.

But, again, you didn't read what I wrote. Sigh... If making it a "cluster option" doesn't work, then don't make it a cluster option. It's the false dichotomy rearing its head again. Making it a part of the cluster options section isn't the only answer of where to put it.

I suggested a <cibopts> section rooted immediately below <cib>. Make your own name if you don't like mine - but _whatever_ you do - don't put it in the existing cluster options section. There's no point in you arguing against your straw man proposal - because I don't care if some irrelevant straw-man proposal doesn't work. So what?

If you invent a new section, it doesn't have to have all the complexity you want to get away from. You can make it as simple as you like.
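A minimal sketch of what such a flat section could look like to the code reading it. Everything here is an assumption for illustration: the `<cibopts>` element is only the name proposed above, the attribute simply reuses the `enable_config_writes` name from ha.cf, and `config_writes_enabled` is a hypothetical helper, not an existing heartbeat function. The point is only that a plain on/off attribute can be read without evaluating any rules or expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment: the <cibopts> element and its attribute are
# assumptions for illustration, not an existing heartbeat schema.
SAMPLE_CIB = """
<cib>
  <cibopts enable_config_writes="off"/>
  <configuration/>
  <status/>
</cib>
"""

def config_writes_enabled(cib_xml, default=True):
    """Return the flat on/off setting; no rule or time evaluation involved."""
    root = ET.fromstring(cib_xml)
    opts = root.find("cibopts")  # direct child of <cib> only
    if opts is None:
        return default
    value = opts.get("enable_config_writes")
    if value is None:
        return default
    return value.strip().lower() in ("on", "true", "yes", "1")
```

Called as `config_writes_enabled(SAMPLE_CIB)`, the helper reads the attribute directly and falls back to the default when the section or attribute is absent.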

now that you move it into the CIB doesn't obviously follow. If you didn't need it before, you don't need it now.

we did need it before... it was broken and I just didn't know it yet.

I don't think you read what I wrote. These words were in response to something I didn't say. I'm not sure exactly what, but I can't see any relationship to anything I said. So, it's impossible to respond to this.

There may be in fact, really convincing arguments you haven't presented so far. But, you're going to have to do something better than wave your hands and say "trust me I'm the expert here".

I didn't do that. I tried to explain it and you threatened to back out the changes.

For those not on IRC earlier today:  The conversation went like this:
        Alan:   it's a bug
        Andrew: it's not a bug
        repeat above 2 lines ad nauseam

There wasn't much give and take going on on either side.

But, in any case, I wasn't referring to IRC, I was referring to the email chain - since it doesn't make sense to drag in IRC without any kind of references to what was going on.

The lines in your original email offered no explanations other than "it would be hard". Your lines above offer nothing new.

And, since the lack of these options appears to affect STONITH behavior in an undocumented way, there's also a lot more here than you've talked about. I'd be very interested to hear more on that subject as well.


ooo here's a node we don't know about... what should we do?
we should know what we don't know and shoot the thing so that we do know.

I'm not sure how one can "know what we don't know". If I don't know it, I don't know it. How can I know that thing which I know that I don't know? I suspect I'm parsing your sentence wrong.

if we don't write to disk... then we have no record of any other nodes, do we (unless the admin included them in the on-disk version), so there is no one to shoot.

So... Let me see if I understand this...

(1) If you have an empty node list in the on-disk CIB, then when the first DC comes up, it looks to see who it can't contact that is in the node list. It doesn't see any nodes (besides itself), so it can't find anyone to STONITH.

(2) If you enable disk writes of the CIB, the status section is suppressed, but the automatically-generated node list _is_ written to disk. So, when the cluster starts up from scratch it now has a non-empty list of nodes, which allows the STONITH code to STONITH everyone else who hasn't come up yet - which it couldn't do in the other case.
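The two cases can be sketched as a small set computation. This is an illustration of the reasoning above under stated assumptions, not the actual heartbeat or CRM code: `stonith_targets` and its parameters are hypothetical names, and the real decision involves more than a set difference.

```python
def stonith_targets(on_disk_nodes, reachable_nodes, self_name):
    """Nodes a freshly started DC would consider shooting.

    A node is a candidate only if it appears in the node list loaded
    from the on-disk CIB and cannot currently be contacted.
    (Hypothetical helper; names are assumptions for illustration.)
    """
    known = set(on_disk_nodes)
    reachable = set(reachable_nodes) | {self_name}
    return sorted(known - reachable)

# Case (1): disk writes disabled, so the on-disk node list is empty.
no_writes = stonith_targets([], ["node1"], "node1")

# Case (2): disk writes enabled, so the generated node list was saved.
with_writes = stonith_targets(["node1", "node2", "node3"], ["node1"], "node1")
```

In case (1) the result is empty because nothing is known to be missing; in case (2) the nodes that have not come up yet are the ones returned.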

So, I think I understand now. Answering this question was very helpful, and was much appreciated. So, it's clear that it's not a bug. It's weird - but it's not a bug in heartbeat. Sounds like a bug in the StartOneByOne test. It also sounds like a bug in the membership layer - that it doesn't report those other nodes as missing immediately (it knows they're missing even in this case).

However, announcing the change in an obscurely-titled email sent to a mailing list probably isn't the best way to document a major change to the test facilities in the middle of a code freeze that your boss is complaining to me is late. This has delayed the release at least two days.

Have you updated the /CTS page on the web site to explain these new procedures?

Did you file a bugzilla against the StartOneByOne test? [This is actually not too hard to fix - but only if it's documented as a bug]

Did you file a bugzilla against the membership layer?


--
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/