Andrew Beekhof wrote:
On Jan 27, 2006, at 3:23 PM, Alan Robertson wrote:
Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
On 1/26/06, Alan Robertson <[EMAIL PROTECTED]> wrote:
Andrew Beekhof wrote:
CTS testers please note this commit.
In order to run the same tests as you used to, you need to specify:
enable_config_writes off
in ha.cf
Why is this an ha.cf option? It's clearly a CIB option - so I would
think it belongs in the CIB. It makes no sense in ha.cf... We discussed
some things, but I don't remember this one.
Four reasons:
- the CIB is intended to be policy free (and at the moment is IIRC)
BUT this is a CIB policy - hence it must be enforced and carried out by
the CIB.
- correct interpretation of options in the CIB requires linking
against the PE
(or worse, duplicating slabs of its code)
I don't follow this at all. It's the CIB that writes the CIB,
isn't it?
But it doesn't know what it's writing. Same way the LRM doesn't know
what it's starting.
But, the LRM does have to make special cases which make it somewhat
conceptually impure.
Remember all options can be time, host and phase of the moon dependent.
In order to understand what the option is actually set to, it needs to
be able to evaluate all those expressions and rule sets - a fair chunk
of the PE.
Plus it's a waste to do this every time the CIB is updated.
This sure looks like a combination of the false dichotomy and straw
man logical fallacies.
But, perhaps I'm missing something - because you are in fact the
expert on the CIB.
So, why wouldn't calling get_xml_attr_nested() and friends return the
data you want?
you would know if you paid attention for even half a second:
Remember all options can be time, host and phase of the moon dependent.
If you say because the XML section you'd choose to put it inside of
has complicated semantics, then don't do that. If you added a
<cibopts/> section, that would certainly solve any potential problem
of complexity - and it would be readily extensible to new things as
they come up.
The environment variables can't create a complicated policy
do or do not write to disk... gee, that's complicated
So, in this you agree with me. It would be helpful if you read what I
wrote, and not the opposite of what I wrote.
- so saying you _have_ to have a complicated policy
if it's a cluster option then it has the same properties as all the
others including resource stickiness which you seemed to be rather fond
of being able to set differently depending on the time.
But, again, you didn't read what I wrote. Sigh... If making it a
"cluster option" doesn't work, then don't make it a cluster option.
It's the false dichotomy poking its head in again. Making it a part
of the cluster options section isn't the only answer of where to put it.
I suggested a <cibopts> section rooted immediately below <cib>. Make
your own name if you don't like mine - but _whatever_ you do - don't put
it in the existing cluster options section. There's no point in you
arguing against your straw man proposal - because I don't care if some
irrelevant straw-man proposal doesn't work. So what?
If you invent a new section, it doesn't have to have all the complexity
you want to get away from. You can make it as simple as you like.
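
A minimal sketch of what reading such a flat section could look like. The `<cibopts>` element is the name proposed above; the `enable_config_writes` attribute on it, the sample CIB layout, and the helper function are assumptions made for illustration, not an existing CIB schema or heartbeat API. The point is only that a plain attribute lookup needs no rule or expression evaluation:

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment: <cibopts> and its attribute are illustrative
# assumptions, not part of any released CIB schema.
SAMPLE_CIB = """
<cib>
  <cibopts enable_config_writes="off"/>
  <configuration>
    <crm_config/>
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
"""


def config_writes_enabled(cib_xml, default=True):
    """Return the flat on/off value from a hypothetical <cibopts> element.

    No rules, times, or host expressions are consulted: the value is read
    as a plain attribute on a direct child of <cib>.
    """
    root = ET.fromstring(cib_xml)
    opts = root.find("cibopts")  # direct child of <cib> only
    if opts is None:
        return default
    value = opts.get("enable_config_writes")
    if value is None:
        return default
    return value.strip().lower() in ("on", "true", "yes", "1")


print(config_writes_enabled(SAMPLE_CIB))
```

Whether the default should be "write" or "don't write" when the section is absent is a separate decision; the sketch defaults to writing.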
now that you move it into the CIB doesn't obviously follow. If you
didn't need it before, you don't need it now.
we did need it before... it was broken and I just didn't know it yet.
I don't think you read what I wrote. These words were in response to
something I didn't say. I'm not sure exactly what, but I can't see any
relationship to anything I said. So, it's impossible to respond to this.
There may be in fact, really convincing arguments you haven't
presented so far. But, you're going to have to do something better
than wave your hands and say "trust me I'm the expert here".
I didn't do that. I tried to explain it and you threatened to back out
the changes.
For those not on IRC earlier today: The conversation went like this:
Alan: it's a bug
Andrew: it's not a bug
repeat above 2 lines ad nauseam
There wasn't much give and take going on on either side.
But, in any case, I wasn't referring to IRC, I was referring to the
email chain - since it doesn't make sense to drag in IRC without any
kind of references to what was going on.
The lines in your original email offered no explanations other than "it
would be hard". Your lines above offer nothing new.
And, since the lack of these options appears to affect STONITH
behavior in an undocumented way, there's also a lot more here than
you've talked about. I'd be very interested to hear more on that
subject as well.
ooo here's a node we don't know about... what should we do?
we should know what we don't know and shoot the thing so that we do know.
I'm not sure how one can "know what we don't know". If I don't know it,
I don't know it. How can I know that thing which I know that I don't
know? I suspect I'm parsing your sentence wrong.
if we don't write to disk... then we have no record of any other nodes,
do we (unless the admin included them in the on-disk version)? So there
is no-one to shoot.
So... Let me see if I understand this...
(1) If you have an empty node list in the on-disk CIB, then when the
first DC comes up, it looks to see who it can't contact that is in the
node list. It doesn't see any nodes (besides itself), so it can't find
anyone to STONITH.
(2) If you enable disk writes of the CIB, the status section is
suppressed, but the automatically-generated node list _is_ written to
disk. So, when the cluster starts up from scratch it now has a
non-empty list of nodes, which allows the STONITH code to STONITH
everyone else who hasn't come up yet - which it couldn't do in the other
case.
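
The two cases above can be sketched as a small set computation. This is an illustration of the behaviour as described in this thread, not the actual heartbeat/CRM code; the function name, its parameters, and the node names are hypothetical:

```python
def stonith_candidates(on_disk_nodes, live_members, local_node):
    """Return the nodes a freshly elected DC would consider shooting.

    Assumption (from the description above): the DC can only fence nodes
    it has a record of, i.e. nodes listed in the on-disk CIB, and it only
    fences those it cannot currently contact.
    """
    known = set(on_disk_nodes) | {local_node}
    unreachable = known - set(live_members) - {local_node}
    return sorted(unreachable)


# Case (1): config writes disabled, so the on-disk node list is empty.
print(stonith_candidates([], ["node1"], "node1"))

# Case (2): config writes enabled, so earlier runs recorded all nodes.
print(stonith_candidates(["node1", "node2", "node3"], ["node1"], "node1"))
```

In case (1) the result is empty because there is nobody on record to shoot; in case (2) the nodes that have not come up yet are returned.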
So, I think I understand now. Answering this question was very helpful,
and was much appreciated. So, it's clear that it's not a bug. It's
weird - but it's not a bug in heartbeat. Sounds like a bug in the
StartOneByOne test. It also sounds like a bug in the membership layer
- that it doesn't report those other nodes as missing immediately (it
knows they're missing even in this case).
However, announcing the change in an obscurely-titled email sent to a
mailing list probably isn't the best way to document a major change to
the test facilities in the middle of a code freeze that your boss is
complaining to me is late. This has delayed the release at least two days.
Have you updated the /CTS page on the web site to explain these new
procedures?
Did you file a bugzilla against the StartOneByOne test? [This is
actually not too hard to fix - but only if it's documented as a bug]
Did you file a bugzilla against the membership layer?
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce