On Nov 19, 2007, at 11:57 AM, Sebastian Reitenbach wrote:

Hi,

Zhen Huang <[EMAIL PROTECTED]> wrote:
Hi,

The DC node should try to connect to the quorumd server periodically.
If it does not, that is a bug.


Why the DC node? Shouldn't it be the CCM leader? The DC shouldn't have anything to do with this.

Thanks for clarifying. I'll retest later today when I'm back home, and
if I can reproduce it, I'll open a Bugzilla entry.

kind regards
Sebastian



Alan Robertson <[EMAIL PROTECTED]>
11/14/2007 03:13 AM

To
Sebastian Reitenbach <[EMAIL PROTECTED]>
cc
[email protected], Zhen Huang/China/[EMAIL PROTECTED]
Subject
Re: [Linux-HA] question regarding quorumd






Sebastian Reitenbach wrote:
Hi,

Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On Nov 13, 2007, at 11:13 AM, Sebastian Reitenbach wrote:

Hi,

Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On Nov 9, 2007, at 4:34 PM, Sebastian Reitenbach wrote:

Hi,

I did some tests with a two node cluster and a third one running a
quorumd.

I started the quorumd, and then the two cluster nodes.
The one that became DC started to communicate with the remote
quorumd.
The CRM (and thus the "DC") doesn't know anything about quorumd.
I believe this is purely the domain of the CCM, and I've no idea how
that works :-)

We just consume membership data from it...

So anyway, my point is that the fact that a node is the DC is
irrelevant when it comes to quorumd.
But somehow the cluster knows, as only the DC is communicating with
the external quorumd.
I think that it's just a coincidence that it happens to be the DC...
at least I hope it is.
I thought I read somewhere that the DC is the one in charge of
communicating with the remote quorumd, but I may be wrong here.

I just do not understand why the cluster does not retry contacting
the quorumd after it has lost the connection to it. This is what I
assumed: after a disconnect from the remote quorumd, the cluster nodes
should try to contact it, and once contact is there again, use it again.
I agree - but I've never seen that code. You'll have to contact Alan
or file a bug for him.
Alan, in case you think this is a bug, I'll go create a bug report for
it. Please let me know.

I killed the DC, saw the other node become DC and start communicating
with the remote quorumd; all fine, cluster still with quorum.
Then I killed the quorumd itself. The DC noticed and started to stop
all resources because of the quorum_policy, as it had lost quorum.

Then I restarted the quorumd, but the DC, still without quorum,
did not try to communicate with the quorumd again.
I'd expect the still living DC to try to contact the quorumd, in
case it comes back.

If there is a good reason why the DC is not trying to reconnect to
the remote quorumd, I'd really like to be enlightened by someone who
knows.

It should be trying to reconnect. It _does_ communicate w/quorumd from
a single machine/cluster. I think that it's a coincidence that it's the
DC.  Huang Zhen wrote the code.  I've CCed him.  I'm at the LISA
conference this week - if HZ doesn't get back to you by next Monday,
I'll look into it.
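
For readers following the thread, the behaviour everyone here expects
(keep retrying the quorumd connection after it drops, and use it again
once it answers) can be sketched roughly as below. This is only an
illustration and not the heartbeat/CCM code: reconnect_with_backoff,
the connect callback, and the delay values are all hypothetical names
and numbers chosen for the example.

```python
import time
from typing import Callable, Optional


def reconnect_with_backoff(
    connect: Callable[[], bool],
    max_attempts: int,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call connect() until it succeeds or max_attempts is reached.

    Hypothetical helper, not part of heartbeat. Returns the 1-based
    attempt number that succeeded, or None if every attempt failed.
    The wait between attempts doubles each time, capped at max_delay.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            # Wait before the next try so a dead quorum server
            # is not hammered with connection attempts.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

A long-running daemon would more likely retry indefinitely on a timer
rather than give up after a fixed number of attempts; the bound is kept
here only so the sketch terminates.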


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

