On Nov 26, 2007, at 2:38 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
Hi Andrew,
I just modified my stonith device to work in both online and offline
mode. The stonith operation (standby -> active) is successful with the
active node's cable unplugged, and it seems the standby node tries to
start the resource but fails. The log is attached, but it doesn't contain
enough detail to find out what's going on. It just prints:
pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Processing failed op (Proxy_10_114_31_238_start_0) for Proxy_10_114_31_238 on standby
pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Handling failed start for Proxy_10_114_31_238 on standby
Is there a way to enable more log messages in HA at run-time? The debug
log and the regular log seem to contain the same number of messages.
add this to ha.cf
debug 1
beyond that i can't help much as i've not had much to do with stonith
Thanks again,
Abhi
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
Sent: Monday, November 26, 2007 2:59 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Fencing prevents resource from failing over
On Nov 26, 2007, at 9:56 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
Thanks Andrew,
My comments are inline...
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
Sent: Monday, November 26, 2007 1:44 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Fencing prevents resource from failing over
On Nov 26, 2007, at 6:25 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
Hi,
I have a 2-node active/passive cluster (active node => active, passive
node => standby) using heartbeat 2.0.8. I recently enabled stonith. The
stonith device is an rsh device that tries to restart the cluster node.
However, something that used to work with stonith disabled has stopped
working now: node failover on network cable disconnection. I believe
that since the stonith device uses the network, the stonith fails and
hence the resource is left wherever it was running.
correct. the cluster will not start anything until it can verify the
node is truly dead (with a successful stonith operation). this is how a
stonith-enabled cluster is supposed to work and is why IP-based
stonith modules are not a great idea.
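To make the rule above concrete, here is a minimal Python sketch of the decision. It is not heartbeat's actual implementation; NodeState, after_stonith and may_start_resources are names made up for illustration.

```python
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"
    UNCLEAN = "unclean"  # contact lost, death not yet confirmed
    FENCED = "fenced"    # a stonith operation reported success


def after_stonith(stonith_succeeded: bool) -> NodeState:
    """State of a lost peer after one stonith attempt (hypothetical helper)."""
    return NodeState.FENCED if stonith_succeeded else NodeState.UNCLEAN


def may_start_resources(peer_state: NodeState, stonith_enabled: bool) -> bool:
    """Return True if the surviving node may take over the peer's resources."""
    if peer_state is NodeState.ONLINE:
        return False  # the peer still owns its resources
    if not stonith_enabled:
        return True   # no fencing configured: recover as soon as the peer is lost
    # With stonith enabled, an unclean peer blocks recovery until fencing succeeds.
    return peer_state is NodeState.FENCED
```

With an IP-based stonith device and the cable pulled, the stonith attempt fails, the peer stays unclean, and may_start_resources returns False, which matches the behaviour described in this thread.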
Can anyone please help resolve this problem (this is probably not a
problem and this is how stonith is expected to work)? I would like to
know if there's any way to tell the passive (currently active) node to
give up trying to stonith and then start the resource.
by design - no.
I've attached my cib file and logs from the passive node when the cable
is disconnected. I have no problem with both nodes running the resource,
as the active node is cut off from the network anyway and can't do any
damage.
if that's truly the case, then you may not need stonith.
ABHI: But, if the Active comes online again it's a very bad thing for
both nodes to be running the resources.
the crm will detect that and stop one of them.
however there will always be a period of time (even with your proposal
below) where they are both active and both connected to the network
Can we configure two stonith devices and make the node think stonith is
successful if either of the stonith operations returns success? Is there
some kind of resource constraint that I can use in this case?
1. Online stonith device: uses IP to reset the other node.
2. Offline stonith device: just a dummy that always returns success on
reset.
if you're lucky, this might work 9 times out of 10.
but it's likely that when it doesn't work, it's going to _really_
hurt you.
"tricking" the cluster almost always leads to pain.
my advice... get a real stonith device...
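The failure mode can be sketched in a few lines of Python. This only illustrates the proposed "either device succeeds" policy; Node, ip_stonith, dummy_stonith and fence are hypothetical names, not real stonith plugins.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    reachable: bool       # can the fencing node reach it over IP?
    running: bool = True  # is it really still running its resources?


def ip_stonith(node: Node) -> bool:
    """Hypothetical IP-based device: resets the node only if it is reachable."""
    if node.reachable:
        node.running = False
        return True
    return False


def dummy_stonith(node: Node) -> bool:
    """Hypothetical offline device: claims success without doing anything."""
    return True


def fence(node: Node) -> bool:
    """Proposed policy: report success if either device reports success."""
    return ip_stonith(node) or dummy_stonith(node)


cut_off = Node("active", reachable=False)
reported = fence(cut_off)
# The cluster is told the node is dead, yet nothing has actually reset it.
print(reported, cut_off.running)
```

Here the stonith is reported as successful while the node is still running, so the resource can end up active on both nodes once the cable is plugged back in.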
The standby log seems to say it has quorum
2-node clusters always have quorum, so the value is meaningless...
but it makes me wonder why it doesn't start the resources, in spite of
the following, which is evident from the logs:
1. Standby marks active unclean
2. Standby has quorum
3. Standby tries to move resources back to standby
Thanks in advance,
Abhi.
The information contained in this electronic message and any
attachments to this message are intended for the exclusive use of
the
addressee(s) and may contain proprietary, confidential or privileged
information. If you are not the intended recipient, you should not
disseminate, distribute or copy this e-mail. Please notify the
sender
immediately and destroy all copies of this message and any
attachments.
WARNING: Computer viruses can be transmitted via email. The
recipient
should check this email and any attachments for the presence of
viruses. The company accepts no liability for any damage caused by
any
virus transmitted by this email.
www.wipro.com
<ha-log-standby.txt><cib.xml>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
<standby-log.txt>