Hi,

On Mon, Nov 26, 2007 at 04:14:07PM +0100, Andrew Beekhof wrote:
>
> On Nov 26, 2007, at 2:38 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
>
> >Hi Andrew,
> >I just modified my stonith device to work in both online and offline
> >mode. The stonith operation (standby -> active) is successful with the
> >active node cable unplugged and it seems the standby node tries to start
> >the resource, but fails. Log is attached. But there's not enough logs to
> >find out whats going on. It just prints:
> >
> >pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Processing failed op (Proxy_10_114_31_238_start_0) for Proxy_10_114_31_238 on standby
> >pengine[15900]: 2007/11/26_18:11:27 WARN: unpack_rsc_op: Handling failed start for Proxy_10_114_31_238 on standby
> >
> >Is there a way to enable more log messages in HA at run-time? The debug
> >log and regular log seem to have the same amount of messages.
>
> add this to ha.cf
>
>     debug 1
>
I think that sending some signals should also help:

- USR1 to increase the debug level
- USR2 the opposite

Thanks,

Dejan

> beyond that i can't help much as i've not had much to do with stonith
>
> >Thanks again,
> >Abhi
> >
> >-----Original Message-----
> >From: [EMAIL PROTECTED]
> >[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> >Sent: Monday, November 26, 2007 2:59 PM
> >To: General Linux-HA mailing list
> >Subject: Re: [Linux-HA] Fencing prevents resource from failing over
> >
> >
> >On Nov 26, 2007, at 9:56 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
> >
> >>
> >>Thanks Andrew,
> >>My comments are inline...
> >>-----Original Message-----
> >>From: [EMAIL PROTECTED]
> >>[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> >>Sent: Monday, November 26, 2007 1:44 PM
> >>To: General Linux-HA mailing list
> >>Subject: Re: [Linux-HA] Fencing prevents resource from failing over
> >>
> >>
> >>On Nov 26, 2007, at 6:25 AM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
> >>
> >>>
> >>>Hi,
> >>>I've a 2 node active/passive cluster (active node=>active, passive
> >>>node=>standby) using heartbeat 2.0.8. I recently enabled stonith. The
> >>>stonith device is an rsh device that tries to restart the cluster node.
> >>>However, something that used to work with stonith disabled has stopped
> >>>working now; Node failover on network cable disconnection. I believe
> >>>since the stonith device uses the network, the stonith fails and hence
> >>>the resource is left wherever it was running.
> >>
> >>correct. the cluster will not start anything until it can verify the
> >>node is truly dead (with a successful stonith operation) this is how a
> >>stonith enabled cluster is supposed to work and is why IP-based
> >>stonith modules are not a great idea.
> >>
> >>>Can anyone please help resolve this problem (this is probably not a
> >>>problem and this is how stonith is expected to work)? I would like to
> >>>know if there's anyway to tell the passive (currently active node) to
> >>>give up trying to stonith and then start the resource.
> >>
> >>by design - no.
> >>
> >>>I've attached my
> >>>cib file and logs from the passive when cable is disconnected.
> >>>I've no problem both nodes running the resource as active is anyway
> >>>cut-off from network and can't do any damage.
> >>
> >>if thats truly the case, then you may not need stonith.
> >>
> >>ABHI: But, if the Active comes online again it's a very bad thing for
> >>both nodes to be running the resources.
> >
> >the crm will detect that and stop one of them.
> >however there will always be a period of time (even with your proposal
> >below) where they are both active and both connected to the network
> >
> >>Can we configure two stonith
> >>devices and make the node think stonith is successful if either of the
> >>stonith operations return success. Is their some kind of resource
> >>constraint that I can use in this case?
> >>1. Online stonith device: That uses IP to reset the other node.
> >>2. Offline stonith device: That is just dummy and on reset always
> >>returns success.
> >
> >if you're lucky, this might work 9 times out of 10.
> >but its likely that when it doesn't work, that its going to _really_
> >hurt you.
> >
> >"tricking" the cluster almost always leads to pain.
> >
> >
> >my advice... get a real stonith device...
> >
> >>>The standby log seems to
> >>>say it has quorum
> >>
> >>2-node clusters always have quorum, so the value is meaningless...
> >>
> >>>but it makes me wonder why it doesnt start the resources, inspite of
> >>>the following evident from the logs.
> >>>
> >>>1. Standby marks active unclean
> >>>2. Standby has quorum
> >>>3. Standby tries to move resources back to standby
> >>>
> >>>
> >>>Thanks in advance,
> >>>Abhi.
> >>>
> >>>The information contained in this electronic message and any
> >>>attachments to this message are intended for the exclusive use of the
> >>>addressee(s) and may contain proprietary, confidential or privileged
> >>>information. If you are not the intended recipient, you should not
> >>>disseminate, distribute or copy this e-mail. Please notify the sender
> >>>immediately and destroy all copies of this message and any
> >>>attachments.
> >>>
> >>>WARNING: Computer viruses can be transmitted via email. The recipient
> >>>should check this email and any attachments for the presence of
> >>>viruses. The company accepts no liability for any damage caused by any
> >>>virus transmitted by this email.
> >>>
> >>>www.wipro.com<ha-log-standby.txt><cib.xml>
> >>>_______________________________________________
> >>>Linux-HA mailing list
> >>>[email protected]
> >>>http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>See also: http://linux-ha.org/ReportingProblems
> >
> ><standby-log.txt>
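
For anyone who wants to script the signal suggestion at the top of this message, here is a minimal sketch in Python. It only wraps `os.kill`. The mapping of USR1 to "more debug" and USR2 to "less debug" is taken from the suggestion above and is not verified here. The helper name `adjust_debug_level` is made up for illustration, and finding the PID of the heartbeat master process is left to you.

```python
import os
import signal

# Assumption (from the suggestion in this thread, not verified here):
# SIGUSR1 raises the daemon's debug level, SIGUSR2 lowers it.
DEBUG_SIGNALS = {"up": signal.SIGUSR1, "down": signal.SIGUSR2}


def adjust_debug_level(pid: int, direction: str) -> signal.Signals:
    """Send SIGUSR1 ("up") or SIGUSR2 ("down") to the process `pid`.

    Hypothetical helper: it only delivers the signal. What the receiving
    daemon does with it is up to that daemon.
    """
    try:
        sig = DEBUG_SIGNALS[direction]
    except KeyError:
        raise ValueError(f"direction must be 'up' or 'down', got {direction!r}")
    os.kill(pid, sig)  # raises ProcessLookupError / PermissionError on failure
    return sig
```

Something like `adjust_debug_level(pid, "up")` would then be called once per debug level you want to add, typically as root, with `pid` being the daemon's process ID.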
