On Mon, Jul 28, 2008 at 09:34, Gerard Petersen <[EMAIL PROTECTED]> wrote:
> Dear Andrew,
>
> Nice one ... But I'm into python and not into C coding... ;-)

except it's a bash script :)

> Seriously, were my conclusions far off,

no

Installing Xen and drbd on the third node is probably the simplest option

> because I'm a bit at a loss here.
>
> Thanx again.
>
> Regards,
>
> Gerard.
>
> Andrew Beekhof wrote:
>>
>> On Mon, Jul 28, 2008 at 08:56, Gerard Petersen <[EMAIL PROTECTED]> wrote:
>>>
>>> Dear Andrew,
>>>
>>> Thanx for your response.
>>>
>>> I see two options/conclusions on which I would like your feedback:
>>>
>>> - Enable stonith so the attempt to start the resources on the third
>>> node shall be 'naturally' disabled and therefore moved back to the
>>> first two nodes by the cluster software.
>>>
>>> - Install Xen (and drbd) on the third node, so the cluster software
>>> gets a chance to initialise some commands and get a proper answer to
>>> see that the resources don't belong here.
>>
>> I think you missed the most preferable option... fix the RA to return
>> OCF_NOT_INSTALLED in such cases and send us a patch :-)
>>
>>>
>>> Kind regards,
>>>
>>> Gerard.
>>>
>>> Andrew Beekhof wrote:
>>>>
>>>> On Thu, Jul 24, 2008 at 16:29, Gerard Petersen <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm trying to add a third node to a working two-node cluster with
>>>>> resources in the form of mirrored Xen (and underlying drbd) virtual
>>>>> servers. The two-node setup works great and as expected. (On failure,
>>>>> the drbd mirrors switch master/slave roles, XenU's migrate
>>>>> automatically, etc). The goal is to manually spread master/slave
>>>>> combinations of the XenU's over the three physical nodes.
>>>>>
>>>>> The third node is already added to the heartbeat config, and in
>>>>> standby mode. We have constraints in place (full log and config will
>>>>> follow) that work with the +INF, 'zero' and -INF values, respectively
>>>>> as Master location, Slave location and 'Never' location constraints.
>>>>>
>>>>> When we take the third node online, where the current XenU's
>>>>> according to the constraints are not allowed, the resources somehow
>>>>> all are moved to the third node, where no xen or drbd is present yet.
>>>>> It seems some of the constraints are completely ignored. We have
>>>>> tried this, among other things, with the symmetric_cluster value True
>>>>> and False, but no luck.
>>>>>
>>>>> Furthermore the log shows that the resources become 'too active', and
>>>>> after that they become unmanaged.
>>>>>
>>>> When a new node joins the cluster, we check to see if it's running any
>>>> of the cluster resources.
>>>> These checks occur regardless of any location constraints (precisely
>>>> so that we can enforce them for you).
>>>>
>>>> What can happen however, is that these checks may fail.
>>>> Sometimes they fail because the service was unexpectedly found to be
>>>> active on the node.
>>>> Sometimes it's because the resource agent (or the software it tries to
>>>> talk to) isn't installed.
>>>>
>>>> In your case, it seems the RA is misbehaving and incorrectly telling
>>>> the cluster that the resources are active
>>>> eg.
>>>> <lrm_rsc_op id="server128_monitor_0" operation="monitor"
>>>> crm-debug-origin="build_active_RAs"
>>>> transition_key="15:10:c195d63f-e91f-4162-8454-f6dde2c71ef1"
>>>> transition_magic="0:0;15:10:c195d63f-e91f-4162-8454-f6dde2c71ef1"
>>>> call_id="6" crm_feature_set="2.0" rc_code="0" op_status="0"
>>>> interval="0" op_digest="78122685b830dcb8197c65561be6d6a5"/>
>>>>
>>>> rc_code="0" being the relevant piece of information
>>>>
>>>> The cluster then thinks that the service is active on more than one
>>>> node and tries to recover.
>>>> But the RA then compounds the initial problem by failing to stop the
>>>> service:
>>>>
>>>> <lrm_rsc_op id="server128_stop_0" operation="stop"
>>>> crm-debug-origin="build_active_RAs"
>>>> transition_key="25:11:c195d63f-e91f-4162-8454-f6dde2c71ef1"
>>>> transition_magic="0:1;25:11:c195d63f-e91f-4162-8454-f6dde2c71ef1"
>>>> call_id="12" crm_feature_set="2.0" rc_code="1" op_status="0"
>>>> interval="0" op_digest="78122685b830dcb8197c65561be6d6a5"/>
>>>>
>>>> again, rc_code="1" being the part indicating failure.
>>>>
>>>> at which point the cluster can do nothing (since stonith is disabled)
>>>>
>>>>
>>>>> Some notes to clarify the setup (and make the log more readable):
>>>>>
>>>>> We run heartbeat version 2.1.3-5~bpo40+1 from debian backports. At
>>>>> the time of testing, one node was still on 2.1.3-2~bpo40+1.
>>>>>
>>>>> Physical nodes:
>>>>> server010 (still to be added)
>>>>> server011
>>>>> server012
>>>>>
>>>>> Virtual servers (the resources):
>>>>> server128 - server133
>>>>>
>>>>> All resources have constraints allowing a primary role on server011
>>>>> and a secondary role on server012 (or vice versa), and are not
>>>>> allowed on server010.
>>>>>
>>>>> # Attached files are:
>>>>>
>>>>> - cleancib.xml
>>>>> The one we started off with.
>>>>>
>>>>> - fullcib.xml
>>>>> The most recent full dump (with counters etc. added by the cluster
>>>>> software itself).
>>>>>
>>>>> - syslog.clusterlog.080722.full(.tgz)
>>>>> A cleaned up syslog wherein, with different values for
>>>>> symmetric_cluster, the trail can be followed of how all resources
>>>>> became too active and ended up unmanaged on server010.
>>>>>
>>>>> - syslog.clusterlog.080722.part(.tgz)
>>>>> A stripped version of the previous one with only one trail, hopefully
>>>>> isolating enough information for easier analysis.
>>>>>
>>>>> It looks like the behaviour deviates from what the docs describe in
>>>>> relation to the symmetric_cluster directive, or it's just a very ugly
>>>>> typo somewhere .. :-)
>>>>>
>>>>> I sincerely hope somebody can pinpoint the weak spot.
>>>>>
>>>>> Thanx a lot!!
>>>>>
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Gerard.
>>>>>
>>>>> --
>>>>> ~
>>>>> ~
>>>>> :wq!
>>>>
>>>
>>> --
>>> >>> urls
>>> {'fun': 'www.zonderbroodje.nl', 'tech': 'www.gp-net.nl'}
>>>
>>
>
> --
> >>> urls
> {'fun': 'www.zonderbroodje.nl', 'tech': 'www.gp-net.nl'}
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
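
For anyone reading the thread from the Python side rather than bash, here is a minimal sketch of how the two lrm_rsc_op entries quoted above can be read. It relies only on attributes visible in those entries (operation, interval, rc_code). The helper names read_op and probe_rc are made up for illustration, and treating a monitor with interval="0" as the join-time check is an assumption based on Andrew's explanation. The return-code names come from the OCF resource agent conventions, where the "not installed" code mentioned in the thread is spelled OCF_ERR_INSTALLED (5). probe_rc only illustrates the behaviour a fixed RA might have; it is not the actual Xen agent.

```python
import xml.etree.ElementTree as ET

# Subset of the OCF resource agent return codes relevant to this thread.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    5: "OCF_ERR_INSTALLED",
    7: "OCF_NOT_RUNNING",
}


def read_op(xml_text):
    """Summarise one lrm_rsc_op element (hypothetical helper)."""
    op = ET.fromstring(xml_text)
    rc = int(op.get("rc_code"))
    operation = op.get("operation")
    # Assumption: a monitor with interval 0 is the one-off check that is
    # run when a node joins, as described in the thread.
    is_probe = operation == "monitor" and op.get("interval") == "0"
    return {
        "id": op.get("id"),
        "operation": operation,
        "is_probe": is_probe,
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "other"),
    }


def probe_rc(software_installed, domain_running):
    """Return code a well-behaved check would report (hypothetical sketch)."""
    if not software_installed:
        return 5  # OCF_ERR_INSTALLED: the resource cannot run on this node
    return 0 if domain_running else 7  # OCF_SUCCESS or OCF_NOT_RUNNING


# Trimmed copies of the two entries quoted in the thread.
MONITOR = ('<lrm_rsc_op id="server128_monitor_0" operation="monitor" '
           'rc_code="0" op_status="0" interval="0"/>')
STOP = ('<lrm_rsc_op id="server128_stop_0" operation="stop" '
        'rc_code="1" op_status="0" interval="0"/>')

if __name__ == "__main__":
    for text in (MONITOR, STOP):
        print(read_op(text))
    # Third node without Xen/drbd installed:
    print(probe_rc(software_installed=False, domain_running=False))
```

Read this way, the monitor entry is a check that came back with rc 0, so the cluster believes server128 is active on the new node, and the stop entry came back with rc 1, a generic failure. With stonith disabled, that combination is what leaves the resources unmanaged.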
