OK, so there's something I probably don't understand. Each node should have the right privileges (if we're talking about filesystem permissions), because when one of my two nodes failed (node1), the resource (vm1) did start on the available node (node2). The problem happens when the failed node comes back online (node1): the resource (vm1) is supposed to shut down on node2 and restart on node1, isn't it?
We already tried this setup with a 32-bit SLES and everything worked. I just want to know where the problem can be. Is it my configuration? It's supposed to be exactly the same as my old setup. Is it the 64-bit version of SLES? When I set, in the cluster defaults, symmetric_cluster = yes and default_resource_stickiness = INFINITY, and I add a location constraint with score INFINITY and the expression #uname eq node1, isn't my resource supposed to go back to its original node, the way the auto_failback option worked in Heartbeat v1? I'm sorry if my previous post was not clear.
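
(To be concrete, here is roughly what I mean in the CIB -- a minimal sketch from memory, with abbreviated ids and the placeholder names vm1/node1 from my example above; the exact attribute spelling may differ between Heartbeat 2.x releases:)

  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <!-- resources may run on any node by default -->
        <nvpair id="opt-symmetric" name="symmetric_cluster" value="true"/>
        <!-- a running resource strongly prefers to stay where it is -->
        <nvpair id="opt-stickiness" name="default_resource_stickiness" value="INFINITY"/>
      </attributes>
    </cluster_property_set>
  </crm_config>

  <constraints>
    <!-- pin vm1 to node1 whenever node1 is available -->
    <rsc_location id="loc-vm1-node1" rsc="resource_vm1">
      <rule id="loc-vm1-node1-rule" score="INFINITY">
        <expression id="loc-vm1-node1-expr" attribute="#uname" operation="eq" value="node1"/>
      </rule>
    </rsc_location>
  </constraints>

On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote: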
On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> On 5/8/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> >
> > On 5/8/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >
> > > grep ERROR logfile
> > >
> > > try this for starters:
> > >
> > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> >
> > Yup, I saw that.. it's weird. Heartbeat shuts the VM down, then reports
> > these errors.. and if I clean up the resource, it restarts on the correct
> > node.. there must be something I missed, lol.
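> >
> > (For the record, the cleanup I do is roughly the following; the exact
> > crm_resource flags may differ between Heartbeat 2.x releases:)
> >
> >   # forget the failed stop_0 for this resource on the node that logged
> >   # it, so the CRM re-probes the resource's state there
> >   crm_resource -C -r resource_qclvmsles02 -H qclsles01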
> > > On 5/7/07, Rene Purcell <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I would like to know if someone has tried the Novell setup described in
> > > > "http://www.novell.com/linux/technical_library/has.pdf" on an x86_64
> > > > arch. I've tested this setup on a classic x86 arch and everything was
> > > > OK... but I double-checked my config, everything looks good, and my VM
> > > > never starts on its original node when it comes back online... and I
> > > > can't find why!
> > > >
> > > > Here's the log from when my node1 comes back.. we can see the VM
> > > > shutting down, and after that nothing happens on the other node..
> > > >
> > > > May 7 16:31:25 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:13): 0.65.1020 -> 0.65.1021 (ok)
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1020 -> 0.65.1021
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: extract_event: events.c Aborting on transient_attributes changes
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort priority upgraded to 1000000
> > > > May 7 16:31:25 qclsles01 tengine: [22591]: info: update_abort_priority: utils.c Abort action 0 superceeded by 2
> > > > May 7 16:31:26 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 161648 bytes (was 158548)
> > > > May 7 16:31:26 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 6403, call:14): 0.65.1021 -> 0.65.1022 (ok)
> > > > May 7 16:31:26 qclsles01 haclient: on_event:evt:cib_changed
> > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1021 -> 0.65.1022
> > > > May 7 16:31:26 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (9) confirmed
> > > > May 7 16:31:26 qclsles01 cib: [25889]: info: write_cib_contents: io.c Wrote version 0.65.1022 of the CIB to disk (digest: e71c271759371d44c4bad24d50b2421d)
> > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > May 7 16:31:39 qclsles01 kernel: device vif12.0 left promiscuous mode
> > > > May 7 16:31:39 qclsles01 kernel: xenbr0: port 3(vif12.0) entering disabled state
> > > > May 7 16:31:39 qclsles01 logger: /etc/xen/scripts/vif-bridge: offline XENBUS_PATH=backend/vif/12/0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/768
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/832
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/12/5632
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif12.0 failed
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: ifconfig vif12.0 down failed
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif12.0, bridge xenbr0.
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/5632
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/768
> > > > May 7 16:31:40 qclsles01 ifdown: vif12.0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vif/12/0
> > > > May 7 16:31:40 qclsles01 logger: /etc/xen/scripts/xen-hotplug-cleanup: XENBUS_PATH=backend/vbd/12/832
> > > > May 7 16:31:40 qclsles01 ifdown: Interface not available and no configuration found.
> > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stderr) Error: the domain 'resource_qclvmsles02' does not exist.
> > > > May 7 16:31:41 qclsles01 lrmd: [5020]: info: RA output: (resource_qclvmsles02:stop:stdout) Domain resource_qclvmsles02 terminated
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges
> > > > May 7 16:31:41 qclsles01 cib: [22024]: info: activateCibXml: io.c CIB size is 164748 bytes (was 161648)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: te_update_diff: callbacks.c Processing diff (cib_update): 0.65.1022 -> 0.65.1023
> > > > May 7 16:31:41 qclsles01 cib: [22024]: info: cib_diff_notify: notify.c Update (client: 22028, call:100): 0.65.1022 -> 0.65.1023 (ok)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c All 2 cluster nodes are eligable to run resources.
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: ERROR: match_graph_event: events.c Action resource_qclvmsles02_stop_0 on qclsles01 failed (target: 0 vs. rc: 4): Error
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: match_graph_event: events.c Action resource_qclvmsles02_stop_0 (10) confirmed
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c ====================================================
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: notice: run_graph: graph.c Transition 12: (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0)
> > > > May 7 16:31:41 qclsles01 haclient: on_event:evt:cib_changed
> > > > May 7 16:31:41 qclsles01 cib: [26190]: info: write_cib_contents: io.c Wrote version 0.65.1023 of the CIB to disk (digest: c80326e44b5a106fe9a384240c4a3cc9)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: [generation] <cib generated="true" admin_epoch="0" have_quorum="true" num_peers="2" cib_feature_revision="1.3" ccm_transition="10" dc_uuid="46ef9c7b-5f6e-4cc0-a0bb-94227b605170" epoch="65" num_updates="1023"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: default_action_timeout
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default stickiness: 1000000
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Default failure stickiness: -500
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH of failed nodes is disabled
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c STONITH will reboot nodes
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Cluster is symmetric - resources can run anywhere by default
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c On loss of CCM Quorum: Stop ALL resources
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resources are stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Orphan resource actions are stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_config: unpack.c No value specified for cluster preference: remove_after_stop
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c Stopped resources are removed from the status section: false
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: unpack_config: unpack.c By default resources are managed
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles02 is online
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: determine_online_status: unpack.c Node qclsles01 is online
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Processing failed op (resource_qclvmsles02_stop_0) for resource_qclvmsles02 on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: unpack_rsc_op: unpack.c Handling failed stop for resource_qclvmsles02 on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: Orphan resource </lrm_resource>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_orphan_resource: unpack.c Nothing known about resource resource_NFS running on qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_resource id="resource_NFS" type="nfs" class="lsb" provider="heartbeat">
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_monitor_0" operation="monitor" crm-debug-origin="build_active_RAs" transition_key="27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;27:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="9" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource <lrm_rsc_op id="resource_NFS_stop_0" operation="stop" crm-debug-origin="build_active_RAs" transition_key="28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" transition_magic="0:0;28:3a815bc6-ffaa-49b3-aac2-0ed46e85f085" call_id="10" crm_feature_set="1.0.6" rc_code="0" op_status="0" interval="0" op_digest="08b7001b97ccdaa1ca23a9f165256bc1"/>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: create_fake_resource: Orphan resource </lrm_resource>
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_orphan_resource: unpack.c Making sure orphan resource_NFS is stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles01 (heartbeat::ocf:Xen): Started qclsles01
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_qclvmsles02 (heartbeat::ocf:Xen): Started qclsles01 (unmanaged) FAILED
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: resource_NFS (lsb:nfs): Stopped
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Leave resource resource_qclvmsles01 (qclsles01)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: NoRoleChange: native.c Move resource resource_qclvmsles02 (qclsles01 -> qclsles02)
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_stop_0 stop is for resource_qclvmsles02 (unmanaged)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: custom_action: utils.c Action resource_qclvmsles02_start_0 start is for resource_qclvmsles02 (unmanaged)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: notice: stage8: allocate.c Created transition graph 13.
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: unpack_graph: unpack.c Unpacked transition 13: 0 actions in 0 synapses
> > > > May 7 16:31:41 qclsles01 crmd: [22028]: info: do_state_transition: fsa.c qclsles01: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: WARN: process_pe_message: pengine.c No value specified for cluster preference: pe-input-series-max
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: run_graph: graph.c Transition 13: (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> > > > May 7 16:31:41 qclsles01 pengine: [22592]: info: process_pe_message: pengine.c Transition 13: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-100.bz2
> > > > May 7 16:31:41 qclsles01 tengine: [22591]: info: notify_crmd: actions.c Transition 13 status: te_complete - (null)
> > > >
> > > > Thanks!
>
> Ah, and how am I supposed to know which node is concerned in the log? I can
> read:
>
> "May 7 16:31:41 qclsles01 crmd: [22028]: WARN: process_lrm_event: lrm.c LRM operation (35) stop_0 on resource_qclvmsles02 Error: (4) insufficient privileges"
>
> on my first node, and the same message except for the hostname on my second
> node.. so which one has a privileges problem? Both?
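
(For what it's worth, here is roughly how I pull these lines out on each
node, following the "grep ERROR logfile" suggestion; I'm assuming the
default syslog target, /var/log/messages, on our SLES hosts:)

  # the hostname field right after the timestamp identifies which node
  # wrote each entry (qclsles01 vs. qclsles02)
  grep -E "ERROR|insufficient privileges" /var/log/messages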
--
René Jr Purcell
Project lead, security and systems
Techno Centre Logiciels Libres, http://www.tc2l.ca/
Phone: (418) 681-2929 #124
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
