Re: [Linux-HA] Problem with kvm virtual machine and cluster

Umberto Carrara Thu, 25 Aug 2011 11:54:03 -0700

Hi,
my cluster config is now:



node host1
node host2
node host3
primitive FileServer ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/FileServerserver.xml" hypervisor="qemu:///system" migration_transport="ssh" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="90s" \
        op monitor interval="30" timeout="60s" \
        op migrate_from interval="0" timeout="120" \
        op migrate_to interval="0" timeout="120" \
        meta migration-threshold="3" target-role="Stopped" allow-migrate="true" is-managed="true"
primitive Iscsi lsb:open-iscsi \
        operations $id="Iscsi-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="15s"
primitive PingSan ocf:pacemaker:ping \
        params name="pingd-san" host_list="192.168.1.3" multiplier="100" \
        op monitor interval="10s" timeout="60s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
primitive Virsh lsb:libvirt-bin \
        operations $id="Virsh-operation" \
        op start interval="0" timeout="15s" \
        op stop interval="0" timeout="15s" \
        op monitor interval="30s" timeout="15s"
group Service Iscsi Virsh
clone PingSanClone PingSan \
        meta globally-unique="false" interleave="true" target-role="Started"
clone ServiceClone Service \
        meta globally-unique="false" interleave="true" target-role="Started"
location ServiceCloneLocation ServiceClone \
        rule $id="ServiceCloneOnConnectedSan" -inf: not_defined pingd-san or pingd-san lte 0
colocation B inf: FileServer ServiceClone
order A inf: ServiceClone:start FileServer
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        dc-version="1.0.11-6e010d6b0d49a6b929d17c0114e9d2d934dc8e04" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="3" \
        start-failure-is-fatal="false" \
        stop-orphan-resources="false" \
        stop-orphan-actions="false"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200"

After I start the cluster, I get this:

Online: [ host1 host2 host3 ]

 Clone Set: ServiceClone
     Started: [ host1 host2 host3 ]
 Clone Set: PingSanClone
     Started: [ host1 host2 host3 ]
File    (ocf::heartbeat:VirtualDomain) Started  (unmanaged) FAILED[     host2   host3   host1 ]

Migration summary:
* Node host2:
   FileServer: migration-threshold=3 fail-count=1000000 last-failure='Thu Aug 25 20:24:28 2011'
* Node host3:
   FileServer: migration-threshold=3 fail-count=1000000 last-failure='Thu Aug 25 20:24:28 2011'
* Node host1:
   FileServer: migration-threshold=3 fail-count=1000000 last-failure='Thu Aug 25 20:24:28 2011'

Failed actions:
    FileServer_monitor_0 (node=host2, call=25, rc=1, status=complete): unknown error
    FileServer_stop_0 (node=host2, call=26, rc=1, status=complete): unknown error
    FileServer_monitor_0 (node=host3, call=25, rc=1, status=complete): unknown error
    FileServer_stop_0 (node=host3, call=26, rc=1, status=complete): unknown error
    FileServer_monitor_0 (node=host1, call=25, rc=1, status=complete): unknown error
    FileServer_stop_0 (node=host1, call=26, rc=1, status=complete): unknown error


In the log, after a cleanup, I found this:

Aug 25 20:24:28 host1 lrmd: [12314]: debug: lrmd_rsc_destroy: removing resource File
Aug 25 20:24:28 host1 lrmd: [12314]: debug: on_msg_add_rsc:client [12317] adds resource File
Aug 25 20:24:28 host1 lrmd: [12314]: debug: on_msg_perform_op:2385: copying parameters for rsc File
Aug 25 20:24:28 host1 lrmd: [12314]: debug: on_msg_perform_op: add an operation operation monitor[25] on ocf::VirtualDomain::FileServer for client 12317, its parameters: crm_feature_set=[3.0.1] config=[/etc/libvirt/qemu/FileServerserver.xml] migration_transport=[ssh] hypervisor=[qemu:///system] CRM_meta_timeout=[60000]  to the operation list.
Aug 25 20:24:28 host1 lrmd: [12314]: info: rsc:FileServer:25: probe
Aug 25 20:24:28 host1 lrmd: [28405]: debug: perform_ra_op: resetting scheduler class to SCHED_OTHER

Aug 25 20:24:28 host1 lrmd: [12314]: WARN: Managed FileServer:monitor process 28405 exited with return code 1.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


I have debugged the VirtualDomain agent and it seems OK. Is there someone who
can help me with this problem?
I think the problem is in the interaction between the cluster and libvirt.
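
One way to narrow this down, sketched here under assumptions, is to run the
agent by hand with the same parameters the cluster passes to it. The OCF root
/usr/lib/ocf is the usual location but is an assumption for this installation,
and ocf_rc_name is a hypothetical helper written only for this example. A probe
of a stopped VM is expected to return 7 (OCF_NOT_RUNNING); the rc=1 in the
failed actions above is a generic error, which is why the cluster then tries a
stop.

```shell
#!/bin/sh
# Sketch: run the VirtualDomain agent outside the cluster (as root)
# to see what the probe (monitor) returns on this node.

# Assumed install path of the OCF tree; adjust for your distribution.
OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
RA="$OCF_ROOT/resource.d/heartbeat/VirtualDomain"

# Resource parameters are passed to the agent as OCF_RESKEY_<name>.
# Values are copied from the FileServer primitive in the config above.
OCF_RESKEY_config="/etc/libvirt/qemu/FileServerserver.xml"
OCF_RESKEY_hypervisor="qemu:///system"
OCF_RESKEY_migration_transport="ssh"
export OCF_ROOT OCF_RESKEY_config OCF_RESKEY_hypervisor OCF_RESKEY_migration_transport

# Hypothetical helper: translate a numeric OCF exit code into its name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown ($1)" ;;
    esac
}

if [ -x "$RA" ]; then
    # Same action the cluster runs as FileServer_monitor_0.
    "$RA" monitor
    rc=$?
    echo "monitor returned $rc: $(ocf_rc_name "$rc")"
else
    echo "agent not found at $RA"
fi
```

If this returns 1 while the VM is not running, the messages the agent prints
should show which check fails (for example the config file path or the
connection to libvirt) before the cluster is involved at all.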

thanks to all

Umberto

On Thursday 11 August 2011 08:04:36 Andrew Beekhof wrote:
> On Wed, Aug 10, 2011 at 11:15 PM, Maloja01 <[email protected]> wrote:
> > The order constraints work as I would expect, but I guess that
> > you have run into a pitfall:
> >
> > A clone is marked as "up", if one instance in the cluster is started
> > successfully. The order does not say, that the clone on the same node
> > must be up.
>
> Use a colocation constraint to have that
>
> > Kind regards
> > Fabian
> >
> > On 08/10/2011 01:43 PM, [email protected] wrote:
> >> Hi,
> >> excuse my poor English, I use Google to help me with translation,
> >> and I am a newbie in clustering :-).
> >>
> >> I'm trying to start a cluster with three nodes for virtualization. I
> >> used a how-to that I found at
> >> http://www.linbit.com/support/ha-kvm.pdf to configure the cluster,
> >> the VM volumes are shared on an openfiler cluster over iSCSI,
> >> which works well.
> >>
> >> The VM starts fine on the hosts when I run it outside the cluster.
> >>
> >> The problem is that the VM starts before libvirt and the open-iscsi
> >> initiator. I have set an order rule, but it does not seem to work.
> >> Afterwards, once the services are started, the cluster cannot restart
> >> the machine.
> >>
> >>
> >> so the output of crm_mon -1 is
> >> ============
> >> Last updated: Wed Aug 10 12:40:20 2011
> >> Stack: openais
> >> Current DC: host1 - partition with quorum
> >> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> >> 3 Nodes configured, 3 expected votes
> >> 2 Resources configured.
> >> ============
> >>
> >> Online: [ host1 host2 host3 ]
> >>
> >>  Clone Set: BackEndClone
> >>      Started: [ host1 host2 host3 ]
> >> Samba   (ocf::heartbeat:VirtualDomain) Started [        host1   host2   host3 ]
> >>
> >> Failed actions:
> >>     Samba_monitor_0 (node=host1, call=15, rc=1, status=complete): unknown error
> >>     Samba_stop_0 (node=host1, call=16, rc=1, status=complete): unknown error
> >>     Samba_monitor_0 (node=host2, call=12, rc=1, status=complete): unknown error
> >>     Samba_stop_0 (node=host2, call=13, rc=1, status=complete): unknown error
> >>     Samba_monitor_0 (node=host3, call=12, rc=1, status=complete): unknown error
> >>     Samba_stop_0 (node=host3, call=13, rc=1, status=complete): unknown error
> >>
> >>
> >>
> >>
> >> this is my cluster config:
> >>
> >> root@host1:~# crm configure show
> >> node host1 \
> >>         attributes standby="on"
> >> node host2 \
> >>         attributes standby="on"
> >> node host3 \
> >>         attributes standby="on"
> >> primitive Iscsi lsb:open-iscsi \
> >>         op monitor interval="30"
> >> primitive Samba ocf:heartbeat:VirtualDomain \
> >>         params config="/etc/libvirt/qemu/samba.iso.xml" \
> >>         meta allow-migrate="true" \
> >>         op monitor interval="30"
> >> primitive Virsh lsb:libvirt-bin \
> >>         op monitor interval="30"
> >> group BackEnd Iscsi Virsh
> >> clone BackEndClone BackEnd \
> >>         meta target-role="Started"
> >> colocation SambaOnBackEndClone inf: Samba BackEndClone
> >> order SambaBeforeBackEndClone inf: BackEndClone Samba
> >> property $id="cib-bootstrap-options" \
> >>         dc-version="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b" \
> >>         cluster-infrastructure="openais" \
> >>         expected-quorum-votes="3" \
> >>         stonith-enabled="false" \
> >>         no-quorum-policy="ignore" \
> >>         default-action-timeout="100" \
> >>         last-lrm-refresh="1312970592"
> >> rsc_defaults $id="rsc-options" \
> >>         resource-stickiness="200"
> >>
> >> my log is:
> >>
> >> Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has
> >> failed INFINITY times on host1
> >> Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness:
> >> Forcing Samba away from host1 after 1000000 failures (max=1000000)
> >> Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has
> >> failed INFINITY times on host2
> >> Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness:
> >> Forcing Samba away from host2 after 1000000 failures (max=1000000)
> >> Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has
> >> failed INFINITY times on host3
> >> Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness:
> >> Forcing Samba away from host3 after 1000000 failures (max=1000000)
> >> Aug 10 13:36:34 host1 pengine: [1923]: info: native_merge_weights:
> >> BackEndClone: Rolling back scores from Samba
> >> Aug 10 13:36:34 host1 pengine: [1923]: info: native_color: Unmanaged
> >> resource Samba allocated to 'nowhere': failed
> >> Aug 10 13:36:34 host1 pengine: [1923]: WARN: native_create_actions:
> >> Attempting recovery of resource Samba
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Iscsi:0       (Started host1)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Virsh:0       (Started host1)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Iscsi:1       (Started host2)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Virsh:1       (Started host2)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Iscsi:2       (Started host3)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Virsh:2       (Started host3)
> >> Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave
> >> resource Samba (Started unmanaged)
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
