I am soooo frelling close! But still having issues.

First, I found a thread (http://comments.gmane.org/gmane.linux.highavailability.pacemaker/11880) dealing with the same issue. It suggests that the reason for the 'not installed' error is that the VirtualDomain RA can't read the /etc/libvirt/qemu/<domain>.xml file. So I tried moving mine to local storage, and the 'not installed' errors went away. (Plea to the devs: Please! O! Please! flesh out that error message so it is more useful and makes sense. Thanks!)
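In case anyone wants to check the same thing, this is roughly the test (a sketch only; 'nobody' is just a stand-in for whichever non-root user actually does the read, which is the part I'm asking about below):

    # show owner, group, and mode of the domain config
    stat -c '%U:%G %a' /etc/libvirt/qemu/dns1.xml

    # try to read it as a non-root user
    sudo -u nobody cat /etc/libvirt/qemu/dns1.xml >/dev/null && echo readable || echo 'not readable'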
It seems to be an issue of a non-root user accessing the files (/etc/libvirt/qemu/<domain>.xml), which are owned by root and have permissions of 600 (rw for root only, no permissions for anyone else). I've tried changing the permissions/ownership, but they get reset every time I migrate the VM, even just through libvirt. The odd thing is that with local storage, this isn't a problem. The shared FS is gluster, so I assume it's something that gluster is doing (or not doing), but I'm not sure what.

Can anyone tell me which process does the access, and if it's not running as root, how it accesses those files when they're normally only readable by root?

Thanks,

pma

Paul Archer, Linux System Administrator
[email protected]
972-646-0137 cell
1717 McKinney Ave, Suite 800
Dallas, TX 75201
www.topgolf.com

________________________________________
From: [email protected] [[email protected]] on behalf of Paul Archer [[email protected]]
Sent: Wednesday, February 13, 2013 3:19 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] VirtualDomain resources won't migrate

Two more things:

I can clear the errors ('cleanup resources') on vmhost2, and the resource (which has never stopped) will become managed again.

If I reboot the node that a VM resource is running on, that VM is shut down and doesn't seem to want to come up on the other node. Shouldn't the VM be live-migrated to the other node before the node shuts down?

pma

Paul Archer, Linux System Administrator
[email protected]
972-646-0137 cell
1717 McKinney Ave, Suite 800
Dallas, TX 75201
www.topgolf.com

________________________________________
From: [email protected] [[email protected]] on behalf of Paul Archer [[email protected]]
Sent: Wednesday, February 13, 2013 3:12 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] VirtualDomain resources won't migrate

I'm getting closer, but I'm stuck on one last thing. First, I made sure that every node could ssh to every other node. That got me to the point where I could migrate my VM (dns1.austin9). It's even a live migration, which is great. But while migrating from vmhost2 to vmhost1 works with no issue, when I migrate from vmhost1 to vmhost2 the VM makes it to the target host--but then I get failures:

Failed actions:
    dns1.austin9_monitor_10000 (node=vmhost2.austin9.topgolf.com, call=36, rc=5, status=complete): not installed
    dns1.austin9_stop_0 (node=vmhost2.austin9.topgolf.com, call=37, rc=5, status=complete): not installed

I don't understand why I'm getting these errors, and I really don't get why it's happening on one node and not the other. One is literally a clone of the other (the list of installed packages is identical), and all the VM configuration (/etc/libvirt and /var/lib/libvirt) is on shared storage. Both nodes have been rebooted a few times, and I'm still getting the same behavior.

I'm seeing errors like this in /var/log/corosync.log:

Feb 13 15:02:14 vgs2.austin9.topgolf.com crmd: [14625]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Feb 13 15:02:14 vgs2.austin9.topgolf.com crmd: [14625]: info: do_state_transition: All 4 cluster nodes are eligible to run resources.
Feb 13 15:02:14 vgs2.austin9.topgolf.com crmd: [14625]: info: do_pe_invoke: Query 2369: Requesting the current CIB: S_POLICY_ENGINE
Feb 13 15:02:14 vgs2.austin9.topgolf.com crmd: [14625]: info: do_pe_invoke_callback: Invoking the PE: query=2369, ref=pe_calc-dc-1360789334-1707, seq=480, quorate=1
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: notice: process_pe_message: Transition 393: PEngine Input stored in: /var/lib/pengine/pe-input-325.bz2
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: notice: unpack_rsc_op: Hard error - dns1.austin9_monitor_10000 failed with rc=5: Preventing dns1.austin9 from re-starting on vmhost2.austin9.topgolf.com
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: WARN: unpack_rsc_op: Processing failed op dns1.austin9_monitor_10000 on vmhost2.austin9.topgolf.com: not installed (5)
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: notice: unpack_rsc_op: Hard error - dns1.austin9_last_failure_0 failed with rc=5: Preventing dns1.austin9 from re-starting on vmhost2.austin9.topgolf.com
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: WARN: unpack_rsc_op: Processing failed op dns1.austin9_last_failure_0 on vmhost2.austin9.topgolf.com: not installed (5)
Feb 13 15:02:14 vgs2.austin9.topgolf.com pengine: [14624]: WARN: common_apply_stickiness: Forcing dns1.austin9 away from vmhost2.austin9.topgolf.com after 1000000 failures (max=1000000)

Any help would be appreciated,

pma

Paul Archer, Linux System Administrator
[email protected]
972-646-0137 cell
1717 McKinney Ave, Suite 800
Dallas, TX 75201
www.topgolf.com

________________________________________
From: [email protected] [[email protected]] on behalf of Paul Archer [[email protected]]
Sent: Wednesday, February 13, 2013 12:57 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] VirtualDomain resources won't migrate

OK, it looks like it was a stupid mistake on my part. I had to rebuild one of the vmhost servers and didn't make sure the other servers in the cluster knew about that server's new ssh key, so the DC and the other vmhost couldn't communicate with it properly. I've taken care of that, but I'm still getting some errors. I'll see what I can do with them before coming back for more help.

Thanks,

pma

Paul Archer, Linux System Administrator
[email protected]
972-646-0137 cell
1717 McKinney Ave, Suite 800
Dallas, TX 75201
www.topgolf.com

________________________________________
From: [email protected] [[email protected]] on behalf of Paul Archer [[email protected]]
Sent: Wednesday, February 13, 2013 11:54 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] VirtualDomain resources won't migrate

You're right, you didn't say it was wrong; I just misread your initial post.

I don't really follow the move vs. master/promotion explanation, though. I can use 'crm resource move' to move my virtual IP (vgsIP) back and forth all day long. And 'crm' itself says that 'crm resource migrate' is used to move a resource to another node:

    # crm resource help migrate
    Migrate a resource to a different node. If node is left out, the
    resource is migrated by creating a constraint which prevents it from
    running on the current node. Additionally, you may specify a lifetime
    for the constraint---once it expires, the location constraint will no
    longer be active.

Not to mention that when I shut down the node with the active resource, the resource didn't fail over properly.

If these logs don't show anything, is there someplace else I could look? Or is it possible to turn on debugging?
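(For the debugging part, I assume the logging section of the stock /etc/corosync/corosync.conf is the place to do it; a sketch of what I mean, which I haven't tried on these nodes:

    logging {
            to_syslog: yes
            debug: on
    }

followed by restarting corosync on that node.)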
In the meantime, I'll look into the showscores script to see if it yields anything useful.

pma

Paul Archer, Linux System Administrator
[email protected]
972-646-0137 cell
1717 McKinney Ave, Suite 800
Dallas, TX 75201
www.topgolf.com

________________________________________
From: [email protected] [[email protected]] on behalf of Dejan Muhamedagic [[email protected]]
Sent: Wednesday, February 13, 2013 10:32 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] VirtualDomain resources won't migrate

On Wed, Feb 13, 2013 at 01:43:22PM +0000, Paul Archer wrote:
> Thanks for the response. I hadn't run across anything about
> symmetrical/asymmetrical clusters; however, this doc makes it clear that it's
> a matter of preference, and that I'm not actually doing anything wrong in
> regards to that:
> http://clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained/s-resource-location.html

I didn't say it was wrong, just that your configuration would look better. Of course, your configuration, your choice.

Nothing in the logs below, i.e. the PE doesn't think anything needs moving. You'll need to calculate the scores (there's a script by Dominik Klein somewhere, called showscores.sh).

BTW, how did you try to migrate the VM? Just "crm resource migrate" is the same as "crm resource move", which only moves a "Started" resource. But you want to have the VM _promoted_ elsewhere. For that, you'll need to set the role to Master in the location constraint. Something like this:

location cli-prefer-dns1.austin9 dns1.austin9 \
        rule $id="cli-prefer-rule-dns1.austin9" $role="Master" inf: #uname eq vmhost1.austin9.topgolf.com and date lt "2013-04-13 01:50:32Z"

Thanks,

Dejan

> As far as logs, this is from the DC, where I also tried to do a 'crm resource
> migrate' command:
>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=local/crm_resource/3, version=0.145.60): ok (rc=0)
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: - <cib admin_epoch="0" epoch="145" num_updates="60" >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -   <configuration >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -     <constraints >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -       <rsc_location id="cli-prefer-dns1.austin9" >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -         <rule id="cli-prefer-rule-dns1.austin9" >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -           <date_expression end="2013-04-13 01:50:32Z" id="cli-prefer-lifetime-end-dns1.austin9" />
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -         </rule>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -       </rsc_location>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -     </constraints>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: -   </configuration>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: - </cib>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: + <cib epoch="146" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" update-origin="vmhost1.austin9.topgolf.com" update-client="cibadmin" cib-last-written="Wed Feb 13 06:47:34 2013" have-quorum="1" dc-uuid="vgs2.austin9.topgolf.com" >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +   <configuration >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +     <constraints >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +       <rsc_location id="cli-prefer-dns1.austin9" rsc="dns1.austin9" >
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: abort_transition_graph: te_update_diff:124 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.146.1) : Non-status change
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +         <rule id="cli-prefer-rule-dns1.austin9" score="INFINITY" boolean-op="and" >
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +           <expression attribute="#uname" id="cli-prefer-expr-dns1.austin9" operation="eq" value="vmhost1.austin9.topgolf.com" type="string" />
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +           <date_expression end="2013-04-13 13:41:42Z" id="cli-prefer-lifetime-end-dns1.austin9" operation="lt" />
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +         </rule>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +       </rsc_location>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +     </constraints>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: +   </configuration>
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib:diff: + </cib>
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Feb 13 07:41:42 vgs2 cib: [14621]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=local/crm_resource/4, version=0.146.1): ok (rc=0)
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_state_transition: All 4 cluster nodes are eligible to run resources.
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_pe_invoke: Query 1200: Requesting the current CIB: S_POLICY_ENGINE
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_pe_invoke_callback: Invoking the PE: query=1200, ref=pe_calc-dc-1360762902-801, seq=404, quorate=1
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vgsIP#011(Started vgs1.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vgsWebServer#011(Started vgs1.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   dns1.austin9#011(Started vmhost2.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vgs1-stonith-ipmi#011(Started vgs2.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vgs2-stonith-ipmi#011(Started vmhost1.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vmhost1-stonith-ipmi#011(Started vgs2.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: LogActions: Leave   vmhost2-stonith-ipmi#011(Started vmhost1.austin9.topgolf.com)
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: unpack_graph: Unpacked transition 160: 0 actions in 0 synapses
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_te_invoke: Processing graph 160 (ref=pe_calc-dc-1360762902-801) derived from /var/lib/pengine/pe-input-127.bz2
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: run_graph: ====================================================
> Feb 13 07:41:42 vgs2 crmd: [14625]: notice: run_graph: Transition 160 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-127.bz2): Complete
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: te_graph_trigger: Transition 160 is now complete
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: notify_crmd: Transition 160 status: done - <null>
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Feb 13 07:41:42 vgs2 crmd: [14625]: info: do_state_transition: Starting PEngine Recheck Timer
> Feb 13 07:41:42 vgs2 pengine: [14624]: notice: process_pe_message: Transition 160: PEngine Input stored in: /var/lib/pengine/pe-input-127.bz2
>
> Paul Archer, Linux System Administrator
> [email protected]
> 972-646-0137 cell
> 1717 McKinney Ave, Suite 800
> Dallas, TX 75201
> www.topgolf.com
>
> ________________________________________
> From: [email protected] [[email protected]] on behalf of Dejan Muhamedagic [[email protected]]
> Sent: Wednesday, February 13, 2013 1:04 AM
> To: [email protected]
> Subject: Re: [Linux-HA] VirtualDomain resources won't migrate
>
> Hi,
>
> On Wed, Feb 13, 2013 at 02:48:57AM +0000, Paul Archer wrote:
> > Background:
> > I have four nodes, two of which are running apache with a virtual IP, and
> > two of which are running KVM VMs via libvirt (and backed by gluster, which
> > is not managed by pacemaker).
> > All four nodes are in the same cluster (for quorum). The apache and
> > virtualIP piece works fine. But I am having trouble with the VM piece.
> > I can manually migrate a VM from one host to another using libvirt. But I
> > can't get a VM to fail over via HA. Even bringing a node down entirely just
> > results in the VM going offline.
> >
> > Here's my config (the vmhost1 & vmhost2 nodes are for the VMs, of course):
> >
> > node vgs1.austin9.topgolf.com \
> >         attributes standby="off"
> > node vgs2.austin9.topgolf.com \
> >         attributes standby="off"
> > node vmhost1.austin9.topgolf.com
> > node vmhost2.austin9.topgolf.com
> > primitive dns1.austin9 ocf:heartbeat:VirtualDomain \
> >         params config="/etc/libvirt/qemu/dns1.xml" hypervisor="qemu:///system" migration_transport="ssh" \
> >         meta allow-migrate="true" is-managed="true" target-role="started" \
> >         op start interval="0" timeout="120s" \
> >         op stop interval="0" timeout="120s" \
> >         op monitor interval="10" timeout="30" depth="0" \
> >         op migrate_from interval="0" timeout="90" \
> >         op migrate_to interval="0" timeout="180"
> > primitive vgs1-stonith-ipmi stonith:external/ipmi \
> >         params hostname="vgs1.austin9.topgolf.com" ipaddr="10.9.1.91" userid="root" passwd="calvin" interface="lan"
> > primitive vgs2-stonith-ipmi stonith:external/ipmi \
> >         params hostname="vgs2.austin9.topgolf.com" ipaddr="10.9.1.92" userid="root" passwd="calvin" interface="lan"
> > primitive vgsIP ocf:heartbeat:IPaddr2 \
> >         params ip="10.9.1.10" cidr_netmask="32" \
> >         op monitor interval="20s" \
> >         meta target-role="Started"
> > primitive vgsWebServer ocf:heartbeat:apache \
> >         params configfile="/etc/apache2/apache2.conf" \
> >         op monitor interval="40s" timeout="30s" \
> >         meta target-role="started"
> > primitive vmhost1-stonith-ipmi stonith:external/ipmi \
> >         params hostname="vmhost1.austin9.topgolf.com" ipaddr="10.9.1.93" userid="root" passwd="calvin" interface="lan"
> > primitive vmhost2-stonith-ipmi stonith:external/ipmi \
> >         params hostname="vmhost2.austin9.topgolf.com" ipaddr="10.9.1.94" userid="root" passwd="calvin" interface="lan"
> > location cli-prefer-dns1.austin9 dns1.austin9 \
> >         rule $id="cli-prefer-rule-dns1.austin9" inf: #uname eq vmhost1.austin9.topgolf.com and date lt "2013-04-13 01:50:32Z"
> > location prefer-dns1.austin9-vmhost1 dns1.austin9 +inf: vmhost1.austin9.topgolf.com
> > location prefer-dns1.austin9-vmhost2 dns1.austin9 +inf: vmhost2.austin9.topgolf.com
> > location prefer-vgs1-for-vgsWebServer vgsWebServer 50: vgs1.austin9.topgolf.com
> > location prefer-vgs2-for-vgsWebServer vgsWebServer 50: vgs2.austin9.topgolf.com
> > location reject-stonith-vgs1-on-vgs1 vgs1-stonith-ipmi -50: vgs1.austin9.topgolf.com
> > location reject-stonith-vgs2-on-vgs2 vgs2-stonith-ipmi -50: vgs2.austin9.topgolf.com
> > location reject-stonith-vmhost1-on-vmhost1 vmhost1-stonith-ipmi -50: vmhost1.austin9.topgolf.com
> > location reject-stonith-vmhost2-on-vmhost2 vmhost2-stonith-ipmi -50: vmhost2.austin9.topgolf.com
> > location reject-vgs1-for-dns1.austin9 dns1.austin9 -50: vgs1.austin9.topgolf.com
> > location reject-vgs2-for-dns1.austin9 dns1.austin9 -50: vgs2.austin9.topgolf.com
> > location reject-webserver-vmhost1 vgsWebServer -inf: vmhost1.austin9.topgolf.com
> > location reject-webserver-vmhost2 vgsWebServer -inf: vmhost2.austin9.topgolf.com
> > colocation vgsWebServer-on-vgsIP inf: vgsWebServer vgsIP
> > order vgsWebServer-after-vgsIP inf: vgsIP vgsWebServer
> > property $id="cib-bootstrap-options" \
> >         dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> >         cluster-infrastructure="openais" \
> >         expected-quorum-votes="4" \
> >         stonith-enabled="true" \
> >         last-lrm-refresh="1360706584"

> Hard to say without logs. BTW, this is an asymmetric cluster; the
> configuration would probably look better if also configured as such.
>
> Thanks,
>
> Dejan
>
> > Any suggestions?
> >
> > pma
> >
> > Paul Archer, Linux System Administrator
> > [email protected]
> > 972-646-0137 cell
> > 1717 McKinney Ave, Suite 800
> > Dallas, TX 75201
> > www.topgolf.com<http://www.topgolf.com/>
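For reference, a sketch of the opt-in ("asymmetric") configuration Dejan is suggesting, in crm shell syntax. This is an illustration reusing the resource names from the config above, not Dejan's exact recipe:

    property symmetric-cluster="false"
    location dns1.austin9-on-vmhost1 dns1.austin9 100: vmhost1.austin9.topgolf.com
    location dns1.austin9-on-vmhost2 dns1.austin9 100: vmhost2.austin9.topgolf.com
    location vgsWebServer-on-vgs1 vgsWebServer 50: vgs1.austin9.topgolf.com
    location vgsWebServer-on-vgs2 vgsWebServer 50: vgs2.austin9.topgolf.com

With symmetric-cluster="false", the cluster places a resource only on nodes given a positive location score, so the -50/-inf "reject" constraints in the config above become unnecessary.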
