[Linux-HA] Antw: Re: Pacemaker - Resource dont get started on the standby node.
Hi!

The problem seems to be this:

Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: WARN: unpack_rsc_op: Processing failed op apache_start_0 on prod-hb-nmn-002: unknown error (1)

Check why apache won't start.

Regards,
Ulrich

>>> Parkirat parkiratba...@gmail.com wrote on 15.06.2013 at 22:24 in message 1371327853299-14687.p...@n3.nabble.com:

Also adding the log from when it tries to fail over from the master node to the slave node:

Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: notice: crmd_ha_status_callback: Status update: Node prod-hb-nmn-001 now has status [dead] (DC=true)
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: crm_update_peer_proc: prod-hb-nmn-001.ais is now offline
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: WARN: match_down_event: No match for shutdown action on 7910c4de-718d-45d7-b4da-24b3b65b9855
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: te_update_diff: Stonith/shutdown of 7910c4de-718d-45d7-b4da-24b3b65b9855 not matched
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: abort_transition_graph: te_update_diff:191 - Triggered transition abort (complete=1, tag=node_state, id=7910c4de-718d-45d7-b4da-24b3b65b9855, magic=NA, cib=0.80.18) : Node failure
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_pe_invoke: Query 163: Requesting the current CIB: S_POLICY_ENGINE
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_pe_invoke_callback: Invoking the PE: query=163, ref=pe_calc-dc-1371327387-113, seq=5, quorate=1
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: info: determine_online_status: Node prod-hb-nmn-002 is online
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: WARN: unpack_rsc_op: Processing failed op apache_start_0 on prod-hb-nmn-002: unknown error (1)
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: notice: native_print: apache#011(ocf::heartbeat:apache):#011Stopped
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: info: get_failcount: apache has failed INFINITY times on prod-hb-nmn-002
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: WARN: common_apply_stickiness: Forcing apache away from prod-hb-nmn-002 after 100 failures (max=100)
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: info: native_color: Resource apache cannot run anywhere
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: notice: LogActions: Leave resource apache#011(Stopped)
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: unpack_graph: Unpacked transition 48: 0 actions in 0 synapses
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_te_invoke: Processing graph 48 (ref=pe_calc-dc-1371327387-113) derived from /var/lib/pengine/pe-input-452.bz2
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: run_graph:
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: notice: run_graph: Transition 48 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-452.bz2): Complete
Jun 15 20:16:27 prod-hb-nmn-002 pengine: [4401]: info: process_pe_message: Transition 48: PEngine Input stored in: /var/lib/pengine/pe-input-452.bz2
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: te_graph_trigger: Transition 48 is now complete
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: notify_crmd: Transition 48 status: done - null
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 15 20:16:27 prod-hb-nmn-002 crmd: [4395]: info: do_state_transition: Starting PEngine Recheck Timer
Jun 15 20:17:17 prod-hb-nmn-002 cibadmin: [5427]: info: Invoked: cibadmin -Ql
Jun 15 20:17:17 prod-hb-nmn-002 cibadmin: [5428]: info: Invoked: cibadmin -Ql
Jun 15 20:17:17 prod-hb-nmn-002 crm_shadow: [5437]: info: Invoked: crm_shadow -c __crmshell.5404
Jun 15 20:17:17 prod-hb-nmn-002 cibadmin: [5438]: info: Invoked: cibadmin -p -R -o crm_config
Jun 15 20:17:17 prod-hb-nmn-002 crm_shadow: [5440]: info: Invoked: crm_shadow -C __crmshell.5404 --force
Jun 15 20:17:17 prod-hb-nmn-002 cib: [4391]: info: cib_process_request: Operation complete: op cib_replace for section 'all'
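The decisive lines in that log are the failcount ones: the PE reports that apache has failed INFINITY times and is forcing it away from prod-hb-nmn-002, so even a fixed apache will not be placed there until the failure history is cleared. A minimal sketch, assuming the resource is simply named "apache" as in the logs:

    # show current failcounts and failed operations in a one-shot status view
    crm_mon -1 --failcounts

    # once the underlying start failure is fixed, clear the resource's
    # failure history so the policy engine will try this node again
    crm resource cleanup apache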
Re: [Linux-HA] Antw: ocf HA_RSCTMP directory location
>>> David Vossel dvos...@redhat.com wrote on 14.06.2013 at 16:21 in message 206418282.11638415.1371219712940.javamail.r...@redhat.com:

[...]

I think RAs should not rely on the fact that temp directories are clean when a resource is going to be started.

The resource tmp directory has to get cleaned out on startup; if it doesn't, I don't think there is a good solution for resource agents to distinguish a stale pid file from one that is current. Nearly all the agents depend on this tmp directory being reinitialized. If we decided not to depend on this logic, every agent would have to be altered to account for this. This would mean adding a layer of complexity to the agents that should otherwise be unnecessary.

[...]

But you only have to do it right once (in a procedure/function):

    if the PID file exists then
        if the PID file is newer than the time of reboot then
            if there is a process with this pid then
                if the process having that pid matches a given pattern then
                    the process is alive
                else
                    another process has this PID; remove stale PID file
                fi
            else
                remove stale pid file
            fi
        else
            remove stale pid file
        fi
    fi

Everything else is just wrong IMHO.

Regards,
Ulrich
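A minimal shell sketch of that check, roughly as it could appear in a resource agent. The function name, arguments and the use of /proc/1's mtime as a boot-time marker are illustrative assumptions, not taken from any existing agent:

    pidfile_alive() {
        pidfile="$1"      # path to the PID file
        pattern="$2"      # expected string in the process command line, e.g. "httpd"

        [ -f "$pidfile" ] || return 1                   # no PID file: not running

        # Is the PID file older than the current boot?  /proc/1 is created at
        # boot time, so its mtime serves as a rough boot-time marker here.
        if [ "$pidfile" -ot /proc/1 ]; then
            rm -f "$pidfile"                            # written before the reboot: stale
            return 1
        fi

        pid=$(cat "$pidfile" 2>/dev/null)
        if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
            rm -f "$pidfile"                            # no such process: stale
            return 1
        fi

        # A process with that PID exists, but is it ours or a recycled PID?
        if ps -p "$pid" -o args= 2>/dev/null | grep -q "$pattern"; then
            return 0                                    # alive and matching: keep it
        fi

        rm -f "$pidfile"                                # PID reused by another process
        return 1
    }

From an agent it would be called with something like: pidfile_alive "$HA_RSCTMP/myresource.pid" httpd (the file name here is made up).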
Re: [Linux-HA] Antw: Re: Pacemaker - Resource dont get started on the standby node.
Thanks Ulrich, I have figured out the problem.

The actual problem was in the configuration file for the httpd resource: it was correct on the master node, but the configuration was missing on the standby node, which prevented it from starting there.

Regards,
Parkirat Singh Bagga.
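A quick way to catch this kind of configuration drift between nodes is to diff the web server configuration against the other node and test-parse it locally. A sketch only: the hostname comes from this thread, the path is a guess, so substitute whatever configfile the apache resource actually uses:

    # compare the configuration on the standby node with the master's copy
    ssh prod-hb-nmn-001 'cat /etc/httpd/conf/httpd.conf' | diff - /etc/httpd/conf/httpd.conf

    # and make sure it parses on this node
    apachectl -t -f /etc/httpd/conf/httpd.conf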
Re: [Linux-HA] Antw: Re: Pacemaker - Resource dont get started on the standby node.
Hello Parkirat,

Thank you very much.

--
this is my life and I live it for as long as God wills
[Linux-HA] Heartbeat haresources with IPv6
Hi,

I'm using Ubuntu 12.04 + Heartbeat 3.0.5-3ubuntu2 to provide high availability for some IP addresses. I want to configure an IPv6 address in my haresources. I did this:

File /etc/heartbeat/haresources:

    server.domain.com \
        192.168.2.62/32/eth1 \
        192.168.2.64/32/eth1 \
        192.168.2.72/32/eth1 \
        IPv6addr::2001:db8:38a5:8::2006/48/eth1 \
        MailTo::a...@domain.com

The IPv4 addresses work fine, but I'm not having any success with the IPv6 address. My log shows these messages:

ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 start
ResourceManager[22129]: CRIT: Giving up resources due to failure of IPv6addr::2001:db8:38a5:8::2006/48/eth1
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 stop
ResourceManager[22129]: info: Retrying failed stop operation [IPv6addr::2001:db8:38a5:8::2006/48/eth1]

Apparently there is a conflict between the '::' characters inside the IPv6 address and the '::' separator used in haresources. But I would not like to have to expand the IPv6 address. Does anyone know a way to avoid this conflict?

Thanks!

--
Thiago Henrique
www.adminlinux.com.br
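The log above shows the argument really is being split at every '::' (the agent is invoked with '2001:db8:38a5:8' and '2006/48/eth1' as two separate arguments). If expanding the address turns out to be acceptable after all, a line with no '::' inside the argument should avoid the ambiguity. This is a guess based on that observed behaviour, not a tested configuration:

    IPv6addr::2001:db8:38a5:8:0:0:0:2006/48/eth1 \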
Re: [Linux-HA] Heartbeat haresources with IPv6
Hi Thiago,

Heartbeat is deprecated and has not been developed in some time. There are no plans to restart development, either. It is _strongly_ advised that new setups use corosync + pacemaker. You can use the IPv6 resource agents with it, too.

The best place to look is clusterlabs.org's "Clusters from Scratch" tutorial. Its first example covers setting up an (IPv4) virtual IP address, and it should be easy to adapt that to your IPv6 implementation. You will see two versions, one for crmsh and one for pcs; I would recommend the crmsh version for Ubuntu.

Cheers

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
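A rough crmsh sketch of that adaptation, assuming the address from the haresources file above and the usual ocf:heartbeat:IPv6addr parameters (ipv6addr, cidr_netmask, nic); verify the parameter names with "crm ra info IPv6addr" on your version:

    crm configure primitive p_vip6 ocf:heartbeat:IPv6addr \
        params ipv6addr=2001:db8:38a5:8::2006 cidr_netmask=48 nic=eth1 \
        op monitor interval=30s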
[Linux-HA] Resource Collocation v/s Resource Groups
Hi,

Is there any difference between Resource Collocation and Resource Groups?

I grouped 2 resources, both having migration_threshold=2 and monitor_interval=60s. When I stopped one of the resources in the group, it did not restart. However, when the resource was configured outside the group, it started again after I manually stopped it.

Also, is there any way to order the sequence of the resources in a group?

Regards,
Parkirat Singh Bagga.
[Linux-HA] Why cman is started at rc0.d and rc6.d
Hi All,

I am very new to pacemaker, corosync and cman. I installed the packages on an Ubuntu machine (aptitude install pacemaker cman fence-agents). To my surprise, cman has a link under rc0.d and rc6.d. Why does cman need to be started while the system is shutting down?

root@SuTH3:/etc# ls -l /etc/rc0.d/S05cman /etc/rc6.d/S05cman
lrwxrwxrwx 1 root root 14 May 18 23:55 /etc/rc0.d/S05cman -> ../init.d/cman
lrwxrwxrwx 1 root root 14 May 18 23:55 /etc/rc6.d/S05cman -> ../init.d/cman

Thanks,
Su
Re: [Linux-HA] Why cman is started at rc0.d and rc6.d
And another thing: it never gets killed. Should it be stopped when the system is halting?

root@SuTH3:/etc# ls /etc/rc* | grep cman
S05cman
S05cman
S61cman

Thanks,
Su
Re: [Linux-HA] Resource Collocation v/s Resource Groups
Hi Parkirat,

> Is there any difference between Resource Collocation and Resource Groups?

Resources inside a resource group are colocated _and_ ordered. See http://clusterlabs.org/doc/Ordering_Explained.pdf for more details.

> I grouped 2 resources, both having migration_threshold=2 and monitor_interval=60s.
> When I stopped one of the resources in the group, it did not restart. However,
> when the resource was configured outside the group, it started again after I
> manually stopped it. Also, is there any way to order the sequence of the
> resources in a group?

Best regards,
Sven
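To illustrate that point with crmsh (hypothetical resource names p_ip and p_web, not from the original post): a group is effectively shorthand for a colocation constraint plus an ordering constraint, and the members start in the order they are listed in the group definition:

    # a group: p_ip and p_web run on the same node, and p_ip starts first
    crm configure group grp_web p_ip p_web

    # roughly the same thing expressed as separate constraints
    crm configure colocation col_web_with_ip inf: p_web p_ip
    crm configure order ord_ip_before_web inf: p_ip p_web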
Re: [Linux-HA] Resource Collocation v/s Resource Groups
Hi Sven,

Thanks for the reply.

I am now collocating the resources instead of grouping them, but the same problem persists: the resource does not get started after I stop it manually, even though migration_threshold=2 and this is only the first time I have brought the resource down after doing a cleanup and waiting for the failure-timeout on that node.

Note: it behaves properly when my resources are not collocated or grouped.

Regards,
Parkirat Singh Bagga.
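For reference, a crmsh sketch of how such thresholds are normally attached to a resource (hypothetical primitive; note that Pacemaker spells the meta attributes with hyphens, migration-threshold and failure-timeout):

    crm configure primitive p_apache ocf:heartbeat:apache \
        op monitor interval=60s \
        meta migration-threshold=2 failure-timeout=120s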
Re: [Linux-HA] LVM Resource agent, exclusive activation
- Original Message -
From: David Vossel dvos...@redhat.com
To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
Sent: Tuesday, June 4, 2013 4:41:06 PM
Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation

- Original Message -
From: David Vossel dvos...@redhat.com
To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
Sent: Monday, June 3, 2013 10:50:01 AM
Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation

- Original Message -
From: Lars Ellenberg lars.ellenb...@linbit.com
To: linux-ha@lists.linux-ha.org
Sent: Tuesday, May 21, 2013 5:58:05 PM
Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation

On Tue, May 21, 2013 at 05:52:39PM -0400, David Vossel wrote:

- Original Message -
From: Lars Ellenberg lars.ellenb...@linbit.com
To: Brassow Jonathan jbras...@redhat.com
Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars Marowsky-Bree l...@suse.com, Fabio M. Di Nitto fdini...@redhat.com
Sent: Monday, May 20, 2013 3:50:49 PM
Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation

On Fri, May 17, 2013 at 02:00:48PM -0500, Brassow Jonathan wrote:
On May 17, 2013, at 10:14 AM, Lars Ellenberg wrote:
On Thu, May 16, 2013 at 10:42:30AM -0400, David Vossel wrote:

The use of 'auto_activation_volume_list' depends on updates to the LVM initscripts - ensuring that they use '-aay' in order to activate logical volumes. That has been checked in upstream. I'm sure it will go into RHEL7 and I think (but would need to check on) RHEL6.

Only that this is upstream here, so it better work with debian oldstale, gentoo or archlinux as well ;-)

Would this be good enough:

    vgchange --addtag pacemaker $VG

and NOT mention the pacemaker tag anywhere in lvm.conf ... then, in the agent start action,

    vgchange -ay --config tags { pacemaker {} } $VG

(or have the to-be-used tag as an additional parameter). No retagging necessary. How far back do the lvm tools understand the --config ... option?

The --config option goes back years and years - not sure of the exact date, but could probably tell with 'git bisect' if you wanted me to. The above would not quite be sufficient. You would still have to change the 'volume_list' field in lvm.conf (and update the initrd).

You have to do that anyway if you want to make use of tags in this way? What you are proposing would simplify things in that you would not need different 'volume_list's on each machine - you could copy configs between machines.

I thought volume_list = [ ... , @* ] in lvm.conf, assuming that works on all relevant distributions as well, and a command-line --config tag would also propagate into that @*. It did so for me. But yes, volume_list = [ ... , pacemaker ] would be fine as well.

wait, did we just go around in a circle. If we add pacemaker to the volume list, and use that in every cluster node's config, then we've by-passed the exclusive activation part, have we not?!

No. I suggested to NOT set that pacemaker tag in the config (lvm.conf), but only ever explicitly set that tag from the command line as used from the resource agent ( --config tags { pacemaker {} } ). That would also mean to either override volume_list with the same command line, or to have the tag mentioned in the volume_list in lvm.conf (but not set it in the tags {} section).

Also, we're not happy with the auto_activate list because it won't work with old distros?! It's a new feature, why do we have to work with old distros that don't support it?
You are right, we only have to make sure we don't break existing setups by rolling out a new version of the RA. So if the resource agent won't accidentally use a code path where support of a new feature (of LVM) would be required, that's good enough compatibility.

Still it won't hurt to pick the most compatible implementation of several possible equivalent ones (RA-feature wise). I think the proposed --config tags { pacemaker {} } is simpler (no retagging, no re-writing of LVM metadata), and will work for any setup that knows about tags.

I've had a good talk with Jonathan about the --config tags { pacemaker {} } approach. This was originally complicated for us because we were using the --config option for a device filter during activation in certain situations... using the --config option twice caused problems, which made adding the tag in the config difficult. We've
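For readers skimming the thread, a rough sketch of the tag-based scheme being discussed. This is illustrative only: the volume group names are made up, and it reflects the proposal above rather than what any released LVM resource agent actually does:

    # one-time setup: store a "pacemaker" tag in the shared VG's metadata
    vgchange --addtag pacemaker vg_shared

    # lvm.conf on every node: activate local VGs normally, plus any VG whose
    # metadata tag matches a tag defined on this host; the "pacemaker" tag is
    # deliberately NOT defined in the tags {} section of lvm.conf
    volume_list = [ "vg_local", "@*" ]

    # what the resource agent's start action would run: define the host tag
    # only for this one command, so the shared VG becomes activatable here
    vgchange -ay --config 'tags { pacemaker {} }' vg_shared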