[Linux-HA] corosync crashes after firing crm configuration command on any one node
Hi,

I am facing a weird issue with corosync's behavior. I have configured a two-node cluster. The cluster is working fine and the crm_mon command shows the proper output. The command cibadmin -Q also works properly on both nodes.

The issue starts when I run any crm configuration command. As soon as I run one, I see the following output:

[root@AAA02 corosync]# crm configure property no-quorum-policy=ignore
Could not connect to the CIB: Remote node did not respond
ERROR: creating tmp shadow __crmshell.12274 failed
[root@AAA02 corosync]#

At the same time, /var/log/messages shows:

Sep 28 13:38:40 localhost cibadmin: [12295]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost cibadmin: [12296]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost crm_shadow: [12298]: info: Invoked: crm_shadow -c __crmshell.12274

I have attached a file with the cib.xml and corosync.conf contents from both nodes. Please guide me in troubleshooting this error.

Thanks in advance.

Thanks,
Amit

This email (message and any attachment) is confidential and may be privileged. If you are not certain that you are the intended recipient, please notify the sender immediately by replying to this message, and delete all copies of this message and attachments. Any other use of this email by you is prohibited.
cib.xml file on node-1:

<cib epoch="7" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="AAA01" cib-last-written="Wed Sep 28 13:36:11 2011">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="AAA01" uname="AAA01" type="normal"/>
      <node id="AAA02" uname="AAA02" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>

==
cib.xml file on node-2:

<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="AAA01" admin_epoch="0" epoch="7" num_updates="0" cib-last-written="Wed Sep 28 13:36:11 2011">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="AAA01" uname="AAA01" type="normal"/>
      <node id="AAA02" uname="AAA02" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>

=
aisexec {
  user: root
  group: root
}
corosync {
  user: root
  group: root
}
amf {
  mode: disabled
}
logging {
  to_stderr: yes
  debug: off
  timestamp: on
  to_file: no
  to_syslog: yes
  syslog_facility: daemon
}
totem {
  version: 2
  token: 3000
  token_retransmits_before_loss_const: 10
  join: 60
  consensus: 4000
  vsftype: none
  max_messages: 20
  clear_node_high_bit: yes
  secauth: on
  threads: 0
  # nodeid: 1234
  rrp_mode: active
  fail_recv_const: 5000
  interface {
    ringnumber: 0
    bindnetaddr: 172.25.0.0
    mcastaddr: 227.95.1.1
    mcastport: 5404
  }
}

==
Sep 28 13:35:13 localhost corosync[12726]: [pcmk ] info: update_member: 0x153fa980 Node 184555948 now known as AAA02 (was: (null))
Sep 28 13:35:13 localhost cib: [12733]: notice: ais_dispatch: Membership 3388: quorum acquired
Sep 28 13:35:13 localhost corosync[12726]: [pcmk ] info: update_member: Node AAA02 now has process list: 00013312 (78610)
Sep 28 13:35:13 localhost cib: [12733]: info: crm_get_peer: Node 184555948 is now known as AAA02
Sep 28 13:35:13 localhost corosync[12726]: [pcmk ] info: update_member: Node AAA02 now has 1 quorum votes (was 0)
Sep 28 13:35:13 localhost cib: [12733]: info: crm_update_peer: Node AAA02: id=184555948 state=member addr=r(0) ip(172.25.0.11) votes=1 (new) born=3388 seen=3388 proc=00013312 (new)
Sep 28 13:35:13 localhost corosync[12726]: [pcmk ] info:
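One quick check on the attachment is whether both nodes hold the same CIB version. Pacemaker orders CIB versions by the root attributes admin_epoch, epoch and num_updates, in that order. The Python sketch below is a hypothetical helper, not a Pacemaker tool: it reads those three attributes from each node's root cib element (copied here in trimmed form from the files above) and compares them as tuples.

```python
import xml.etree.ElementTree as ET

# Root elements trimmed from the two attached cib.xml files; the children are
# reduced to an empty <configuration/> because only the attributes matter here.
NODE1_CIB = ('<cib epoch="7" num_updates="0" admin_epoch="0" '
             'validate-with="pacemaker-1.0" dc-uuid="AAA01"><configuration/></cib>')
NODE2_CIB = ('<cib validate-with="pacemaker-1.0" dc-uuid="AAA01" '
             'admin_epoch="0" epoch="7" num_updates="0"><configuration/></cib>')

VERSION_FIELDS = ("admin_epoch", "epoch", "num_updates")


def cib_version(xml_text):
    """Return (admin_epoch, epoch, num_updates) from the root <cib> element.

    Missing attributes are treated as 0 (an assumption made for this sketch).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cib":
        raise ValueError("root element is %r, expected 'cib'" % root.tag)
    return tuple(int(root.get(field, "0")) for field in VERSION_FIELDS)


def compare_cibs(first_xml, second_xml):
    """Compare two CIBs by version tuple (tuples compare field by field)."""
    first, second = cib_version(first_xml), cib_version(second_xml)
    if first == second:
        return "same"
    return "first newer" if first > second else "second newer"


print(cib_version(NODE1_CIB), cib_version(NODE2_CIB))
print(compare_cibs(NODE1_CIB, NODE2_CIB))
```

For these two files both versions print as (0, 7, 0) and the comparison prints same, so the on-disk copies appear to agree. To check a live node, pass the file contents, or the output of cibadmin -Q, to cib_version instead.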
Re: [Linux-HA] cluster-glue make error
Nothing is missing there, it seems. You can try running configure with --enable-fatal-warnings=no.

--Amit

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Nikita Michalko
Sent: 17 June 2011 19:30
To: pacema...@oss.clusterlabs.org; Linux-HA@lists.linux-ha.org
Subject: [Linux-HA] cluster-glue make error

Hi all,

I've downloaded the latest tarball from http://hg.linux-ha.org/glue/archive/tip.tar.bz2 and configured it with:

./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --with-heartbeat --with-stonith --with-pacemaker --with-daemon-user=$CLUSTER_USER --with-daemon-group=$CLUSTER_GROUP

and now make fails with the following error:

... snip ...
libtool: link: ( cd .libs && rm -f libstonith.la && ln -s ../libstonith.la libstonith.la )
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
cc1: warnings being treated as errors
main.c:408: error: no previous prototype for 'setup_cl_log'
gmake[2]: *** [main.o] Error 1
gmake[2]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster-Components-glue--0ff4e044f1be/lib/stonith'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster-Components-glue--0ff4e044f1be/lib'
make: *** [all-recursive] Error 1

OS: SLES11/SP1
cluster-glue version: 1.0.7 (Build: 0ff4e044f1be0138e8273a98c9fbee95b643bcf7)

What am I missing?

TIA!
Nikita Michalko
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] need help on email alerts
Hi,

I have configured email alerts for corosync as follows:

crm configure show
---SNIP---
primitive resMON ocf:pacemaker:ClusterMon \
        operations $id=resMON-operations \
        op monitor interval=180 timeout=20 \
        params extra_options="--mail-to x...@gmail.com"
---SNIP---

I can see that this resource is started:

crm_mon -1
---SNIP---
resMON (ocf::pacemaker:ClusterMon): Started xx
---SNIP---

I can send mail from my machine:

[root@localhost] mail -s testmail xx
.
Cc:
Null message body; hope that's ok

However, I do not get any mails when my cluster status changes, and I cannot see anything in /var/log/maillog either. Is there any hint as to what configuration I might be missing?

Thanks,
Amit
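A way to narrow this down is to test SMTP delivery separately from the mail command. As far as we know, crm_mon's --mail-to option sends alerts over SMTP (and is only available when Pacemaker was built with ESMTP support), whereas mail hands the message to the local sendmail program, so the second can work while the first fails. The Python sketch below is not part of Pacemaker; it builds a test message and submits it to an SMTP server, assumed here to be localhost on port 25 (crm_mon has a --mail-host option for a different server; check crm_mon --help on your build). The sender and recipient addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient, subject="cluster mail test"):
    """Build a short plain-text message, roughly like an alert would be."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Test message submitted over SMTP to mimic a cluster alert.\n")
    return msg


def send_via_smtp(msg, host="localhost", port=25, timeout=10):
    """Submit msg to an SMTP server.

    Raises OSError (for example ConnectionRefusedError) if nothing listens on
    host:port, or smtplib.SMTPException if the server rejects the message.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.send_message(msg)
```

For example, send_via_smtp(build_test_message("root@localhost", "you@example.com")), with the real recipient substituted. A ConnectionRefusedError means nothing is accepting SMTP on localhost:25, which would also stop SMTP-based alerts.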
Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!
SELinux is disabled. I am launching corosync with the command /usr/etc/init.d/corosync start.

How do I strace the launch of corosync? Also, how do I check the UID?

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Eric Warnke
Sent: 02 June 2011 19:16
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!

Selinux is disabled? 'getenforce' returns permissive/disabled?
How are you launching corosync, init script or direct?
What UID are you launching corosync as?
Does 'strace'-ing the launch of corosync reveal anything?

Eric

On 6/1/11 11:32 PM, Amit Jathar <amit.jat...@alepo.com> wrote:

Hi,

I have modified my corosync.conf file as follows:

logging {
  fileline: off
  to_syslog: yes
  to_stderr: no
  syslog_facility: daemon
  debug: on
  timestamp: on
  to_logfile: yes
  logfile: /var/log/corosync.log
}

Corosync fails at startup and the file /var/log/corosync.log is not getting created... :(

Thanks,
Amit

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Mike Caldwell
Sent: 02 June 2011 02:42
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!

I am not able to troubleshoot the issue after chasing it for more than a day . No hint, as no logs present in /var/log/messages/ ... :( Any help is appreciable. Let me know, if you need more information. Thanks, Amit

I've had more luck with logging set up with
to_logfile: yes
logfile: /var/log/corosync.log
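To answer the strace question with a sketch: one approach is to start corosync in the foreground under strace and write the trace to a file, for example strace -f -o /tmp/corosync.trace /usr/sbin/corosync -f (the binary path is an assumption for this source install, and -f as the foreground flag should be confirmed with corosync -h). Running id -u prints the UID of the shell you start it from. The hypothetical Python helper below then lists the system calls in the trace that failed with an errno, which is often where a missing file or a permission problem shows up.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches failed calls in strace output, for example:
#   2301  open("/some/file", O_RDONLY) = -1 ENOENT (No such file or directory)
# The leading PID is present when strace is run with -f and -o.
FAILED_CALL = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<msg>.*)\)$"
)


class FailedCall(NamedTuple):
    pid: Optional[int]
    call: str
    args: str
    errno: str
    message: str


def failed_calls(lines: Iterable[str]) -> List[FailedCall]:
    """Return every system call in the trace that returned -1 with an errno."""
    found = []
    for line in lines:
        m = FAILED_CALL.match(line.strip())
        if m:
            pid = int(m.group("pid")) if m.group("pid") else None
            found.append(FailedCall(pid, m.group("call"), m.group("args"),
                                    m.group("errno"), m.group("msg")))
    return found


def worth_reading(calls: List[FailedCall], noisy=("ENOENT",), keep_if=("corosync",)):
    """Drop common noise such as library-path probing (ENOENT), but keep any
    failure whose arguments mention one of the keep_if strings."""
    return [c for c in calls
            if c.errno not in noisy or any(k in c.args for k in keep_if)]
```

Usage, assuming the trace file above: with open("/tmp/corosync.trace") as f: print(worth_reading(failed_calls(f))). Filtering out ENOENT by default is a design choice: dynamic library loading produces many harmless "No such file" results, so only those mentioning corosync are kept.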
[Linux-HA] Urgent help required ... Corosync not getting started ... !!!
Hi,

I am not able to start the corosync service. I have been using corosync for around 6 months; I just need to install it on one more machine, and there it does not start.

The machine is RHEL 5.4 64-bit. I have installed corosync from source. The version of corosync I am using is:

Corosync Cluster Engine, version '1.2.8' SVN revision '3059'

I tried different Pacemaker versions:
1.0.10
1.0.11

I get the message:

Starting Corosync Cluster Engine (corosync): [FAILED]

I have created the corosync.conf file (attached to this mail) in /usr/etc/corosync/ and the pcmk file in /usr/etc/corosync/service.d/.

I have not been able to troubleshoot the issue after chasing it for more than a day. There is no hint, as no logs are present in /var/log/messages ... :(

Any help is appreciated. Let me know if you need more information.

Thanks,
Amit
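Because corosync was built from source with its configuration under /usr/etc, one sanity check is whether the files are where that build looks for them. The Python sketch below assumes sysconfdir is /usr/etc (taken from the paths in the message) and that the key created by corosync-keygen is read from the same corosync directory when secauth is on; both are assumptions about a corosync 1.x source build, not output from corosync itself.

```python
import os
import re
from typing import Callable, List

# Assumption: the source build uses sysconfdir /usr/etc, as the paths in the
# message (/usr/etc/corosync/, /usr/etc/init.d/corosync) suggest.
SYSCONFDIR = "/usr/etc"


def secauth_on(conf_text: str) -> bool:
    """True if corosync.conf contains an explicit 'secauth: on' line.

    A missing setting is reported as False here; check corosync.conf(5) for
    the default of your version.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        m = re.match(r"secauth\s*:\s*(\S+)", line)
        if m:
            return m.group(1).lower() == "on"
    return False


def missing_files(conf_text: str,
                  exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """List the files this assumed layout expects but that are not present."""
    base = os.path.join(SYSCONFDIR, "corosync")
    expected = [
        os.path.join(base, "corosync.conf"),
        os.path.join(base, "service.d", "pcmk"),
    ]
    if secauth_on(conf_text):
        # With secauth on, the nodes need a shared key (created by corosync-keygen).
        expected.append(os.path.join(base, "authkey"))
    return [path for path in expected if not exists(path)]
```

Usage: with open("/usr/etc/corosync/corosync.conf") as f: print(missing_files(f.read())). An empty list only means the expected files exist; it does not prove their contents are valid.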
Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!
Hi,

I have modified my corosync.conf file as follows:

logging {
  fileline: off
  to_syslog: yes
  to_stderr: no
  syslog_facility: daemon
  debug: on
  timestamp: on
  to_logfile: yes
  logfile: /var/log/corosync.log
}

Corosync fails at startup and the file /var/log/corosync.log is not getting created... :(

Thanks,
Amit

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Mike Caldwell
Sent: 02 June 2011 02:42
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!

I am not able to troubleshoot the issue after chasing it for more than a day . No hint, as no logs present in /var/log/messages/ ... :( Any help is appreciable. Let me know, if you need more information. Thanks, Amit

I've had more luck with logging set up with
to_logfile: yes
logfile: /var/log/corosync.log
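A small, hypothetical Python check for the logging block above: it parses the key: value pairs inside logging { ... } and flags a log file whose directory is missing, or a configuration with neither syslog nor a log file enabled. It expects one option per line and no nested sections such as logger_subsys. It cannot explain a failure that happens before corosync sets up logging; in that case tracing the startup, as Eric suggests in this thread, is more informative.

```python
import os
import re
from typing import Callable, Dict, List


def logging_options(conf_text: str) -> Dict[str, str]:
    """Return key/value pairs from the first 'logging { ... }' block.

    Assumes one option per line and no nested sections (e.g. logger_subsys).
    """
    m = re.search(r"logging\s*\{(.*?)\}", conf_text, re.DOTALL)
    if not m:
        return {}
    options = {}
    for raw in m.group(1).splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" in line:
            key, value = line.split(":", 1)
            options[key.strip()] = value.strip()
    return options


def logging_problems(conf_text: str,
                     isdir: Callable[[str], bool] = os.path.isdir) -> List[str]:
    """List obvious problems with the logging targets."""
    opts = logging_options(conf_text)
    problems = []
    if opts.get("to_logfile") == "yes":
        logfile = opts.get("logfile")
        if not logfile:
            problems.append("to_logfile is yes but no logfile is set")
        elif not isdir(os.path.dirname(logfile)):
            problems.append("directory for %s does not exist" % logfile)
    if opts.get("to_syslog") != "yes" and opts.get("to_logfile") != "yes":
        problems.append("neither syslog nor a log file is enabled")
    return problems
```

With the block from this message, logging_problems returns an empty list on a machine where /var/log exists, which suggests the missing log file is a symptom of corosync failing early rather than of the logging settings.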
Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(
Have you generated the authkey with the corosync-keygen command on one node and then copied that file to the other node?

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike
Sent: Tuesday, April 26, 2011 5:41 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:

Hi all,

First, I'm French, so sorry in advance for my English...

I have to use Heartbeat for High Availability between 2 Tomcat 5.5 servers under Linux RedHat 5.3. The first server is active, the other one is passive. The master is called servappli01, with IP address 186.20.100.40; the slave is called servappli02, with IP address 186.20.100.39. I configured a virtual IP, 186.20.100.41. Neither Tomcat is launched when its server boots; Heartbeat starts Tomcat when Heartbeat is running.

My problem is: when Heartbeat is started on the first server and then on the second server, the VIP is assigned to both servers! Also, Tomcat is started on each server, and each node sees the other node as dead!
Here is my configuration:

ha.cf file (the same on each server):

logfile /var/log/ha-log
debugfile /var/log/ha-debug
logfacility none
keepalive 2
warntime 6
deadtime 10
initdead 90
bcast eth0
node servappli01 servappli02
auto_failback yes
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

haresources file (the same on each server):

servappli01 IPaddr::186.20.100.41/24/eth0 tomcat

Result of the ifconfig command on the first server (servappli01):

eth0      Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
          inet addr:186.20.100.40  Bcast:186.20.100.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:bff:febb:c238/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5 GiB)
          Interrupt:177 Memory:fa00-fa012100

eth0:0    Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
          inet addr:186.20.100.41  Bcast:186.20.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Memory:fa00-fa012100

Result of the ifconfig command on the second server (servappli02) at the same time:

eth0      Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
          inet addr:186.20.100.39  Bcast:186.20.100.255  Mask:255.255.255.0
          inet6 addr: fe80::21e:bff:fe77:c90c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0
          TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3 GiB)
          Interrupt:177 Memory:fa00-fa012100

eth0:0    Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
          inet addr:186.20.100.41  Bcast:186.20.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:177 Memory:fa00-fa012100

Result of the /usr/bin/cl_status listnodes command (on each server):

servappli02
servappli01
Result of /usr/bin/cl_status nodestatus servappli01 on servappli01: active
Result of /usr/bin/cl_status nodestatus servappli02 on servappli01: dead
Result of /usr/bin/cl_status nodestatus servappli01 on servappli02: dead
Result of /usr/bin/cl_status nodestatus servappli02 on servappli02: active

And of course, if I kill Tomcat on the master server, there is no switch to the second server (a call to a webapp using the VIP doesn't work).

Can somebody help me, please? I guess there is something wrong, but I don't know what!

Thanx
Mathieu

It almost sounds like the nodes are unaware of each other. Could be a network thing, maybe. Here are some things to try:

Can you ssh or ping one node from the other?
Bring up one node with the VIP running; leave the other node up but with Heartbeat down. Can you ping the VIP from the node NOT running HA?
What happens when you look at the cluster when both nodes are running? Use the crm_mon command and paste what you see here.

I'm thinking you have some sort of network issue.
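The cl_status results above can be read mechanically: each node reports itself active and its peer dead, which is what you would expect if neither node receives the other's heartbeats, and it also fits both nodes taking the VIP. The Python sketch below is a hypothetical helper, not a Heartbeat tool; it takes the nodestatus answers as a nested dictionary (observer, then node, then status) and lists node pairs whose views contradict each other in this way. REPORTED holds the four results from the message.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# views[observer][node] is what "cl_status nodestatus <node>" printed on <observer>.
Views = Dict[str, Dict[str, str]]

# The four results reported in the message above.
REPORTED: Views = {
    "servappli01": {"servappli01": "active", "servappli02": "dead"},
    "servappli02": {"servappli01": "dead", "servappli02": "active"},
}


def partitioned_pairs(views: Views) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where each node sees itself active and the other dead."""
    pairs = []
    for a, b in combinations(sorted(views), 2):
        a_view, b_view = views[a], views[b]
        if (a_view.get(a) == "active" and a_view.get(b) == "dead"
                and b_view.get(b) == "active" and b_view.get(a) == "dead"):
            pairs.append((a, b))
    return pairs
```

partitioned_pairs(REPORTED) returns [("servappli01", "servappli02")]. With bcast eth0, that points at the broadcast path between the two eth0 interfaces, for example a firewall dropping Heartbeat's UDP traffic (port 694 by default, as far as we know), which fits mike's suggestion to look at the network.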
[Linux-HA] Pacemaker installation errors
Hi,

I tried to install Pacemaker. While installing 'Resource Agents', I ran the make command and got the attached errors. I tried twice (running make clean in between), and each time the error was a bit different (both attached).

The steps I performed were:

wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2
tar jxvf resource-agents.tar.bz2
cd Cluster-Resource-Agents-*
./autogen.sh
./configure --prefix=$PREFIX
make
sudo make install

I am using CentOS 5.6 64-bit. Alternatively, can I use Pacemaker with this source despite the errors?

Thanks,
Amit

Note: Writing ocf_heartbeat_ClusterMon.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/CTDB meta-data > metadata-CTDB.xml
/usr/bin/xsltproc --novalid \
        --stringparam package resource-agents \
        --stringparam version 1.0.4 \
        --output ocf_heartbeat_CTDB.xml \
        ra2refentry.xsl metadata-CTDB.xml
/usr/bin/xsltproc \
        --xinclude \
        http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl ocf_heartbeat_CTDB.xml
Note: Writing ocf_heartbeat_CTDB.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Delay meta-data > metadata-Delay.xml
/usr/bin/xsltproc --novalid \
        --stringparam package resource-agents \
        --stringparam version 1.0.4 \
        --output ocf_heartbeat_Delay.xml \
        ra2refentry.xsl metadata-Delay.xml
/usr/bin/xsltproc \
        --xinclude \
        http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl ocf_heartbeat_Delay.xml
Note: Writing ocf_heartbeat_Delay.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Dummy meta-data > metadata-Dummy.xml
/usr/bin/xsltproc --novalid \
        --stringparam package resource-agents \
        --stringparam version 1.0.4 \
        --output ocf_heartbeat_Dummy.xml \
        ra2refentry.xsl metadata-Dummy.xml
/usr/bin/xsltproc \
        --xinclude \
        http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl ocf_heartbeat_Dummy.xml
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: " or ' expected
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : xmlParseEntityDecl: entity list.class not terminated
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : XML conditional section not closed
^
unable to parse ocf_heartbeat_Dummy.xml
gmake[1]: *** [ocf_heartbeat_Dummy.7] Error 6
rm metadata-CTDB.xml metadata-Delay.xml metadata-Dummy.xml metadata-ClusterMon.xml metadata-AudibleAlarm.xml
gmake[1]: Leaving directory `/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
make: *** [all-recursive] Error 1

gmake[1]: Entering directory `/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/AoEtarget meta-data > metadata-AoEtarget.xml
/usr/bin/xsltproc --novalid \
        --stringparam package resource-agents \
        --stringparam version 1.0.4 \
        --output ocf_heartbeat_AoEtarget.xml \
        ra2refentry.xsl metadata-AoEtarget.xml
/usr/bin/xsltproc \
        --xinclude \
        http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl ocf_heartbeat_AoEtarget.xml
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : EntityValue: '%' forbidden except for entities references
^
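When a build log is this long, it helps to pull out the failing make targets in order. The Python sketch below is a generic, hypothetical helper (not part of the resource-agents build): it scans GNU make output for *** [target] Error N lines and attaches the directory from the matching Entering or Leaving directory line. In recursive builds the first failure is usually the root cause, and the later all-recursive entries are parent makes giving up.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# GNU make status lines, e.g. "gmake[1]: *** [ocf_heartbeat_Dummy.7] Error 6"
# and "gmake[1]: Leaving directory `/path/doc'".
FAIL_RE = re.compile(
    r"^g?make(?:\[(?P<level>\d+)\])?: \*\*\* \[(?P<target>[^\]]+)\] Error (?P<code>\d+)$")
DIR_RE = re.compile(
    r"^g?make(?:\[(?P<level>\d+)\])?: (?P<what>Entering|Leaving) directory "
    r"[`'](?P<dir>[^']+)'$")


@dataclass
class MakeFailure:
    level: int
    target: str
    code: int
    directory: Optional[str]


def make_failures(log_text: str) -> List[MakeFailure]:
    """Failures in the order make reported them (the first is usually the root cause)."""
    entered: Dict[int, str] = {}  # make level -> directory from "Entering directory"
    pending: Dict[int, int] = {}  # make level -> index of a failure lacking a directory
    failures: List[MakeFailure] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        d = DIR_RE.match(line)
        if d:
            level = int(d.group("level") or 0)
            if d.group("what") == "Entering":
                entered[level] = d.group("dir")
            else:
                # Make typically prints "Leaving directory" soon after a failure
                # at the same level, so use it to fill in a missing directory.
                if level in pending:
                    failures[pending.pop(level)].directory = d.group("dir")
                entered.pop(level, None)
            continue
        f = FAIL_RE.match(line)
        if f:
            level = int(f.group("level") or 0)
            failures.append(MakeFailure(level, f.group("target"),
                                        int(f.group("code")), entered.get(level)))
            if failures[-1].directory is None:
                pending[level] = len(failures) - 1
    return failures
```

For the first attempt above it reports ocf_heartbeat_Dummy.7 (Error 6) in the doc directory, followed by all-recursive; that is, the build stopped while generating man pages with xsltproc, not while building the agents themselves.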
[Linux-HA] Need help on the fence_vmware configuration ... facing errors ... !!
Hi,

I am facing the following issues:

I have installed VMware-VIPerl-1.6.0-104313.x86_64.tar.gz and VMware-vSphere-Perl-SDK-4.1.0-254719.x86_64.tar.gz on the RHEL6 image. My ESXi server hostname is esx5.

1) If I use the wrong password, I get an error message, as expected:

[root@OEL6_VIP_1 ~]# /usr/sbin/fence_vmware -o reboot -a x.x.x.x -l 'root' -p 'wrong_pass' -n esx5
fence_vmware_helper returned Cannot connect to server!
VMware error:Cannot complete login due to an incorrect user name or password.
Please use '-h' for usage

2) If I use the right password, I get an error message like:

[root@OEL6_VIP_1 ~]# /usr/sbin/fence_vmware -o reboot -a 172.16.150.5 -l 'root' -p 'right_pass' -n esx5
fence_vmware_helper returned Cannot find vm esx5!

I can ping that node, esx5:

PING esx5.localdomain (x.x.x.x) 56(84) bytes of data.
64 bytes from esx5.localdomain (x.x.x.x): icmp_seq=1 ttl=64 time=0.064 ms

What might be going wrong, so that my fence_vmware script is not able to find esx5?

Thanks,
Amit

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Amit Jathar
Sent: Wednesday, March 30, 2011 10:08 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] Need help on the fence_vmware configuration

Hi,

I have two RHEL6 VMware images running on Windows machines. I have configured STONITH on those RHEL6 images and installed the VI API on those machines.

I get the error "could not connect http://x.x.x.x/SDK/webService" when I manually try:

/usr/sbin/fence_vmware -o reboot -a x.x.x.x -l xxx -p xxx -n xxx

My question is: can I use this fence_vmware script on RHEL6 VMware images running on Windows machines, or must I use RHEL6 VMware images running on an ESXi server?

Thanks,
Amit
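The second error reads "Cannot find vm esx5!", which suggests fence_vmware treats the -n value as the name of a virtual machine rather than the ESXi host. Assuming that, -n should carry the guest's name as it appears in the VMware inventory (confirm with fence_vmware -h). The Python sketch below only assembles the command line from the options used in the message and prints it with the password masked; my-vm is a hypothetical VM name. If your fence_vmware version supports the list action, running it without -n may show which VM names the agent can see.

```python
import shlex
import subprocess
from typing import List, Optional

FENCE_VMWARE = "/usr/sbin/fence_vmware"
ACTIONS = ("reboot", "on", "off", "status", "list")


def fence_command(action: str, address: str, login: str, password: str,
                  vm_name: Optional[str] = None) -> List[str]:
    """Build the fence_vmware argument list with the options used in the message.

    Assumption: -n names the virtual machine to fence, not the ESXi host.
    """
    if action not in ACTIONS:
        raise ValueError("unexpected action: %s" % action)
    cmd = [FENCE_VMWARE, "-o", action, "-a", address, "-l", login, "-p", password]
    if vm_name is not None:
        cmd += ["-n", vm_name]
    elif action != "list":
        raise ValueError("action %s needs -n <vm name>" % action)
    return cmd


def printable(cmd: List[str]) -> str:
    """Shell-quoted command line with the -p value masked, safe to paste in a mail."""
    shown = list(cmd)
    if "-p" in shown:
        i = shown.index("-p")
        if i + 1 < len(shown):
            shown[i + 1] = "******"
    return " ".join(shlex.quote(part) for part in shown)


def run(cmd: List[str]) -> int:
    """Run the agent and return its exit status (0 means success)."""
    return subprocess.call(cmd)
```

Building the argument list separately from running it keeps the password out of shell history and lets the exact command be reviewed (via printable) before anything is rebooted.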
[Linux-HA] Need help on the fence_vmware configuration
Hi,

I have two RHEL6 VMware images running on Windows machines. I have configured STONITH on those RHEL6 images and installed the VI API on those machines.

I get the error "could not connect http://x.x.x.x/SDK/webService" when I manually try:

/usr/sbin/fence_vmware -o reboot -a x.x.x.x -l xxx -p xxx -n xxx

My question is: can I use this fence_vmware script on RHEL6 VMware images running on Windows machines, or must I use RHEL6 VMware images running on an ESXi server?

Thanks,
Amit
[Linux-HA] need help to configure the fence_ifmib for stonith
Hi,

I would like to try fence_ifmib as the fencing agent. I can see it is present on my machine:

[root@OEL6_VIP_1 fence]# ls /usr/sbin/fence_ifmib
/usr/sbin/fence_ifmib

Also, I can see some Python scripts present on my machine:

[root@OEL6_VIP_1 fence]# pwd
/usr/share/fence
[root@OEL6_VIP_1 fence]# ls
fencing.py  fencing.pyc  fencing.pyo  fencing_snmp.py  fencing_snmp.pyc  fencing_snmp.pyo
[root@OEL6_VIP_1 fence]#

Is there any chance I can configure fence_ifmib as the STONITH agent? If yes, which MIB files will I need?

Thanks,
Amit
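For the MIB question, here is a sketch of what an IF-MIB based agent works with, under our understanding that fence_ifmib fences a node by administratively disabling its switch port over SNMP. The standard IF-MIB (RFC 2863) object involved is ifAdminStatus, OID 1.3.6.1.2.1.2.2.1.7, indexed by the port's ifIndex, with up(1) and down(2). The net-snmp tools accept numeric OIDs like this without loading MIB files, so whether any extra MIB is needed is worth confirming with fence_ifmib -h. The switch name, community string and ifIndex in the usage note are hypothetical.

```python
from typing import List

IF_ADMIN_STATUS = "1.3.6.1.2.1.2.2.1.7"  # IF-MIB::ifAdminStatus (RFC 2863)
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"         # IF-MIB::ifDescr, to map port names to ifIndex

ADMIN_STATUS_VALUES = {"up": 1, "down": 2, "testing": 3}
FENCE_ACTION_TO_STATUS = {"on": "up", "off": "down"}


def admin_status_oid(if_index: int) -> str:
    """Numeric OID of ifAdminStatus for one interface."""
    if if_index < 1:
        raise ValueError("ifIndex values start at 1")
    return "%s.%d" % (IF_ADMIN_STATUS, if_index)


def snmpset_args(switch: str, community: str, if_index: int, action: str) -> List[str]:
    """Arguments for net-snmp's snmpset that would set the port up or down.

    Only builds the list; nothing is sent to the switch here.
    """
    status = ADMIN_STATUS_VALUES[FENCE_ACTION_TO_STATUS[action]]
    return ["snmpset", "-v", "2c", "-c", community, switch,
            admin_status_oid(if_index), "i", str(status)]
```

For example, snmpset_args("switch1", "private", 12, "off") yields the arguments for setting ifAdminStatus.12 to 2 (down) on a hypothetical switch. Walking IF_DESCR on the switch shows which ifIndex belongs to which port.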
[Linux-HA] failover issues faced with pacemaker/corosync
Hi,

I am trying to use Pacemaker with corosync and am facing the following issues. I want to know whether they are due to misconfiguration or are known issues.

I have two nodes in the cluster: VIP-1 and VIP-2.

The corosync version is:
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'

==
The crm_mon output is:

Last updated: Thu Feb 24 17:44:33 2011
Stack: openais
Current DC: VIP-1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
3 Resources configured.

Online: [ VIP-1 VIP-2 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started VIP-1
WebSite (ocf::heartbeat:apache): Started VIP-1
My_Tomcat (ocf::heartbeat:tomcat): Started VIP-

==
My configuration is:

[root@VIP-1 local]# crm configure show
node VIP-1
node VIP-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip=172.16.201.23 cidr_netmask=32 \
        op monitor interval=5s
primitive My_Tomcat ocf:heartbeat:tomcat \
        params catalina_home=/root/Softwares/apache-tomcat-6.0.26 java_home=/root/Softwares/Java/linux/jdk1.6.0_21 \
        op monitor interval=5s
primitive WebSite ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op monitor interval=5s
property $id=cib-bootstrap-options \
        dc-version=1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1298547656
rsc_defaults $id=rsc-options \
        resource-stickiness=2

===
Issue 1) I observed that if any service is manually shut down on VIP-1, corosync restarts it on the same node. In the logs, I can see this:

=
Feb 24 18:14:32 VIP-1 pengine: [28098]: info: get_failcount: My_Tomcat has failed 35 times on VIP-1
Feb 24 18:14:32 VIP-1 pengine: [28098]: notice: common_apply_stickiness: My_Tomcat can fail 65 more times on VIP-1 before being forced off
==

I have not configured the service to be restarted INFINITY times on VIP-1, so is this the default behavior?
Is there any configuration to tell corosync to restart the service only two times on VIP-1 and, if it still does not start, to start it on VIP-2?

Issue 2) I have changed the error codes in the Apache Tomcat RA script so that it returns error code 2 when the monitor fails. Now, if I manually stop the service, it is not restarted on VIP-1 but is started on VIP-2. The fail count of that service on VIP-1 shows as 1. If I then manually bring the service down on VIP-2, it is not started on VIP-1 until I clean up the resource.

So, is this known behavior, or have I missed some configuration?

Let me know if you need more information.

Thanks,
Amit
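Both issues can be reasoned about with a little arithmetic and the OCF return-code table. The Python sketch below is illustrative, not Pacemaker code. implied_threshold reads the two pengine lines quoted above and adds the fail count to the "can fail N more times" figure; assuming that figure is migration-threshold minus the current fail count, 35 + 65 implies an effective threshold of 100 in that log. The migration-threshold resource meta attribute is what limits failures per node before the resource moves (for two failures, something like crm configure rsc_defaults migration-threshold=2). monitor_failure_severity maps OCF exit codes to the failure types described in the Pacemaker documentation, as we read it: rc 2 (OCF_ERR_ARGS) is a hard error, so the resource is kept off that node until the failure is cleaned up, which matches the behavior in Issue 2.

```python
import re
from typing import Optional

FAILED_RE = re.compile(r"(?P<rsc>\S+) has failed (?P<n>\d+) times on (?P<node>\S+)")
REMAINING_RE = re.compile(
    r"(?P<rsc>\S+) can fail (?P<n>\d+) more times on (?P<node>\S+) before being forced off")

# Standard OCF resource-agent exit codes.
OCF_CODES = {
    0: "OCF_SUCCESS", 1: "OCF_ERR_GENERIC", 2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED", 4: "OCF_ERR_PERM", 5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED", 7: "OCF_NOT_RUNNING",
}

# Failure types as we read the Pacemaker documentation:
#   soft  -> may be recovered on the same node (subject to migration-threshold)
#   hard  -> not run on this node again until the failure is cleaned up
#   fatal -> not run on any node until the failure is cleaned up
SEVERITY = {1: "soft", 2: "hard", 3: "hard", 4: "hard", 5: "hard", 6: "fatal"}


def implied_threshold(log_text: str, rsc: str, node: str) -> Optional[int]:
    """Fail count plus 'can fail N more times' for rsc on node (last match wins)."""
    failed = remaining = None
    for m in FAILED_RE.finditer(log_text):
        if (m.group("rsc"), m.group("node")) == (rsc, node):
            failed = int(m.group("n"))
    for m in REMAINING_RE.finditer(log_text):
        if (m.group("rsc"), m.group("node")) == (rsc, node):
            remaining = int(m.group("n"))
    if failed is None or remaining is None:
        return None
    return failed + remaining


def forced_off(fail_count: int, migration_threshold: int) -> bool:
    """True once the fail count reaches the threshold (0 treated as disabled here)."""
    return migration_threshold > 0 and fail_count >= migration_threshold


def monitor_failure_severity(rc: int) -> Optional[str]:
    """Failure type for an exit code; None for codes this sketch does not
    classify (0 = success, 7 = not running, anything else)."""
    return SEVERITY.get(rc)
```

implied_threshold(log, "My_Tomcat", "VIP-1") returns 100 for the two quoted log lines, and forced_off(2, 2) is True, which is the behavior Issue 1 asks for.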