[Linux-HA] corosync crashes after firing crm configuration command on any one node

2011-09-28 Thread Amit Jathar
Hi,

I am facing a weird issue with corosync's behavior.

I have configured a two-node cluster.
The cluster is working fine and the crm_mon command shows proper output.
The command cibadmin -Q also works properly on both nodes.

The issue starts when I run any crm configuration command.

As soon as I run a crm configuration command, I see the following output:-
[root@AAA02 corosync]# crm configure property no-quorum-policy=ignore
Could not connect to the CIB: Remote node did not respond
ERROR: creating tmp shadow __crmshell.12274 failed
[root@AAA02 corosync]#


At the same time, the logs in /var/log/messages say:-
Sep 28 13:38:40 localhost cibadmin: [12295]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost cibadmin: [12296]: info: Invoked: cibadmin -Ql
Sep 28 13:38:40 localhost crm_shadow: [12298]: info: Invoked: crm_shadow -c __crmshell.12274

I have attached a file which has the cib.xml and corosync.conf file contents from 
both nodes.

Please guide me to troubleshoot this error.
Thanks in advance.

Thanks,
Amit



This email (message and any attachment) is confidential and may be privileged. 
If you are not certain that you are the intended recipient, please notify the 
sender immediately by replying to this message, and delete all copies of this 
message and attachments. Any other use of this email by you is prohibited.





cib.xml file on node-1:-

<cib epoch="7" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="AAA01" cib-last-written="Wed Sep 28 13:36:11 2011">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="AAA01" uname="AAA01" type="normal"/>
      <node id="AAA02" uname="AAA02" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>

==

cib.xml file on node-2:-

<cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="AAA01" admin_epoch="0" epoch="7" num_updates="0" cib-last-written="Wed Sep 28 13:36:11 2011">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="AAA01" uname="AAA01" type="normal"/>
      <node id="AAA02" uname="AAA02" type="normal"/>
    </nodes>
    <resources/>
    <constraints/>
  </configuration>
</cib>


=


aisexec {
user: root
group: root
}

corosync {
user: root
group: root
}

amf {
mode: disabled
}

logging {
to_stderr: yes
debug: off
timestamp: on
to_file: no
to_syslog: yes
syslog_facility: daemon
}

totem {
version: 2
token: 3000
token_retransmits_before_loss_const: 10
join: 60
consensus: 4000
vsftype: none
max_messages: 20
clear_node_high_bit: yes
secauth: on
threads: 0
# nodeid: 1234
rrp_mode: active
fail_recv_const: 5000

interface {
ringnumber: 0
bindnetaddr: 172.25.0.0
mcastaddr: 227.95.1.1
mcastport: 5404
}
}


==
Sep 28 13:35:13 localhost corosync[12726]:   [pcmk  ] info: update_member: 0x153fa980 Node 184555948 now known as AAA02 (was: (null))
Sep 28 13:35:13 localhost cib: [12733]: notice: ais_dispatch: Membership 3388: quorum acquired
Sep 28 13:35:13 localhost corosync[12726]:   [pcmk  ] info: update_member: Node AAA02 now has process list: 00013312 (78610)
Sep 28 13:35:13 localhost cib: [12733]: info: crm_get_peer: Node 184555948 is now known as AAA02
Sep 28 13:35:13 localhost corosync[12726]:   [pcmk  ] info: update_member: Node AAA02 now has 1 quorum votes (was 0)
Sep 28 13:35:13 localhost cib: [12733]: info: crm_update_peer: Node AAA02: id=184555948 state=member addr=r(0) ip(172.25.0.11)  votes=1 (new) born=3388 seen=3388 proc=00013312 (new)
Sep 28 13:35:13 localhost corosync[12726]:   [pcmk  ] info: 

Re: [Linux-HA] cluster-glue make error

2011-06-17 Thread Amit Jathar
Nothing seems to be missing there. You can try running configure with 
--enable-fatal-warnings=no.
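
A minimal sketch of that suggestion: the function below re-runs configure with the options from the original post (quoted below) plus --enable-fatal-warnings=no. The function name configure_glue is made up for illustration; PREFIX, CLUSTER_USER and CLUSTER_GROUP are the poster's own variables and must be set beforehand. The failing compile line in the quoted log carries -Werror, so the missing-prototype warning for setup_cl_log stops the build; the assumption here is that disabling fatal warnings removes -Werror from the compiler flags.

```shell
# Re-run configure with the original options and fatal warnings switched off.
# PREFIX, CLUSTER_USER and CLUSTER_GROUP must already be set in the environment.
# Extra arguments are passed through to configure unchanged.
configure_glue() {
    ./configure --prefix="$PREFIX" --localstatedir=/var --sysconfdir=/etc \
        --with-heartbeat --with-stonith --with-pacemaker \
        --with-daemon-user="$CLUSTER_USER" --with-daemon-group="$CLUSTER_GROUP" \
        --enable-fatal-warnings=no "$@"
}

# Usage, from the top of the unpacked source tree:
#   configure_glue && make
```

The warning itself is still printed during make; it just no longer aborts the build.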

--Amit

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Nikita Michalko
Sent: 17 June 2011 19:30
To: pacema...@oss.clusterlabs.org; Linux-HA@lists.linux-ha.org
Subject: [Linux-HA] cluster-glue make error

Hi all,

I've downloaded the latest tarball from 
http://hg.linux-ha.org/glue/archive/tip.tar.bz2 and configured it with:

./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --with-heartbeat --with-stonith --with-pacemaker --with-daemon-user=$CLUSTER_USER --with-daemon-group=$CLUSTER_GROUP

and now make gives me the following error:
... snip ...
libtool: link: ( cd .libs && rm -f libstonith.la && ln -s ../libstonith.la libstonith.la )
gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c
cc1: warnings being treated as errors
main.c:408: error: no previous prototype for 'setup_cl_log'
gmake[2]: *** [main.o] Error 1
gmake[2]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster-Components-glue--0ff4e044f1be/lib/stonith'
gmake[1]: *** [all-recursive] Error 1
gmake[1]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster-Components-glue--0ff4e044f1be/lib'
make: *** [all-recursive] Error 1

OS: SLES11/SP1
cluster-glue version: 1.0.7 (Build: 0ff4e044f1be0138e8273a98c9fbee95b643bcf7)

What am I missing?


TIA!

Nikita Michalko

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems





[Linux-HA] need help on email alerts

2011-06-05 Thread Amit Jathar
Hi,

I have configured email alerts for corosync as follows :-
crm configure show
---SNIP---
primitive resMON ocf:pacemaker:ClusterMon \
        operations $id=resMON-operations \
        op monitor interval=180 timeout=20 \
        params extra_options="--mail-to x...@gmail.com"
---SNIP---

I can see this resource is started :-
crm_mon -1
---SNIP
resMON (ocf::pacemaker:ClusterMon):Started xx
---SNIP

I can send mail from my machine :-
[root@localhost] mail -s testmail xx
.
Cc:
Null message body; hope that's ok

I do not get any mails when my cluster status changes, and I do not see 
anything in /var/log/maillog either.

Is there any configuration that I am missing?
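
One thing that may be worth ruling out (an assumption, not something established in this thread): ClusterMon hands extra_options to crm_mon, and crm_mon only offers its mail options when it was built with mail (ESMTP) support. A minimal sketch of a check, assuming that a build with mail support lists --mail-to in its help text; the function name has_mail_option is made up for illustration:

```shell
# Reads help text on stdin; succeeds if it mentions the --mail-to option.
has_mail_option() {
    grep -q -- '--mail-to'
}

# Usage on a cluster node:
#   crm_mon --help 2>&1 | has_mail_option \
#       && echo "mail options available" \
#       || echo "no --mail-to in help output"
```

If the option is not listed, the resource can still start and monitor cleanly while never sending anything, which would fit an empty /var/log/maillog.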

Thanks,
Amit





Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!

2011-06-03 Thread Amit Jathar
SELinux is disabled.
I am launching corosync with the command /usr/etc/init.d/corosync start

How do I strace the launch of corosync? Also, how do I check the UID?
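
A minimal sketch of how those checks could be run. It assumes strace is installed, that the source build placed the daemon at /usr/sbin/corosync (a guess based on the /usr/etc init script path), and that the daemon accepts -f to stay in the foreground; the function name and the trace file path are made up for illustration.

```shell
# Run corosync in the foreground under strace and save the trace to a file.
# $1: path to the corosync binary (assumed default: /usr/sbin/corosync)
# $2: trace output file (hypothetical default: /tmp/corosync.strace)
trace_corosync() {
    bin="${1:-/usr/sbin/corosync}"
    out="${2:-/tmp/corosync.strace}"
    echo "SELinux mode: $(getenforce 2>/dev/null || echo unknown)"
    echo "Launching as UID $(id -u) ($(id -un))"
    # -f follows child processes; -o writes the trace to a file instead of stderr
    strace -f -o "$out" "$bin" -f
}

# Usage, as root:
#   trace_corosync
#   grep -E 'EACCES|ENOENT|EPERM' /tmp/corosync.strace | tail
```

The grep at the end is only a starting point: permission and missing-file errors near the end of the trace are often where a silent startup failure shows up.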

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Eric Warnke
Sent: 02 June 2011 19:16
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting started 
... !!!


Selinux is disabled? 'getenforce' returns permissive/disabled?

How are you launching corosync, init script or direct?

What UID are you launching corosync as?

Does 'strace'-ing the launch of corosync reveal anything?

Eric


On 6/1/11 11:32 PM, Amit Jathar amit.jat...@alepo.com wrote:

Hi,

I have modified my corosync.conf file as follows :-
logging {
fileline: off
to_syslog: yes
to_stderr: no
syslog_facility: daemon
debug: on
timestamp: on
to_logfile: yes
logfile: /var/log/corosync.log

 }

corosync fails at startup and the file /var/log/corosync.log is not
getting created... :(

Thanks,
Amit
-Original Message-
From: linux-ha-boun...@lists.linux-ha.org
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Mike Caldwell
Sent: 02 June 2011 02:42
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting
started ... !!!



 I am not able to troubleshoot the issue after chasing it for more
 than a day . No hint, as no logs present in /var/log/messages/ ... :(
 Any help is appreciable.

 Let me know, if you need more information.

 Thanks,
 Amit

 I've had more luck with logging set up with

   to_logfile: yes
   logfile: /var/log/corosync.log


[Linux-HA] Urgent help required ... Corosync not getting started ... !!!

2011-06-01 Thread Amit Jathar
Hi,

I am not able to start the corosync service.
I have been using corosync for around 6 months. I just need to install it on one 
more machine, and there it does not start.
The machine is RHEL 5.4, 64-bit.

I have installed corosync from sources.

The version of corosync I am using is :-
Corosync Cluster Engine, version '1.2.8' SVN revision '3059'

I tried different pacemaker versions :-
1.0.10
1.0.11

I get the message :-
Starting Corosync Cluster Engine (corosync): [FAILED]

I have created the corosync.conf file (attached to this mail) in 
/usr/etc/corosync/ and the pcmk file in /usr/etc/corosync/service.d/

I have not been able to troubleshoot the issue after chasing it for more than a 
day. There is no hint, as no logs are present in /var/log/messages ... :(
Any help is appreciated.

Let me know, if you need more information.

Thanks,
Amit






Re: [Linux-HA] Urgent help required ... Corosync not getting started ... !!!

2011-06-01 Thread Amit Jathar
Hi,

I have modified my corosync.conf file as follows :-
logging {
fileline: off
to_syslog: yes
to_stderr: no
syslog_facility: daemon
debug: on
timestamp: on
to_logfile: yes
logfile: /var/log/corosync.log

 }

corosync fails at startup and the file /var/log/corosync.log is not getting 
created... :(

Thanks,
Amit
-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Mike Caldwell
Sent: 02 June 2011 02:42
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Urgent help required ... Corosync not getting started 
... !!!



 I am not able to troubleshoot the issue after chasing it for more than
 a day . No hint, as no logs present in /var/log/messages/ ... :( Any
 help is appreciable.

 Let me know, if you need more information.

 Thanks,
 Amit

 I've had more luck with logging set up with

   to_logfile: yes
   logfile: /var/log/corosync.log


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Amit Jathar
Have you generated the authkey with the corosync-keygen command on one node and 
then copied that file to the other node?
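
A minimal sketch of that step, assuming corosync's default key location /etc/corosync/authkey (it may differ for a source build with another prefix) and root SSH access to the peer; the function name push_authkey and the peer name are placeholders. Note that corosync-keygen applies to corosync clusters; a Heartbeat setup like the one quoted below keeps its shared secret in /etc/ha.d/authkeys instead.

```shell
# Copy the cluster key to a peer, preserving its restrictive permissions.
# $1: peer host name; $2: key file (assumed default: /etc/corosync/authkey)
push_authkey() {
    peer="$1"
    key="${2:-/etc/corosync/authkey}"
    [ -n "$peer" ] || { echo "usage: push_authkey PEER [KEYFILE]" >&2; return 2; }
    # -p keeps the file mode and timestamps of the original
    scp -p "$key" "root@${peer}:${key}"
}

# Usage, on the node where the key is generated:
#   corosync-keygen          # writes the key file; needs entropy and may take a while
#   push_authkey node2
```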

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike
Sent: Tuesday, April 26, 2011 5:41 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:
 Hi all,
 First I'm french so sorry in advance for my English...

 I have to use Heartbeat for High Availability between 2 Tomcat 5.5 servers 
 under Linux RedHat 5.3. The first server is active, the other one is passive. 
 The master is called servappli01, with IP address 186.20.100.40; the slave is 
 called servappli02, with IP address 186.20.100.39.
 I configured a virtual IP, 186.20.100.41. Tomcat is not launched when a 
 server boots; it is Heartbeat that starts Tomcat once Heartbeat is running.
 My problem is: when Heartbeat is started on the first server, then on the 
 second server, the VIP is assigned to both servers! Also, Tomcat is started 
 on each server, and each node sees the other node as dead!

 Here is my configuration :

 ha.cf file (the same on each server) :

 logfile /var/log/ha-log

 debugfile /var/log/ha-debug

 logfacility none

 keepalive 2

 warntime 6

 deadtime 10

 initdead 90

 bcast eth0

 node servappli01 servappli02

 auto_failback yes

 respawn hacluster /usr/lib/heartbeat/ipfail

 apiauth ipfail gid=haclient uid=hacluster


 haresources file (the same on each server) :

 servappli01 IPaddr::186.20.100.41/24/eth0 tomcat


 Result of ifconfig command on the first server (servappli01) :

 eth0      Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
           inet addr:186.20.100.40  Bcast:186.20.100.255  Mask:255.255.255.0
           inet6 addr: fe80::21e:bff:febb:c238/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0
           TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5 GiB)
           Interrupt:177 Memory:fa00-fa012100

 eth0:0    Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
           inet addr:186.20.100.41  Bcast:186.20.100.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           Interrupt:177 Memory:fa00-fa012100

 Result of ifconfig command on the second server (servappli02) at the same 
 time :

 eth0      Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
           inet addr:186.20.100.39  Bcast:186.20.100.255  Mask:255.255.255.0
           inet6 addr: fe80::21e:bff:fe77:c90c/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0
           TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3 GiB)
           Interrupt:177 Memory:fa00-fa012100

 eth0:0    Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
           inet addr:186.20.100.41  Bcast:186.20.100.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           Interrupt:177 Memory:fa00-fa012100

 Result of /usr/bin/cl_status listnodes command (on each server) :

 servappli02

 servappli01


 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :

 active

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli02 :

 active

 And of course, if I kill Tomcat on the master server, there is no switch to the 
 second server (a call to a webapp using the VIP doesn't work).

 Can somebody help me please?
 I guess there is something wrong but I don't know what!
 Thanks

 Mathieu


It almost sounds like the nodes are unaware of each other. It could be a network 
issue. Here are some things to try:
Can you ssh to or ping one node from the other?
Bring up one node with the VIP running; leave the other node up but with heartbeat 
down. Can you ping the VIP from the node NOT running HA?
What happens when you look at the cluster when both nodes are running? Use the 
crm_mon command and paste what you see in here.

I'm thinking you have some sort of network issue.
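
The first two checks can be sketched as a small helper. The host names and the VIP come from the quoted post; the function name check_peer is made up. The commented tcpdump line is an extra assumption: the quoted ha.cf uses bcast eth0, and Heartbeat sends its packets on UDP port 694 unless udpport says otherwise, so a local firewall dropping that traffic would be one way to end up with each node seeing the other as dead.

```shell
# Ping a host a few times and report whether it answered.
# $1: host name or IP address
check_peer() {
    if ping -c 3 -W 2 "$1" >/dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 NOT reachable"
        return 1
    fi
}

# Usage on servappli02, with heartbeat stopped there and running on servappli01:
#   check_peer servappli01
#   check_peer 186.20.100.41
# With heartbeat running on both nodes, watch for the peer's heartbeat packets:
#   tcpdump -n -i eth0 udp port 694
```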

[Linux-HA] Pacemaker installation errors

2011-04-21 Thread Amit Jathar
Hi ,

I tried to install pacemaker. While installing 'Resource Agents', I ran the make 
command and got the attached errors. I tried twice (and also did make clean), and 
on both occasions the error was a bit different (as attached).

The steps I performed were :-
wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2

tar jxvf resource-agents.tar.bz2
cd Cluster-Resource-Agents-*

./autogen.sh && ./configure --prefix=$PREFIX

make
sudo make install

I am using CentOS 5.6 64-bit.
Or can I use Pacemaker even though this source build reported errors?

Thanks,
Amit








Note: Writing ocf_heartbeat_ClusterMon.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/CTDB meta-data  
metadata-CTDB.xml
/usr/bin/xsltproc --novalid \
--stringparam package resource-agents \
--stringparam version 1.0.4 \
--output ocf_heartbeat_CTDB.xml \
ra2refentry.xsl metadata-CTDB.xml
/usr/bin/xsltproc \
--xinclude \
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
ocf_heartbeat_CTDB.xml
Note: Writing ocf_heartbeat_CTDB.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Delay meta-data  
metadata-Delay.xml
/usr/bin/xsltproc --novalid \
--stringparam package resource-agents \
--stringparam version 1.0.4 \
--output ocf_heartbeat_Delay.xml \
ra2refentry.xsl metadata-Delay.xml
/usr/bin/xsltproc \
--xinclude \
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
ocf_heartbeat_Delay.xml
Note: Writing ocf_heartbeat_Delay.7
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Dummy meta-data  
metadata-Dummy.xml
/usr/bin/xsltproc --novalid \
--stringparam package resource-agents \
--stringparam version 1.0.4 \
--output ocf_heartbeat_Dummy.xml \
ra2refentry.xsl metadata-Dummy.xml
/usr/bin/xsltproc \
--xinclude \
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
ocf_heartbeat_Dummy.xml
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue:  or ' expected

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
xmlParseEntityDecl: entity list.class not terminated

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : XML 
conditional section not closed

^
unable to parse ocf_heartbeat_Dummy.xml
gmake[1]: *** [ocf_heartbeat_Dummy.7] Error 6
rm metadata-CTDB.xml metadata-Delay.xml metadata-Dummy.xml 
metadata-ClusterMon.xml metadata-AudibleAlarm.xml
gmake[1]: Leaving directory 
`/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
make: *** [all-recursive] Error 1
gmake[1]: Entering directory 
`/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/AoEtarget meta-data  
metadata-AoEtarget.xml
/usr/bin/xsltproc --novalid \
--stringparam package resource-agents \
--stringparam version 1.0.4 \
--output ocf_heartbeat_AoEtarget.xml \
ra2refentry.xsl metadata-AoEtarget.xml
/usr/bin/xsltproc \
--xinclude \
http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
ocf_heartbeat_AoEtarget.xml
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^
http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
EntityValue: '%' forbidden except for entities references

^

[Linux-HA] Need help on the femce_vmware configuration ... facing errors ... !!

2011-03-31 Thread Amit Jathar
Hi,

I am facing the following issues:-

I have installed VMware-VIPerl-1.6.0-104313.x86_64.tar.gz and 
VMware-vSphere-Perl-SDK-4.1.0-254719.x86_64.tar.gz on the RHEL6 image.

My ESXi server hostname is esx5

1) If I use a wrong password, I get an error message as expected :-
[root@OEL6_VIP_1 ~]# /usr/sbin/fence_vmware  -o reboot -a x.x.x.x -l 'root' -p 'wrong_pass' -n esx5
fence_vmware_helper returned Cannot connect to server!
VMware error:Cannot complete login due to an incorrect user name or password.

Please use '-h' for usage

2) If I use the right password, then I get an error message like:-
[root@OEL6_VIP_1 ~]# /usr/sbin/fence_vmware  -o reboot -a 172.16.150.5 -l 'root' -p 'right_pass' -n esx5
fence_vmware_helper returned Cannot find vm esx5!

I can ping that node esx5 :-
PING esx5.localdomain (x.x.x.x) 56(84) bytes of data.
64 bytes from esx5.localdomain (x.x.x.x): icmp_seq=1 ttl=64 time=0.064 ms

What might be going wrong, so that my fence_vmware script is not able to find 
esx5?
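
A possible reading of the second error, offered as an assumption rather than a confirmed diagnosis: the message says "Cannot find vm", which suggests that -n is looked up as the name of a virtual machine on the server given with -a, while esx5 is the ESXi host itself. A minimal sketch of a non-destructive check under that assumption; the function name vm_status and whatever VM name is passed in are placeholders:

```shell
# Query the power status of one VM through the ESXi host, instead of rebooting it.
# $1: ESXi/vCenter address, $2: login, $3: password,
# $4: VM name as the hypervisor knows it (not the ESXi host name)
vm_status() {
    fence_vmware -o status -a "$1" -l "$2" -p "$3" -n "$4"
}

# Usage (the VM name is hypothetical; use the name shown in the vSphere client):
#   vm_status 172.16.150.5 root 'right_pass' 'my_rhel6_vm'
```

Using the status action first avoids rebooting a guest while the name lookup is still being sorted out.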

Thanks,
Amit

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Amit Jathar
Sent: Wednesday, March 30, 2011 10:08 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] Need help on the femce_vmware configuration

Hi,

I have two RHEL6 VMware images running on Windows machines.
I have configured STONITH on those RHEL6 images.

I have installed the VI API on those machines.
I get the error "could not connect http://x.x.x.x/SDK/webService" when I 
manually try /usr/sbin/fence_vmware -o reboot -a x.x.x.x -l xxx -p xxx -n xxx

My question is :- can I use this fence_vmware script on RHEL6 VMware images 
running on Windows machines,
or must I use RHEL6 VMware images running on an ESXi 
server?

Thanks,
Amit





[Linux-HA] Need help on the femce_vmware configuration

2011-03-30 Thread Amit Jathar
Hi,

I have two RHEL6 VMware images running on Windows machines.
I have configured STONITH on those RHEL6 images.

I have installed the VI API on those machines.
I get the error "could not connect http://x.x.x.x/SDK/webService" when I 
manually try /usr/sbin/fence_vmware -o reboot -a x.x.x.x -l xxx -p xxx -n xxx

My question is :- can I use this fence_vmware script on RHEL6 VMware images 
running on Windows machines,
or must I use RHEL6 VMware images running on an ESXi 
server?

Thanks,
Amit





[Linux-HA] need help to configure the fence_ifmib for stonith

2011-03-17 Thread Amit Jathar
Hi,

I would like to try the fence_ifmib as the fencing agent.

I can see it is present in my machine.
[root@OEL6_VIP_1 fence]# ls /usr/sbin/fence_ifmib
/usr/sbin/fence_ifmib

Also, I can see some python scripts present on my machine :-
[root@OEL6_VIP_1 fence]# pwd
/usr/share/fence
[root@OEL6_VIP_1 fence]# ls
fencing.py  fencing.pyc  fencing.pyo  fencing_snmp.py  fencing_snmp.pyc  
fencing_snmp.pyo
[root@OEL6_VIP_1 fence]#

Is there any chance I can configure if_mib as the stonith agent?
If yes, which MIB files will I need?

Thanks,
Amit






[Linux-HA] failover issues faced with pacemaker/corosync

2011-02-24 Thread Amit Jathar
Hi,

I am trying to use Pacemaker with corosync and am facing the following issues.
I want to know whether these are due to misconfiguration or are known 
issues.

I have two nodes in the cluster :- VIP-1 and VIP-2
The corosync version is :-
Corosync Cluster Engine, version '1.2.7' SVN revision '3008'

==
The crm_mon output is :-


Last updated: Thu Feb 24 17:44:33 2011
Stack: openais
Current DC: VIP-1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
3 Resources configured.


Online: [ VIP-1 VIP-2 ]

ClusterIP   (ocf::heartbeat:IPaddr2):   Started VIP-1
WebSite (ocf::heartbeat:apache):Started VIP-1
My_Tomcat   (ocf::heartbeat:tomcat):Started VIP-

==
My configuration is :-

[root@VIP-1 local]# crm configure show
node VIP-1
node VIP-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip=172.16.201.23 cidr_netmask=32 \
        op monitor interval=5s
primitive My_Tomcat ocf:heartbeat:tomcat \
        params catalina_home=/root/Softwares/apache-tomcat-6.0.26 java_home=/root/Softwares/Java/linux/jdk1.6.0_21 \
        op monitor interval=5s
primitive WebSite ocf:heartbeat:apache \
        params configfile=/etc/httpd/conf/httpd.conf \
        op monitor interval=5s
property $id=cib-bootstrap-options \
        dc-version=1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3 \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1298547656
rsc_defaults $id=rsc-options \
        resource-stickiness=2
===


Issue -1)

I observed that if any service is manually shut down on VIP-1, then 
corosync restarts it on the same node.
In the logs, I can see this :-
=
Feb 24 18:14:32 VIP-1 pengine: [28098]: info: get_failcount: My_Tomcat has failed 35 times on VIP-1
Feb 24 18:14:32 VIP-1 pengine: [28098]: notice: common_apply_stickiness: My_Tomcat can fail 65 more times on VIP-1 before being forced off
==

I have not configured the service to be restarted INFINITY times on VIP-1, so is 
this the default behavior?
Is there any configuration to tell corosync to restart the service only 
twice on VIP-1 and, if it still does not start, to start it on VIP-2?
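
Pacemaker's migration-threshold resource meta-attribute is the usual knob for this: once a resource has failed that many times on a node, it is moved away from that node. A minimal sketch that sets it through the crm shell's resource meta subcommand; the function name limit_restarts is made up, and the value 2 mirrors the question rather than a tested recommendation:

```shell
# Allow a resource to fail only N times on a node before it is moved elsewhere.
# $1: resource name, $2: failure limit (migration-threshold)
limit_restarts() {
    crm resource meta "$1" set migration-threshold "$2"
}

# Usage on either node:
#   limit_restarts My_Tomcat 2
#   crm configure show My_Tomcat     # the meta attribute should now be listed
```

Setting it per resource leaves the existing rsc_defaults (resource-stickiness) untouched.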

Issue -2)

I have changed the error codes in the Apache and Tomcat RA scripts, and they now 
return error code 2 if the monitor fails.
Now, if I manually stop the service, it is not restarted on VIP-1 but 
is started on VIP-2.
The fail count of that service on VIP-1 shows as 1.

Now, if I manually bring the service down on VIP-2, it does not get 
started on VIP-1 until I clean up the resource.

So, is this known behavior or have I missed any configuration?
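
As far as the Pacemaker documentation describes it, return code 2 (OCF_ERR_ARGS) is classed as a hard error, so the resource is kept off that node until the failure record is cleared, which would match what is described here. A minimal sketch of inspecting and clearing that record with the crm shell; the function name reset_failures is made up, and whether clearing it by hand is acceptable depends on the setup. The failure-timeout meta-attribute can expire failures automatically; that is not shown here.

```shell
# Show the fail count of a resource on a node, then clear its failure history there.
# $1: resource name, $2: node name
reset_failures() {
    crm resource failcount "$1" show "$2"
    crm resource cleanup "$1" "$2"
}

# Usage:
#   reset_failures My_Tomcat VIP-1
```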

Let me know if you need more information.

Thanks,
Amit










