Re: [Linux-ha-dev] [Openais] An OCF agent for LXC (Linux Containers)

2011-04-26 Thread Dejan Muhamedagic
On Tue, Apr 26, 2011 at 03:36:35PM +0200, Florian Haas wrote:
 Thanks Darren!
 
 Thanks for the contribution! Can I suggest
 
 - we move this discussion to the linux-ha-dev list (where most OCF RA
 related discussions and reviews take place);
 
 - you give the RA a makeover following the OCF RA developer's guide
 (http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html);
 
 - you set up your own github fork off of
 https://github.com/ClusterLabs/resource-agents, and push your RA to that
 so we can eventually pull it into the mainline repo?
 
 Also, can you explain what the advantages of your approach are, versus
 using libvirt-managed lxc containers which Pacemaker can tie into via
 the existing VirtualDomain agent?

Yes, this is the first thing I thought about too.
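For comparison, a libvirt-managed container would be driven through the
existing agent with something like the following (only a sketch; the resource
name and domain XML path are made up, and the hypervisor URI assumes the
libvirt lxc driver):

primitive vm-web ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/lxc/web.xml" hypervisor="lxc:///" \
        op monitor interval="30s" timeout="30s"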

A few remarks:

- the required attributes in the meta-data need to be reviewed;
  a parameter is either required or has a default, it cannot be
  both (see the sketch below the remarks)

- why use screen(1) in start?
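For illustration, the distinction looks like this in the meta-data (an
abbreviated sketch; the parameter names and the default value are made up):

<parameter name="container" unique="1" required="1">
<shortdesc lang="en">Container name (required, therefore no default)</shortdesc>
<content type="string"/>
</parameter>

<parameter name="log" unique="0" required="0">
<shortdesc lang="en">Log file (optional, therefore a default)</shortdesc>
<content type="string" default="/var/log/lxc.log"/>
</parameter>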

BTW, since lxc seems to be easy to set up, it would be great to
supply an ocft test file along with the RA. It's quite
straightforward: just make a copy of one of the existing test
files from tools/ocft.
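For example, a minimal test file could look roughly like this (directive names
are recalled from the existing files under tools/ocft and may need adjusting;
the OCF_RESKEY_* names assume the RA takes a container name and a config path):

CONFIG
    Agent lxc
    AgentRoot /usr/lib/ocf/resource.d/heartbeat
    HangTimeout 20

CASE-BLOCK prepare
    Env OCF_RESKEY_container=ocft-lxc
    Env OCF_RESKEY_config=/var/lib/lxc/ocft-lxc/config
    AgentRun stop

CASE "monitor when stopped"
    Include prepare
    AgentRun monitor OCF_NOT_RUNNING

CASE "normal start"
    Include prepare
    AgentRun start OCF_SUCCESS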

Cheers,

Dejan

 Thanks!
 Cheers,
 Florian
 



 ___
 Openais mailing list
 open...@lists.linux-foundation.org
 https://lists.linux-foundation.org/mailman/listinfo/openais
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] STONITH plugin for VMware vCenter

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 21, 2011 at 11:08:08AM +0200, Nhan Ngo Dinh wrote:
 Hi,
 
 On Tue, 2011-04-19 at 14:21 +0200, Dejan Muhamedagic wrote:
   longdesc lang=en
   The VMware vCenter address (default: localhost)
  
  The defaults should go into the content element (see other
  stonith plugins, e.g. external/ipmi).
 
 These defaults come from the vSphere Perl SDK; they are not handled
 inside this code. Does it make any difference? Anyway, I've changed it as
 you said.
 
   Enable/disable a PowerOnVM on reset when the target virtual machine is off
   Allowed values: 0, 1
  
  This should default to 1. For better or worse, that's what
  stonith prescribes and other plugins adhere to.
 
 Ok. I've also added an error if RESETPOWERON is set and the machine is
 powered off.

OK.

  Is this the only error which can happen? If not, then no error
  will be logged in that case. Ditto for another occurrence below.
 
 This is what happens according to the SDK; however, I've also added a generic
 error-handling procedure to die() if anything else fails.

Good. One (probably) never knows what the future holds.

I'll push the plugin now to the public repository.

We just need to fix one more thing. The info commands such as
getinfo-xml have to work without the software which is otherwise
required for the plugin's operation; in this case that is the
VMware::VIRuntime module. I guess that you need to use eval for
that.
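A quick way to verify that is to run the info commands on a machine which does
not have the vSphere Perl SDK installed, e.g. (assuming the plugin ends up
installed as external/vcenter; see stonith(8) for the exact options):

stonith -t external/vcenter -n     # must list the parameter names
stonith -t external/vcenter -h     # must print the description/help text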

Many thanks for the contribution. Not least for the
documentation!

Cheers,

Dejan

 Best regards,
 Nhan
 
 


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Cluster Stack - Ubuntu Developer Summit

2011-04-26 Thread Andres Rodriguez
Greetings Everyone!

In a couple of weeks, the Ubuntu Developer Summit (UDS) will be kicking
off. UDS is the event where discussion happens about the next Ubuntu release
in its various aspects, such as Server, Desktop, Foundations, etc. This time,
UDS will be held in Budapest, Hungary, on 9-13 May, and it targets Ubuntu
11.10 Oneiric Ocelot, due in October 2011.

As has become customary at past UDSes, this time we will also have a
session for the Cluster Stack. The primary objective of this session is to
discuss the adoption of Pacemaker 1.1.X and related software as a technology
preview of what is yet to come, in preparation for the next Ubuntu LTS
release, 12.04, and to discuss the upgrade-path options for partially
upgrading some components while leaving others (i.e. fence-agents,
resource-agents) as they are. Additionally, we will also discuss some other
features we would like to see for the Cluster Stack in Ubuntu, as well as how
to spread its adoption, improve documentation, etc.

UDSes are events open to the public, and I believe it would be great if
upstream could participate and maybe even further the discussion about the
Cluster Stack. For more information about UDS, please visit [1]. The specific
date/time for the Cluster Stack session is not yet available.

If you require any further information please don't hesitate to contact me.

[1]: http://uds.ubuntu.com/


-- 
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
Systems Engineer
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Bug in crm shell or pengine

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Tue, Apr 19, 2011 at 09:22:41AM -0600, Serge Dubrouski wrote:
 On Tue, Apr 19, 2011 at 1:12 AM, Andrew Beekhof and...@beekhof.net wrote:
  On Mon, Apr 18, 2011 at 11:38 PM, Serge Dubrouski serge...@gmail.com 
  wrote:
  Ok, I've read the documentation. It's not a bug, it's a feature :-)
 
  Might be nice if the shell could somehow prevent such configs, but it
  would be non-trivial to implement.
 
 Or maybe as trivial as checking for such duplicates and, in case of
 different roles, adjusting the interval by plus or minus 1.

A good idea. Could you please file a bugzilla lest we forget
about it.
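For the archives, the manual workaround amounts to giving the two monitor
operations slightly different intervals, e.g. (based on the primitive from the
original report):

primitive pg_drbd ocf:linbit:drbd \
        params drbd_resource="drbd0" \
        op monitor interval="60s" role="Master" timeout="10s" \
        op monitor interval="61s" role="Slave" timeout="10s"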

Thanks,

Dejan

 
 
  On Mon, Apr 18, 2011 at 3:01 PM, Serge Dubrouski serge...@gmail.com 
  wrote:
  Hello -
 
   Looks like there is a bug in the crm shell (Pacemaker version 1.1.5) or in
   pengine.
 
 
  primitive pg_drbd ocf:linbit:drbd \
         params drbd_resource=drbd0 \
         op monitor interval=60s role=Master timeout=10s \
         op monitor interval=60s role=Slave timeout=10s
 
  Log file:
 
  Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
  pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s
  Apr 17 04:05:29 cs51 crmd: [5535]: info: do_state_transition: Starting
  PEngine Recheck Timer
  Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Do not use the
  same (name, interval) combination more than once per resource
  Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
  pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s
  Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Do not use the
  same (name, interval) combination more than once per resource
  Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
  pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s
 
   Plus strange behavior of the cluster, like the inability to move resources
   from one node to another.
 
  --
  Serge Dubrouski.
 
 
 
 
  --
  Serge Dubrouski.
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 
 
 
 
 
 -- 
 Serge Dubrouski.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [Pacemaker] Resource Agents 1.0.4: HA LVM Patch

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Tue, Apr 19, 2011 at 03:56:16PM +0200, Ulf wrote:
 Hi,
 
 I attached a patch to enhance the LVM agent with the capability to set a tag
 on the VG (set_hosttag = true). In conjunction with a volume_list filter, this
 can prevent a VG from being activated on multiple hosts. Unfortunately, active
 VGs will stay active in case of an unclean operation.

Can you please elaborate on the benefits this patch would bring?
Is it supposed to prevent a VG from being activated on more than
one node?

Looking at the code, it seems that on the start operation the
existing tag would be overwritten regardless.
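For context, the tag-based activation which the patch automates looks roughly
like this when done by hand (a sketch; the VG name is made up, and the
volume_list syntax is as described on the page linked below):

# /etc/lvm/lvm.conf on each node: allow activation only of local VGs
# and VGs tagged with this node's name
#   volume_list = [ "rootvg", "@node1" ]

vgchange --addtag $(hostname) vg_ha    # claim the VG for this node
vgchange -ay vg_ha                     # activation is now allowed here
vgchange --deltag $(hostname) vg_ha    # release it again on stop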

Thanks,

Dejan

P.S. Moving the discussion to the proper mailing list.

 The tag is always the hostname.
 Some configuration hints can be found here: 
 http://sources.redhat.com/cluster/wiki/LVMFailover
 
 Cheers,
 Ulf


 ___
 Pacemaker mailing list: pacema...@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Bug in crm shell or pengine

2011-04-26 Thread Serge Dubrouski
On Tue, Apr 26, 2011 at 1:03 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Tue, Apr 19, 2011 at 09:22:41AM -0600, Serge Dubrouski wrote:
 On Tue, Apr 19, 2011 at 1:12 AM, Andrew Beekhof and...@beekhof.net wrote:
  On Mon, Apr 18, 2011 at 11:38 PM, Serge Dubrouski serge...@gmail.com 
  wrote:
  Ok, I've read the documentation. It's not a bug, it's a feature :-)
 
  Might be nice if the shell could somehow prevent such configs, but it
  would be non-trivial to implement.

 Or maybe as trivial as checking for such duplicates and, in case of
 different roles, adjusting the interval by plus or minus 1.

 A good idea. Could you please file a bugzilla lest we forget
 about it.

Bug 2586.


-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] Interested in Contributing

2011-04-26 Thread Tim Serong
On 4/23/2011 at 03:55 AM, Michael Thrift mike.thr...@schryvermedical.com
wrote: 
 All, 
  
 I've recently started diving into Linux-HA, and I must say I am very  
 impressed.

Welcome!

 I'm developing some in-house HA solutions, leveraging the Linux-HA
 project, and it's going very well.  One of the projects I've been working on
 is Squid HA, and I found that the OCF script was a little too limited for our
 deployment of multiple Squid instances on the same box.  I've modified the
 OCF to include a new OCF_RESKEY named squid_address.  This allowed the script
 to check the health of a specific squid instance, rather than just checking if
 Squid is running in general.  I'd like to contribute this to the project, but
 I'm not sure of the best place to do so...  Any thoughts on this?  I'm happy to
 share my mods to the OCF script for those who are interested.  Thanks!
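For what it's worth, reading such a parameter in the RA would look roughly like
this (a sketch; squid_address is the name from the mail above, everything else
is made up):

# fall back to checking any listener if no address was configured
: ${OCF_RESKEY_squid_address:="0.0.0.0"}

squid_monitor() {
    # only the instance bound to the configured address counts as healthy
    netstat -ntl | grep -q "${OCF_RESKEY_squid_address}:${OCF_RESKEY_squid_port} "
}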

Try the linux-ha-dev list for RA patches/tweaks/contributions/etc.

Regards,

Tim


-- 
Tim Serong tser...@novell.com
Senior Clustering Engineer, OPS Engineering, Novell Inc.



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread SEILLIER Mathieu
Hi all,
First, I'm French, so sorry in advance for my English...

I have to use Heartbeat for high availability between two Tomcat 5.5 servers
under Linux Red Hat 5.3. The first server is active, the other one is passive.
The master is called servappli01, with IP address 186.20.100.40; the slave is
called servappli02, with IP address 186.20.100.39.
I configured a virtual IP, 186.20.100.41. Tomcat is not launched when a server
boots; it is Heartbeat that starts Tomcat once it is running.
My problem is: when Heartbeat is started on the first server and then on the
second server, the VIP is assigned to both servers! Also, Tomcat is started on
each server, and each node sees the other node as dead!

Here is my configuration:

ha.cf file (the same on each server) :

logfile /var/log/ha-log

debugfile /var/log/ha-debug

logfacility none

keepalive 2

warntime 6

deadtime 10

initdead 90

bcast eth0

node servappli01 servappli02

auto_failback yes

respawn hacluster /usr/lib/heartbeat/ipfail

apiauth ipfail gid=haclient uid=hacluster


haresources file (the same on each server) :

servappli01 IPaddr::186.20.100.41/24/eth0 tomcat


Result of ifconfig command on the first server (servappli01) :

eth0  Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

  inet adr:186.20.100.40  Bcast:186.20.100.255  Masque:255.255.255.0

  adr inet6: fe80::21e:bff:febb:c238/64 Scope:Lien

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0

  TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 lg file transmission:1000

  RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5 GiB)

  Interruption:177 Memoire:fa00-fa012100



eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

  inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  Interruption:177 Memoire:fa00-fa012100

Result of ifconfig command on the second server (servappli02) at the same time :

eth0  Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

  inet adr:186.20.100.39  Bcast:186.20.100.255  Masque:255.255.255.0

  adr inet6: fe80::21e:bff:fe77:c90c/64 Scope:Lien

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

  RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0

  TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0

  collisions:0 lg file transmission:1000

  RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3 GiB)

  Interruption:177 Memoire:fa00-fa012100



eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

  inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0

  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  Interruption:177 Memoire:fa00-fa012100

Result of /usr/bin/cl_status listnodes command (on each server) :

servappli02

servappli01


Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :

active

Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :

dead

Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :

dead

Result of /usr/bin/cl_status nodestatus servappli02 command on servappli02 :

active

And of course, if I kill Tomcat on the master server, there is no switchover to
the second server (a call to a webapp using the VIP doesn't work).

Can somebody help me, please?
I guess there's something wrong but I don't know what!
Thanks

Mathieu
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] mgmtd: [xxx]: ERROR: on_listen attach server socket failed

2011-04-26 Thread exx libris
Hi,

We are getting mgmtd: [xxx]: ERROR: on_listen attach server socket failed
errors in the logs from time to time.

Any idea what it means and what the cause is? The cluster looks OK though.


Thanks,
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] XEN NPIV with Brocade bfa driver anyone?

2011-04-26 Thread Ulrich Windl
Hi!

I just found out that Xen 4's NPIV (Fibre Channel N_Port ID Virtualization)
does not work with Brocade's bfa driver in SLES 11 SP1. That is because
non-standard sysfs entries are used for virtual ports (similar to Emulex, but
still different). I wonder whether anybody has hacked block-npiv-common to
make that work.

Sorry if that's not very closely related to HA, but when you want to move VMs,
virtual ports are quite nice to have...

Regards,
Ulrich


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread mike
On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:
 Hi all,
 First, I'm French, so sorry in advance for my English...

 I have to use Heartbeat for high availability between two Tomcat 5.5 servers
 under Linux Red Hat 5.3. The first server is active, the other one is passive.
 The master is called servappli01, with IP address 186.20.100.40; the slave is
 called servappli02, with IP address 186.20.100.39.
 I configured a virtual IP, 186.20.100.41. Tomcat is not launched when a server
 boots; it is Heartbeat that starts Tomcat once it is running.
 My problem is: when Heartbeat is started on the first server and then on the
 second server, the VIP is assigned to both servers! Also, Tomcat is started on
 each server, and each node sees the other node as dead!



It almost sounds like the nodes are unaware of each other; it could be a
network thing. Here are some things to try:
Can you ssh or ping one node from the other?
Bring up one node with the VIP running and leave the other node up but with
heartbeat down. Can you ping the VIP from the node NOT running HA?
What happens when you look at the cluster when both nodes are running? Use
the crm_mon command and paste what you see here.

I'm thinking you have some sort of network issue.
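Concretely, something along these lines from each node (node names taken from
the original mail; heartbeat's bcast traffic uses UDP port 694 by default):

ping -c3 servappli02                          # basic reachability
/usr/bin/cl_status nodestatus servappli02     # what heartbeat itself thinks
tcpdump -ni eth0 udp port 694                 # is the bcast heartbeat arriving?
crm_mon -1                                    # one-shot status, if the CRM is enabled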
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Amit Jathar
Have you generated the authkey with the corosync-keygen command on one node and
then copied that file to the other node?

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike
Sent: Tuesday, April 26, 2011 5:41 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:
 Hi all,
 First, I'm French, so sorry in advance for my English...

 I have to use Heartbeat for high availability between two Tomcat 5.5 servers
 under Linux Red Hat 5.3. The first server is active, the other one is passive.
 The master is called servappli01, with IP address 186.20.100.40; the slave is
 called servappli02, with IP address 186.20.100.39.
 I configured a virtual IP, 186.20.100.41. Tomcat is not launched when a server
 boots; it is Heartbeat that starts Tomcat once it is running.
 My problem is: when Heartbeat is started on the first server and then on the
 second server, the VIP is assigned to both servers! Also, Tomcat is started on
 each server, and each node sees the other node as dead!

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: 

Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Dimitri Maziuk
On 4/22/2011 4:25 AM, SEILLIER Mathieu wrote:

 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :
 active

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :
 dead

 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :
 dead

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli02 :
 active

iptables?

 And of course, if I kill Tomcat on master server, there's no switch
 to  the second server (a call to a webapp using the VIP doesn't work).

You need mon for that.

Dima
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Dejan Muhamedagic
On Tue, Apr 26, 2011 at 12:29:28PM +, Amit Jathar wrote:
 Have you generated the authkey with the corosync-keygen command on one node
 and then copied that file to the other node?

Heartbeat != Corosync

Thanks,

Dejan


Re: [Linux-HA] JAVA sun.jnu.encoding ignored when process started from BP. Not when started manually

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 21, 2011 at 05:45:37PM -0500, Mike Toler wrote:
 I have a Java process that is started by Linux-HA.

 I have created an OCF script called BillingProcessor.

 That script calls an outside script (pm.pl) which starts the process.

 The Java command is shown here. Note that I am including the
 -Dsun.jnu.encoding=UTF-8 directive.
 
 java -Dsun.jnu.encoding=UTF-8 -cp
 ../lib/RSCBillingProcessor.jar:../lib/RSCBillingCollector.jar:../lib/fw_
 alarms.jar:../lib/fw_app.jar:../lib/fw_base.jar:../lib/fw_comm.jar:../li
 b/fw_config.jar:../lib/fw_dom4j.jar:../lib/fw_file.jar:../lib/fw_jdom.ja
 r:../lib/fw_metaobject.jar:../lib/fw_staged.jar:../lib/fw_stats.jar:../l
 ib/fw_util.jar:../lib/fw_xmpp.jar:../lib/3rdPartyLib/log4j.jar:../lib/3r
 dPartyLib/jdom.jar:../lib/3rdPartyLib/jcore.jar:../lib/3rdPartyLib/commo
 ns-cli-1.0.jar:../lib/3rdPartyLib/snmp.jar:../config
 com.prodeasystems.rsc.bc.processor.BillingProcApp BP ../config/BP.xml
 
 When I start the process using my script alone, I see:
 sun.jnu.encoding = UTF-8
 
 When it is started from Heartbeat, I see:
 sun.jnu.encoding = ANSI_X3.4-1968

See how? In the java process? Or with ps(1)? The latter would be
really strange.

Otherwise, I cannot say what's going on. You can try to debug
the script: just add 'set -x' somewhere and the trace will be
logged. Perhaps also dump the environment just before invoking
java. Redirect it to stderr (file descriptor 2); stdout is logged
only with debug on. Or do an exec redirection of stderr at the
top of the script.
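A sketch of what that could look like in the wrapper script (the trace file
path is made up):

#!/bin/sh
# capture a shell trace plus anything written to stderr in one file
exec 2>>/tmp/BillingProcessor.trace
set -x

# ...and just before invoking java, dump the environment it will inherit
env | sort >&2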

Thanks,

Dejan

 I can't for the life of me figure out HOW heartbeat can be causing this,
 but it is 100% consistent over 4 installations on 3 OSes (Red Hat 5.4,
 CentOS 5.4 and CentOS 6.0).  The process started from the command line
 has an encoding of UTF-8.  The process started from heartbeat has
 ANSI_X3.4-1968.
 
 Has anyone ever seen anything like this??
 
 Michael Toler
 (214) 278-1834 (Office)
 (972) 816-7790 (mobile)
 Senior Systems Integration Engineer
 Prodea Systems.
 
 
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] pacemaker not reconnecting

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 21, 2011 at 04:52:23PM +0200, Jean-Baptiste GIRARD wrote:
 Hi,
 
 
 
 I have a two-node cluster with Pacemaker (on Heartbeat).

 Regularly, after a cluster partition there is a problem with the membership.
 Both nodes see the other one as offline, and you can see the following log:
 
 
 
 
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: CRIT: Cluster node acsdupli-s
 returning after partition.
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: info: For information on
 cluster partitions, See URL: http://linux-ha.org/wiki/Split_Brain
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: WARN: Deadtime value may be
 too small.
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: info: See FAQ for information
 on tuning deadtime.
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: info: URL:
 http://linux-ha.org/wiki/FAQ#Heavy_Load
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: info: Link acsdupli-s:eth0
 up.
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: WARN: Late heartbeat: Node
 acsdupli-s: interval 75040 ms
 
 Apr 20 14:13:18 ACSDUPLI-M heartbeat: [13151]: info: Status update for node
 acsdupli-s: status active
 
 Apr 20 14:13:18 ACSDUPLI-M cib: [13168]: WARN: cib_peer_callback: Discarding
 cib_apply_diff message (2ce2b) from acsdupli-s: not in our membership
 
 Apr 20 14:13:18 ACSDUPLI-M pingd: [13173]: notice: pingd_lstatus_callback:
 Status update: Ping node acsdupli-s now has status [up]
 
 Apr 20 14:13:18 ACSDUPLI-M pingd: [13173]: notice: pingd_nstatus_callback:
 Status update: Ping node acsdupli-s now has status [up]
 
 Apr 20 14:13:18 ACSDUPLI-M pingd: [13173]: notice: pingd_nstatus_callback:
 Status update: Ping node acsdupli-s now has status [active]
 
 Apr 20 14:13:18 ACSDUPLI-M crmd: [13172]: notice: crmd_ha_status_callback:
 Status update: Node acsdupli-s now has status [active] (DC=true)
 
 Apr 20 14:13:18 ACSDUPLI-M crmd: [13172]: info: crm_update_peer_proc:
 acsdupli-s.ais is now online
 
 Apr 20 14:13:19 ACSDUPLI-M ccm: [13167]: info: Break tie for 2 nodes cluster
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_INVALID from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: no
 mbr_track info
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: mem_handle_event: Got an
 event OC_EV_MS_INVALID from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_NEW_MEMBERSHIP from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event:
 instance=91, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: mem_handle_event: no
 mbr_track info
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: cib_ccm_msg_callback:
 Processing CCM event=NEW MEMBERSHIP (id=91)
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: mem_handle_event: Got an
 event OC_EV_MS_NEW_MEMBERSHIP from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: mem_handle_event:
 instance=91, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: crmd_ccm_msg_callback:
 Quorum (re)attained after event=NEW MEMBERSHIP (id=91)
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: ccm_event_detail: NEW
 MEMBERSHIP: trans=91, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: ccm_event_detail:
 CURRENT: acsdupli-m [nodeid=0, born=91]
 
 Apr 20 14:13:19 ACSDUPLI-M crmd: [13172]: info: populate_cib_nodes_ha:
 Requesting the list of configured nodes
 
 Apr 20 14:13:19 ACSDUPLI-M ccm: [13167]: info: Break tie for 2 nodes cluster
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_INVALID from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: no
 mbr_track info
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_NEW_MEMBERSHIP from ccm
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: mem_handle_event:
 instance=92, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 
 Apr 20 14:13:19 ACSDUPLI-M cib: [13168]: info: cib_ccm_msg_callback:
 Processing CCM event=NEW MEMBERSHIP (id=92)
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: cib_process_request:
 Operation complete: op cib_modify for section nodes (origin=local/crmd/1001,
 version=0.185.2): ok (rc=0)
 
 Apr 20 14:13:20 ACSDUPLI-M ccm: [13167]: info: Break tie for 2 nodes cluster
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_INVALID from ccm
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: mem_handle_event: no
 mbr_track info
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: mem_handle_event: Got an
 event OC_EV_MS_NEW_MEMBERSHIP from ccm
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: mem_handle_event:
 instance=93, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
 
 Apr 20 14:13:20 ACSDUPLI-M cib: [13168]: info: cib_ccm_msg_callback:
 Processing CCM event=NEW 

Re: [Linux-HA] Pacemaker installation errors

2011-04-26 Thread Dejan Muhamedagic
Hi,

On Thu, Apr 21, 2011 at 11:28:26AM +, Amit Jathar wrote:
 Hi ,
 
 I tried to install pacemaker. While installing  'Resource Agents', I run make 
 command and got attached errors. I tried twice (did make clean also) and on 
 both occasions, error was bit different (as attached).
 
 The steps I was performing was :-
 wget -O resource-agents.tar.bz2 
 http://hg.linux-ha.org/agents/archive/tip.tar.bz2
 
 tar jxvf resource-agents.tar.bz2
  cd Cluster-Resource-Agents-*
 
 ./autogen.sh  ./configure --prefix=$PREFIX
 
 make
  sudo make install
 
 I am using CentOS 5.6 64-bit.

Try rpmbuild? Or just 'make rpm'?
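That is, something along these lines (assuming the spec file shipped in the
tree still builds on CentOS 5):

cd Cluster-Resource-Agents-*
./autogen.sh && ./configure
make rpm        # or: rpmbuild -ta ../resource-agents.tar.bz2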

Thanks,

Dejan

 Or can I use Pacemaker with this source tree even though the build failed?
 
 Thanks,
 Amit
 
 
 
 
 
 

Content-Description: ResAgent_make_error1.txt
 
 
 Note: Writing ocf_heartbeat_ClusterMon.7
 OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/CTDB meta-data  
 metadata-CTDB.xml
 /usr/bin/xsltproc --novalid \
 --stringparam package resource-agents \
 --stringparam version 1.0.4 \
 --output ocf_heartbeat_CTDB.xml \
 ra2refentry.xsl metadata-CTDB.xml
 /usr/bin/xsltproc \
 --xinclude \
 
 http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
 ocf_heartbeat_CTDB.xml
 Note: Writing ocf_heartbeat_CTDB.7
 OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Delay meta-data  
 metadata-Delay.xml
 /usr/bin/xsltproc --novalid \
 --stringparam package resource-agents \
 --stringparam version 1.0.4 \
 --output ocf_heartbeat_Delay.xml \
 ra2refentry.xsl metadata-Delay.xml
 /usr/bin/xsltproc \
 --xinclude \
 
 http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
 ocf_heartbeat_Delay.xml
 Note: Writing ocf_heartbeat_Delay.7
 OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/Dummy meta-data  
 metadata-Dummy.xml
 /usr/bin/xsltproc --novalid \
 --stringparam package resource-agents \
 --stringparam version 1.0.4 \
 --output ocf_heartbeat_Dummy.xml \
 ra2refentry.xsl metadata-Dummy.xml
 /usr/bin/xsltproc \
 --xinclude \
 
 http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
 ocf_heartbeat_Dummy.xml
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue:  or ' expected
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 xmlParseEntityDecl: entity list.class not terminated
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 XML conditional section not closed
 
 ^
 unable to parse ocf_heartbeat_Dummy.xml
 gmake[1]: *** [ocf_heartbeat_Dummy.7] Error 6
 rm metadata-CTDB.xml metadata-Delay.xml metadata-Dummy.xml 
 metadata-ClusterMon.xml metadata-AudibleAlarm.xml
 gmake[1]: Leaving directory 
 `/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
 make: *** [all-recursive] Error 1

Content-Description: ResAgent_make_error2.txt
 gmake[1]: Entering directory 
 `/usr/local/src/Cluster-Resource-Agents-7a11934b142d/doc'
 OCF_ROOT=. OCF_FUNCTIONS_DIR=../heartbeat ../heartbeat/AoEtarget meta-data  
 metadata-AoEtarget.xml
 /usr/bin/xsltproc --novalid \
 --stringparam package resource-agents \
 --stringparam version 1.0.4 \
 --output ocf_heartbeat_AoEtarget.xml \
 ra2refentry.xsl metadata-AoEtarget.xml
 /usr/bin/xsltproc \
 --xinclude \
 
 http://docbook.sourceforge.net/release/xsl/current/manpages/docbook.xsl 
 ocf_heartbeat_AoEtarget.xml
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: '%' forbidden except for entities references
 
 ^
 http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd:171: parser error : 
 EntityValue: 

Re: [Linux-HA] UDP / DHCP / LDIRECTORD

2011-04-26 Thread Brian Carpio
Hi Simon,

Is there any way we can coerce you into offering us some assistance with this
again? I'm sure you are a very busy person, but any help you can offer would be
appreciated; if you need any further information from me, just let me know.

Brian Carpio 
Senior Systems Engineer

Office: +1.303.962.7242
Mobile: +1.720.319.8617
Email: bcar...@broadhop.com


-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian Carpio
Sent: Monday, April 25, 2011 4:30 AM
To: General Linux-HA mailing list; 'Simon Horman'
Cc: 'lvs-devel'; 'Julian Anastasov'
Subject: Re: [Linux-HA] UDP / DHCP / LDIRECTORD

Hi,

It looks like there might also be a memory leak in this patch. Over the last
few months we have seen memory grow slowly, but lately the traffic has increased
and the memory utilization of the Linux box is now growing faster. I put in a
few scripts to try to detect where this memory leak was coming from, and when
watching /proc/meminfo over the last few days I saw that slab usage was growing.

So I put in a new script to watch slabtop, and I can see that ip_vs_conn is
growing. The number of SLABS just grows and grows, and so does the CACHE_SIZE.
Is there any chance you could look into this for us? Is there any additional
information I can give you about this problem?
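For the record, the growth can also be watched without slabtop, e.g.:

grep ip_vs_conn /proc/slabinfo    # object counts for the IPVS connection cache
wc -l /proc/net/ip_vs_conn        # entries in the connection table itself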

Thanks a lot,
Brian Carpio

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian Carpio
Sent: Friday, February 25, 2011 12:14 PM
To: General Linux-HA mailing list; 'Simon Horman'
Cc: 'lvs-devel'; 'Julian Anastasov'
Subject: Re: [Linux-HA] UDP / DHCP / LDIRECTORD

Apparently this is related to some sort of race condition (possibly a problem
with my ldirectord start script, which edits the ipvsadm config after
ldirectord has started): if ldirectord starts to receive traffic on port 67/68
before the following commands are run:

ipvsadm -E -u 10.10.10.10:67 -o -s rr
ipvsadm -E -u 10.10.10.10:68 -o -s rr

Then it will be stuck sending traffic to the first server in the list.



Brian Carpio 
Senior Systems Engineer

Office: +1.303.962.7242
Mobile: +1.720.319.8617
Email: bcar...@broadhop.com


-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian Carpio
Sent: Thursday, February 24, 2011 3:47 PM
To: 'Simon Horman'
Cc: 'lvs-devel'; 'Julian Anastasov'; 'linux-ha@lists.linux-ha.org'
Subject: Re: [Linux-HA] UDP / DHCP / LDIRECTORD

All,

So this patch has been working flawlessly for us for the last 5 months or so.

Our infrastructure is 100% virtualized. The other day our loadbalancer01 had a
memory leak and crashed; since we use ldirectord with heartbeat, loadbalancer02
took over. However, ever since then it seems like the single-packet UDP
scheduling has stopped working. Even if I fail back over to the loadbalancer01
VM, I still see all the DHCP traffic going to only one backend server.

If I run ipvsadm -L -n I can see that ipvsadm thinks both of the backend
servers are up, since the weight is set to 1 for each server. If I reboot the
second backend server (the one which is not receiving any traffic) and then run
ipvsadm -L -n, I can see its weight go to 0, and in the ldirectord log I can
see that it is marked dead.

I have exported one of the load balancers and one of the backend servers (using
VMware) and imported them into another ESXi server, and once I boot up the
load balancer it works perfectly... I'm very stumped as to why this would
happen. Is there any additional logging you can think of that I might want to
enable to see where the exact problem is?

Here are my configs:

 
/etc/ha.d/ldirectord.conf

checktimeout=10
checkinterval=2
autoreload=yes
logfile=/var/log/ldirectord.log
quiescent=yes
virtual=10.10.10.10:67
        real=backend_server01:67 masq
        real=backend_server02:67 masq
        protocol=udp
        checktype=ping
        scheduler=rr
virtual=10.10.10.10:68
        real=back_endserver01:68 masq
        real=backend_server02:68 masq
        protocol=udp
        checktype=ping
        scheduler=rr


I had to rewrite the ldirectord start script and added the following lines in 
the start and restart sections:

ipvsadm -E -u 10.10.10.10:67 -o -s rr
ipvsadm -E -u 10.10.10.10:68 -o -s rr


Here is the output of ipvsadm -L -n when both backend servers are up (working 
environment):


IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  10.10.10.10:67 rr ops
  -> backend_server01:67          Masq    1      0          16731
  -> backend_server02:67          Masq    1      0          17447
UDP  192.168.181.67:68 rr ops
  -> backend_server01:68          Masq    1      0          0
  -> backend_server02:68          Masq    1      0

Re: [Linux-HA] Problem using Stonith external/ipmi device

2011-04-26 Thread Dejan Muhamedagic
On Tue, Apr 19, 2011 at 02:46:06PM +0200, Andrew Beekhof wrote:
 On Tue, Apr 19, 2011 at 12:43 PM, Dejan Muhamedagic deja...@fastmail.fm 
 wrote:
  Hi,
 
  On Mon, Apr 11, 2011 at 09:41:12AM +0200, Andrew Beekhof wrote:
  On Fri, Apr 8, 2011 at 11:07 AM, Matthew Richardson
  m.richard...@ed.ac.uk wrote:
   On 07/04/11 16:36, Dejan Muhamedagic wrote:
   For whatever reason stonith-ng doesn't think that
   stonithipmidisk1 can manage this node. Which version of
   Pacemaker do you run? Perhaps this has been fixed in the
   meantime. I cannot recall right now if there has been such a
   problem, but it's possible. You can also try to turn debug on
   and see if there are more clues.
  
   I'm using Pacemaker 1.1.5 from the clusterlabs rpm-next repositories on 
   el5.
  
   I've tried turning on debug, but there's no more information coming out
   in the logs.
 
  man stonithd has the bits you need.
  start with pcmk_host_check
 
  That defaults to dynamic-list which should query the resource.
  Right?
 
 Right.
 
  Apparently, something's not quite ok there.
 
 the list command doesn't work perhaps?

Yes, it does work. And it's been working since forever, as you
know. Unless there's something wrong with the installation.

Whatever happened here? Matthew?
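If it helps the debugging, the two layers can be checked separately (a sketch;
the parameter values are made up and the plugin path may differ, e.g. lib64):

# what stonith-ng's dynamic-list query should return for this node
stonith_admin --list node1

# what the plugin itself reports; external plugins take their parameters
# from the environment and the action as the first argument
hostname=node1 ipaddr=10.0.0.5 userid=admin passwd=secret \
    /usr/lib/stonith/plugins/external/ipmi gethosts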

Thanks,

Dejan

  BTW, I've
  been doing tests with external/ssh and it did work fine.
 
 also fine with fence_xvm
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems