Re: [Linux-HA] [Pacemaker] [Cluster-devel] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread David Vossel


- Original Message -
 
  On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree l...@suse.com wrote:
  
  On 2014-11-24T16:16:05, Fabio M. Di Nitto fdini...@redhat.com wrote:
  
  Yeah, well, devconf.cz is not such an interesting event for those who do
  not wear the fedora ;-)
  That would be the perfect opportunity for you to convert users to Suse ;)
  
  I'd prefer, at least for this round, to keep dates/location and explore
  the option to allow people to join remotely. After all, there are tons of
  tools, between Google Hangouts and others, that would allow that.
  That is, in my experience, the absolute worst. It creates second-class
  participants and is a PITA for everyone.
  I agree, it is still a way for people to join in though.
  
  I personally disagree. In my experience, one either does a face-to-face
  meeting, or a virtual one that puts everyone on the same footing.
  Mixing both works really badly unless the team already knows each
  other.
  
  I know that an in-person meeting is useful, but we have a large team in
  Beijing, the US, Tasmania (OK, one crazy guy), various countries in
  Europe etc.
  Yes same here. No difference.. we have one crazy guy in Australia..
  
  Yeah, but you're already bringing him for your personal conference.
  That's a bit different. ;-)
  
  OK, let's switch tracks a bit. What *topics* do we actually have? Can we
  fill two days? Where would we want to collect them?
 
 Personally I'm interested in talking about scaling - with pacemaker-remoted
 and/or a new messaging/membership layer.

If we're going to talk about scaling, we should throw in our new docker support
in the same discussion. Docker lends itself well to the pet vs cattle analogy.
I see management of docker with pacemaker making quite a bit of sense now that
we have the ability to scale into the cattle territory.

 Other design-y topics:
 - SBD
 - degraded mode
 - improved notifications
 - containerisation of services (cgroups, docker, virt)
 - resource-agents (upstream releases, handling of pull requests, testing)

Yep, we definitely need to talk about the resource-agents.

 
 User-facing topics could include recent features (e.g. pacemaker-remoted,
 crm_resource --restart) and common deployment scenarios (e.g. NFS) that
 people get wrong.

Adding to the list, it would be a good idea to talk about deployment
integration testing: what's going on with the phd project and why it's
important regardless of whether you're interested in what the project
functionally does.

-- Vossel

 ___
 Pacemaker mailing list: pacema...@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: http://bugs.clusterlabs.org
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] RHEL Server 6.6 HA Configuration

2014-11-21 Thread David Vossel


- Original Message -
 I was trying to install Corosync and Cman using
 yum install -y pacemaker cman pcs ccs resource-agents
 
 This works fine on CentOS 6.3. Tried the same on Red Hat Enterprise
 Linux Server 6.6 and ran into issues. It gives an error like
 
 Loaded plugins: product-id, refresh-packagekit, rhnplugin, security,
 subscription-manager
 There was an error communicating with RHN.
 RHN Satellite or RHN Classic support will be disabled.
 
 Error Message:
 Please run rhn_register as root on this client
 Error Class Code: 9
 Error Class Info: Invalid System Credentials.
 Explanation:
  An error has occurred while processing your request. If this problem
  persists please enter a bug report at bugzilla.redhat.com.
  If you choose to submit the bug report, please be sure to include
  details of what you were trying to do when this error occurred and
  details on how to reproduce this problem.
 
 Setting up Install Process
 No package pacemaker available.
 No package cman available.
 No package pcs available.
 No package ccs available.
 Nothing to do
 
 The centos.repo file (/etc/yum.repos.d/centos.repo) is as below:
 [centos-6-base]
 name=CentOS-$releasever - Base
 mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
 enabled=0
 #baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
 
 Realized that this Red Hat version does not have the High Availability add-on
 packages. The add-on needs to be bought, OR the version needs to be upgraded
 to 7. I found information saying Pacemaker has been available as part of RHEL
 since 6.0, as part of the High Availability (HA) add-on.
 
 Question:
 
 1. Is the above understanding correct?

yes

 
 2. Are there significant differences in how Corosync and CMAN are
 configured on the Enterprise server vs. CentOS?

Pacemaker in Centos 6.6 and RHEL 6.6 should be configured the same way.
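
For reference, on a subscribed RHEL 6 system the High Availability add-on
channel normally has to be enabled before those packages become visible to
yum. A minimal sketch (the repo id shown is an assumption and differs for
RHN Classic / Satellite setups like the one in the error above):

  # enable the HA add-on repo, then install the stack
  subscription-manager repos --enable=rhel-ha-for-rhel-6-server-rpms
  yum install -y pacemaker cman pcs ccs resource-agents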

-- Vossel

 
 Thank You,
 Ranjan
 
 
 
 --
 View this message in context:
 http://linux-ha.996297.n3.nabble.com/RHEL-Server-6-6-HA-Configuration-tp15945.html
 Sent from the Linux-HA mailing list archive at Nabble.com.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat in Amazon VMs does not create virtual ip address

2014-11-14 Thread David Vossel


- Original Message -
 Hi, I installed Heartbeat on CentOS 6.5 on 2 Amazon EC2 machines. This is the

If you have an option, I'd strongly recommend using the Pacemaker+CMAN stack
on RHEL 6.5. Red Hat began supporting Pacemaker in 6.5, so it should be
available to you.
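
A minimal sketch of bootstrapping that stack (package list from the other
thread; the cluster name is a placeholder and command syntax may vary slightly
between pcs versions). Note that on EC2/VPC a floating address generally also
has to be assigned to the instance through the EC2 API (for example as a
secondary private IP); an ARP-based takeover alone is not enough for traffic
to reach it:

  yum install -y pacemaker cman pcs ccs resource-agents
  pcs cluster setup --name ec2cluster ip-10-0-2-68 ip-10-0-2-69
  pcs cluster start --all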

-- Vossel

 version:
 [root@ip-10-0-2-68 ha.d]# rpm -qa | grep heartbeat
 heartbeat-libs-3.0.4-2.el6.x86_64
 heartbeat-3.0.4-2.el6.x86_64
 heartbeat-devel-3.0.4-2.el6.x86_64
 
 the floating IP is [root@ip-10-0-2-68 ha.d]# cat haresources
 ip-10-0-2-68 10.0.2.70
 but it is not created on any machine; it does not matter where I run the
 takeover or standby commands.
 What am I missing? Is this even possible? These are my settings in ha.cf:
 logfacility local0
 ucast eth0 10.0.2.69
 auto_failback on
 node ip-10-0-2-68 ip-10-0-2-69
 ping 10.0.2.1
 use_logd yes
 logfacility local0
 ucast eth0 10.0.2.68
 auto_failback on
 node ip-10-0-2-68 ip-10-0-2-69
 ping 10.0.2.1
 use_logd yes
 
 these is the output of the route command
 [root@ip-10-0-2-68 ha.d]# route -n
 Kernel IP routing table
 Destination Gateway Genmask Flags Metric Ref    Use Iface
 10.0.2.0    0.0.0.0 255.255.255.0   U 0  0    0 eth0
 0.0.0.0 10.0.2.1    0.0.0.0 UG    0  0    0 eth0
 [root@ip-10-0-2-68 ha.d]#
 
 this is how the interfaces eth0 are set up on machine 1:[root@ip-10-0-2-68
 ha.d]# ifconfig
 eth0  Link encap:Ethernet  HWaddr 12:23:49:EF:3A:53
   inet addr:10.0.2.68  Bcast:10.0.2.255  Mask:255.255.255.0
   inet6 addr: fe80::1023:49ff:feef:3a53/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
   RX packets:269823 errors:0 dropped:0 overruns:0 frame:0
   TX packets:192305 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:167802149 (160.0 MiB)  TX bytes:48341828 (46.1 MiB)
   Interrupt:247
 
 
 these are the logs showing everything going fine, but when doing ifconfig
 the interface is not there:
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: node
 ip-10-0-2-69: is dead
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Comm_now_up():
 updating status to active
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local status
 now set to: 'active'
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: No STONITH
 device configured.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: Shared disks
 are not protected.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Resources being
 acquired from ip-10-0-2-69.
 Nov 11 21:37:39 ip-10-0-2-68 mach_down(default)[14769]: info:
 /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: mach_down
 takeover complete.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Initial
 resource acquisition complete (mach_down)
 Nov 11 21:37:39 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14845]: INFO:
   Resource is stopped
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14701]: [14701]: info: Local Resource
 acquisition completed.
 Nov 11 21:37:40 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14958]: INFO:
   Resource is stopped
 Nov 11 21:37:40 ip-10-0-2-68 IPaddr(IPaddr_10.0.2.70)[15057]: INFO:
 /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
 /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used
 not_used
 Nov 11 21:37:40 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[15064]: INFO:
   Success
 Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local Resource
 acquisition completed. (none)
 Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: local resource
 transition completed.
 
 
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: node ip-10-0-2-68: is
 dead
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Comm_now_up():
 updating status to active
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Local status now set
 to: 'active'
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: No STONITH device
 configured.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: Shared disks are not
 protected.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Resources being
 acquired from ip-10-0-2-68.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18360]: info: No local resources
 [/usr/share/heartbeat/ResourceManager listkeys ip-10-0-2-69] to acquire.
 Nov 11 21:38:17 ip-10-0-2-69
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[18441]: INFO:
   Resource is stopped
 Nov 11 21:38:17 ip-10-0-2-69 IPaddr(IPaddr_10.0.2.70)[18537]: INFO:
 /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
 /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used
 not_used
 Nov 11 21:38:17 ip-10-0-2-69
 

Re: [Linux-HA] MySQL resource agent and MySQL Master/Slave replication

2014-11-14 Thread David Vossel


- Original Message -
 Dear guys,
 
 I have set up MySQL Master/Slave replication and tried to make it HA using
 corosync/pacemaker. I used the MySQL resource agent currently provided by
 Debian Wheezy and set up the corosync primitives as mentioned in the
 examples[1]. When migrating the mysql resource from one node to the other,
 the mysql resource agent didn't handle the slaves at all. As I inspected the
 code of the mysql resource agent, I figured out that all the slave handling
 is done in the mysql_notify function of the resource agent. To enable the
 multi-state resource ms_mysql to send notifications to the mysql resource
 agent, I added:
 
 ms ms_mysql p_mysql \
   meta clone-max=3 \
   *meta notify=true*
 
 Now the MySQL replication slaves get started/stopped as expected.
 
 You may add this parameter to your documentation[2], so other users don't
 have to figure it out themselves :-)

good observation, thanks for the feedback!

-- Vossel

 
 cheers
 
 Tom
 
 --
 unixum
 Tom Hutter
 Hufnertwiete 3
 D-22305 Hamburg
 Telefon: +49 40 67 39 21 52
 mobil  : +49 174 400 24 16
 E-Mail: tom.hut...@unixum.de
 
 www.unixum.de
 
 Steuer-Nr.: 43/102/01125
 
 Geschäftsführung: Tom Hutter
 
 
 
 [1] http://www.linux-ha.org/wiki/Mysql_%28resource_agent%29
 [2]
 http://www.linux-ha.org/wiki/Mysql_%28resource_agent%29#MySQL_master.2Fslave_replication
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-28 Thread David Vossel


- Original Message -
 22.10.2014 12:02, Dejan Muhamedagic wrote:
  On Mon, Oct 20, 2014 at 07:12:23PM +0300, Vladislav Bogdanov wrote:
  20.10.2014 18:23, Dejan Muhamedagic wrote:
  Hi Vladislav,
 
  Hi Dejan!
 
 
  On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote:
  Hi Kristoffer,
 
  do you plan to add support for recently added remote node attributes
  feature to chmsh?
 
  Currently (at least as of 2.1, and I do not see anything relevant in the
  git log) crmsh fails to update CIB if it contains node attributes for
  remote (bare-metal) node, complaining that duplicate element is found.
 
  No wonder :) The uname effectively doubles as an element id.
 
  But for bare-metal nodes it is natural to have ocf:pacemaker:remote
  resource with name equal to remote node uname (I doubt it can be
  configured differently).
 
  Is that required?
 
  Didn't look in code, but seems like yes, :remote resource name is the
  only place where pacemaker can obtain that node name.
  
  I find it surprising that the id is used to carry information.
  I'm not sure if we had a similar case (apart from attributes).
  
  If I comment out the check for 'obj_id in id_set', then it fails to update
  the CIB because it inserts the above primitive definition into the node
  section.
 
  Could you please show what would the CIB look like with such a
  remote resource (in crmsh notation).
 
 
 
  node 1: node01
  node rnode001:remote \
 attributes attr=value
  primitive rnode001 ocf:pacemaker:remote \
  params server=192.168.168.20 \
  op monitor interval=10 \
  meta target-role=Started
  
  What do you expect to happen when you reference rnode001, in say:
 
 That is not me ;) I just want to be able to use crmsh to assign remote
 node operational and utilization (?) attributes and to work with it
 after that.
 
 Probably that is not yet set in stone, and David may change that, e.g. by
 allowing a new 'node_name' parameter to ocf:pacemaker:remote to
 override the remote node name guessed from the primitive name.
 
 David, could you comment please?

Why would we want to separate the remote-node name from the resource's
primitive instance name?

-- David

 
 Best,
 Vladislav
 
  
  crm configure show rnode001
  
  I'm still trying to digest having hostname used to name some
  other element. Wonder what/where else will we have issues for
  this reason.
  
  Cheers,
  
  Dejan
  
  Best,
  Vladislav
 
   Given that nodes are for the most part referenced by uname
   (instead of by id), do you think that the user can handle a
   configuration where a primitive element is named the same as a
   node in an efficient manner? (NB: No experience here with
   ocf:pacemaker:remote :)
 
 
 
 
  Cheers,
 
  Dejan
 
 
 
  Best,
  Vladislav
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  
 
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Virtual address for slave

2014-08-01 Thread David Vossel


- Original Message -
 Hello!
 
    I'd like to have two virtual addresses: vip-master and vip-slave.
  vip-master should be bound to the master node, vip-slave should be bound to
  the slave node.
    How can I do it?

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_multi_state_constraints
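
A minimal sketch in crm shell syntax of what that looks like, assuming an
existing master/slave resource named ms_db and two IPaddr2 primitives named
vip-master and vip-slave (all names here are placeholders):

  colocation vip-master-with-master inf: vip-master ms_db:Master
  colocation vip-slave-with-slave inf: vip-slave ms_db:Slave
  order promote-db-then-vip-master inf: ms_db:promote vip-master:start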

 Best regards
 Jarek
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Managed Failovers w/ NFS HA Cluster

2014-07-21 Thread David Vossel


- Original Message -
 I feel like this is something that must have been covered extensively already
 but I've done a lot of googling, looked at a lot of cluster configs, but
 have not found the solution.
 
 I have an HA NFS cluster (corosync+pacemaker).  The relevant rpms are listed
 below but I'm not sure they are that important to the question which is
 this...
 
 When performing managed failovers of the NFS-exported file system resource
 from one node to the other (crm resource move), any active NFS clients
 experience an I/O error when the file system is unexported.  In other words,
 you must unexport it to unmount it.  As soon as it is unexported, clients
 are no longer able to write to it and experience an I/O error (rather than
 just blocking).
 
 In a failure scenario this is not a problem because the file system is never
 unexported on the primary server.  Rather the server just goes down, the
 secondary takes over the resources and client I/O blocks until the process
 is complete and then goes about its business.   We would like this same
 behavior for a *managed* failover but have not found a mount or export
 option/scenario that works.   Is it possible?  What am I missing?
 
 I realize this is more of an nfs/exportfs question but I would think that
 those implementing NFS HA clusters would be familiar with the scenario I'm
 describing.

Read these:

NFS Active/Passive
https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-ap-overview.pdf?raw=true

NFS Active/Active
https://github.com/davidvossel/phd/blob/master/doc/presentations/nfs-aa-overview.pdf?raw=true

Note that the nfsnotify and nfsserver agents have had a lot of work done to
them upstream in the last month or two. Depending on what distro you are
using, you may benefit from using the latest upstream agents (if RHEL based,
definitely use the upstream agents).
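
As a rough sketch of the active/passive layout (crm shell syntax; device
paths, addresses and parameters below are illustrative placeholders, not the
exact configuration from those PDFs):

  primitive nfs_vg ocf:heartbeat:LVM \
      params volgrpname=nfsvg exclusive=true
  primitive nfs_fs ocf:heartbeat:Filesystem \
      params device=/dev/nfsvg/exports directory=/exports fstype=ext4
  primitive nfs_daemon ocf:heartbeat:nfsserver \
      params nfs_shared_infodir=/exports/nfsinfo
  primitive nfs_export ocf:heartbeat:exportfs \
      params directory=/exports/data clientspec=10.0.0.0/24 options=rw fsid=1
  primitive nfs_ip ocf:heartbeat:IPaddr2 \
      params ip=10.0.0.100 cidr_netmask=24
  primitive nfs_notify ocf:heartbeat:nfsnotify \
      params source_host=10.0.0.100
  group grp_nfs nfs_vg nfs_fs nfs_daemon nfs_export nfs_ip nfs_notify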


-- Vossel


 Regards,
 
 Charlie Taylor
 
 pacemaker-cluster-libs-1.1.7-6.el6.x86_64
 pacemaker-cli-1.1.7-6.el6.x86_64
 pacemaker-1.1.7-6.el6.x86_64
 pacemaker-libs-1.1.7-6.el6.x86_64
 resource-agents-3.9.2-40.el6.x86_64
 fence-agents-3.1.5-35.el6.x86_64
 
 Red Hat Enterprise Linux Server release 6.3 (Santiago)
 
 Linux biostor3.ufhpc 2.6.32-279.19.1.el6.x86_64 #1 SMP Sat Nov 24 14:35:28
 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
 
 [root@biostor4 bs34]# crm status
 
 Last updated: Thu Jul 17 10:55:04 2014
 Last change: Thu Jul 17 07:59:47 2014 via crmd on biostor3.ufhpc
 Stack: openais
 Current DC: biostor3.ufhpc - partition with quorum
 Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
 2 Nodes configured, 2 expected votes
 20 Resources configured.
 
 
 Online: [ biostor3.ufhpc biostor4.ufhpc ]
 
  Resource Group: grp_b3v0
  vg_b3v0  (ocf::heartbeat:LVM):   Started biostor3.ufhpc
  fs_b3v0  (ocf::heartbeat:Filesystem):Started biostor3.ufhpc
  ip_vbio3 (ocf::heartbeat:IPaddr2):   Started biostor3.ufhpc
  ex_b3v0_1(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_2(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_3(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_4(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  ex_b3v0_5(ocf::heartbeat:exportfs):  Started biostor3.ufhpc
  Resource Group: grp_b4v0
  vg_b4v0  (ocf::heartbeat:LVM):   Started biostor4.ufhpc
  fs_b4v0  (ocf::heartbeat:Filesystem):Started biostor4.ufhpc
  ip_vbio4 (ocf::heartbeat:IPaddr2):   Started biostor4.ufhpc
  ex_b4v0_1(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_2(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_3(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_4(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  ex_b4v0_5(ocf::heartbeat:exportfs):  Started biostor4.ufhpc
  st_bio3  (stonith:fence_ipmilan):Started biostor4.ufhpc
  st_bio4  (stonith:fence_ipmilan):Started biostor3.ufhpc
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Patch/recommendation for ocf:heartbeat:Filesystem cifs

2014-06-19 Thread David Vossel




- Original Message -
 From: Stefan Bauer (IZLBW Extern) stefan.ba...@iz.bwl.de
 To: linux-ha@lists.linux-ha.org
 Sent: Wednesday, June 18, 2014 1:42:14 AM
 Subject: [Linux-HA] Patch/recommendation for ocf:heartbeat:Filesystem cifs
 
 Dear Users/Developers,
 
 we're using ocf:heartbeat:Filesystem but fail to unmount cifs mounts if the
 cifs server has gone down.
 Please consider adding -l (lazy umount) to the umount_force variable in the
 RA.

Does the umount -f option even make sense for cifs, or should we completely
replace -f with -l when cifs is in use?  The '-f' option only references NFS
in the man page.
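
For reference, the two flags under discussion (behaviour as described in
umount(8)):

  umount -f /mnt/share   # force unmount, mainly intended for unreachable NFS mounts
  umount -l /mnt/share   # lazy unmount: detach now, clean up references when no longer busy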

You could propose this patch as a git pull request if you want. The
resource-agents source code is located here:
https://github.com/ClusterLabs/resource-agents


-- Vossel

 With the above option in use, we could unmount the cifs share cleanly
 without running into any timeouts.
 
 Cheers
 
 Stefan
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Packemaker resources for Galera cluster

2014-06-09 Thread David Vossel




- Original Message -
 From: Razvan Oncioiu ronci...@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Wednesday, June 4, 2014 11:48:01 PM
 Subject: [Linux-HA] Packemaker resources for Galera cluster
 
 Hello,
 
 I can't seem to find a proper way of setting up resources in pacemaker to
 manage my Galera cluster. I want a VIP that will fail over between 5 boxes
 (this works), but I would also like to tie this to a resource that monitors
 mysql as well. If a mysql instance goes down, the VIP should move to another
 box that has mysql actually running. But I do not want pacemaker to start or
 stop the mysql service. Here is my current configuration:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-crmsh/html-single/Pacemaker_Explained/index.html#s-resource-options

Make a cloned mysql resource and set the 'is-managed=false' meta attribute on
the resource. Pacemaker will monitor whether mysql is up, but not attempt to
start/stop it.

Like Emmanuel said, you'll need the order constraint: start mysql, then start
the VIP.

You'll also need a colocation constraint that forces the VIP to locate to a
node with an active mysql service: colocate VIP with mysql.

so...
- make VIP resource
- make cloned mysql resource with is-managed=false
- order start start mysql-clone then VIP
- colocate VIP with mysql-clone
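
In crm shell syntax that might look roughly like this (a sketch based on the
configuration below; resource and constraint names are placeholders):

  primitive ClusterIP IPaddr2 \
      params ip=10.10.10.178 cidr_netmask=24 \
      op monitor interval=5s
  primitive p_mysql mysql \
      params pid=/var/lib/mysql/mysqld.pid \
      meta is-managed=false \
      op monitor interval=5s
  clone cl_mysql p_mysql
  order order_mysql_before_ip Mandatory: cl_mysql ClusterIP
  colocation col_ip_with_mysql inf: ClusterIP cl_mysql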

Good luck!

-- Vossel



 node galera01
 node galera02
 node galera03
 node galera04
 node galera05
 primitive ClusterIP IPaddr2 \
 params ip=10.10.10.178 cidr_netmask=24 \
 meta is-managed=true \
 op monitor interval=5s
 primitive p_mysql mysql \
 params pid=/var/lib/mysql/mysqld.pid test_user=root
 test_passwd=goingforbroke \
 meta is-managed=false \
 op monitor interval=5s OCF_CHECK_LEVEL=10 \
 op start interval=0 timeout=60s \
 op stop interval=0 timeout=60s on-fail=standby
 group g_mysql p_mysql ClusterIP
 order order_mysql_before_ip Mandatory: p_mysql ClusterIP
 property cib-bootstrap-options: \
 dc-version=1.1.10-14.el6_5.3-368c726 \
 cluster-infrastructure=classic openais (with plugin) \
 stonith-enabled=false \
 no-quorum-policy=ignore \
 expected-quorum-votes=5 \
 last-lrm-refresh=1401942846
 rsc_defaults rsc-options: \
 resource-stickiness=100
 
 
 
 
 --
 View this message in context:
 http://linux-ha.996297.n3.nabble.com/Packemaker-resources-for-Galera-cluster-tp15668.html
 Sent from the Linux-HA mailing list archive at Nabble.com.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problem with migration, priority, stickiness

2014-05-22 Thread David Vossel




- Original Message -
 From: Tony Stocker tony.stoc...@nasa.gov
 To: Linux HA Cluster Development List linux-ha@lists.linux-ha.org
 Sent: Tuesday, May 20, 2014 8:18:52 AM
 Subject: [Linux-HA] Problem with migration, priority, stickiness
 
 Cluster s/w specs:
 Kernel: 2.6.32-431.17.1.el6.x86_64
 OS: CentOS 6.5
 corosync-1.4.1-17.el6_5.1.x86_64
 pacemaker-1.1.10-14.el6_5.3.x86_64
 crmsh-2.0+git46-1.1.x86_64
 
 
 Attached to this email are two text files, one contains the output of 'crm
 configure show' (addresses sanitized) and the other contains the output of
 'crm_simulate -sL'
 
 Here is the situation, and we've encountered this multiple times now and
 I've been unable to solve it:
 
  * A machine in the cluster fails
  * There is a spare node, unused, in the cluster available for
  assignment
  * The resource group that was on the failed machine, instead of
  being put onto the spare, unused node is placed on a node where
  another resource group is already running
  * The displaced resource group then is launched on the spare,
  unused node
 
 As an example, this morning the following occurred:
 
  Resource Group NRTMASTER is running on system gpmhac01
  Resource Group NRTPNODE1 is running on system gpmhac02
  Resource Group NRTPNODE2 is running on system gpmhac05
  Resource Group NRTPNODE3 is running on system gpmhac04
  Resource Group NRTPNODE4 is running on system gpmhac03
 
  system gpmhac06 is up, available, and unused
 
  system gpmhac04 fails and powers off
 
  Resource Group NRTPNODE3 is moved to system gpmhac05
  Resource Group NRTNPODE2 is moved to system gpmhac06
 
 
 One of the big things that seems to occur here is that while the group
 NRTPNODE3 is being launched on gpmhac05, the group NRTPNODE2 is being shut
 down simultaneously which is causing race conditions where one start
 script is putting a state file in place, while the stop script is erasing
 it.  This leaves the system in an unusable state because required files,
 parameters, and settings are missing/corrupted.
 
 Secondly, there is simply no reason to kill a perfectly healthy resource
 group, that is operating just fine in order to launch a resource group
 whose machine has failed when:
  1. There's a spare node available
  2. The resource groups have equal priority with each other, i.e.
  all of the NRTPNODE# resource groups have priority 60
 
 
 So I really need some help here in getting this setup so that it behaves
 the way we *think* it should be doing based on what we understand of the
 Pacemaker architecture.  Obviously we're missing something since this
 resource group shuffling occurs when there's a failed system, despite
 having an unused, spare node available for immediate use, and has bitten
 us several times.  The fact that the race condition between startup and
 shutdown is also causing the system that is brought up to be useless is
 exacerbating the situation immensely.
 
 Ideally, this is what we want:
 
  1. If a system fails, the resources/resource group running on it
  are moved to an unused, available system.  No other resource
   shuffling occurs amongst the systems.
 
  2. If a system fails and there is not an unused, available
  system to fail over to, then IF the resource group has a higher
  priority than another resource group, the group with the lower
  priority is shutdown.  Only when that shutdown is complete will
  the resource group with the higher priority start its startup of
  resources.
 
  3. If a system fails and there is not an unused, available
  system to fail over to, then IF the resource group has the same
  or lower priority to all other resource groups, then it will not
  attempt to launch itself on any other node, nor cause any other
  resource group to stop or migrate.
 
  4. Unless specifically, and manually, ordered to move OR if the
  hardware system fails, a resource group should remain on its
  current hardware system.  It should never be forced to migrate to
  a new system because something of equal or lower priority failed
  and migrated to a new system.
 
  5. We do not need resource groups to fail back to original nodes,
  when running we want them to stay running on their current system
  until/unless a hardware failure occurs and forces them off the
  system, or we manually tell them to move.
 
 
 Can someone please look over our configuration, and the bizarre scores
 that I see from the crm_simulate output, and help get me to the point
 where I can achieve an HA cluster that doesn't kill healthy resources in
 some kind of game of musical chairs when there's an empty chair available.
 Can you also tell me why or help 

[Linux-HA] Active/Active nfs server lock recovery?

2014-04-21 Thread David Vossel
Hey,

Has anyone had any success with deploying an Active/Active NFS server?  I'm 
curious how lock recovery is performed.

In a typical Active/Passive scenario we have an nfs-server instance coupled
with the exportfs. The nfs lock info is stored on some shared storage that
follows that nfs server and the exportfs instances around the cluster.  This
allows us to alert the nfs clients after the failover that the server rebooted
and that they need to re-establish their locks.

With an Active/Active setup, we'd have multiple nfs servers and exportfs
instances, none of which are tied to one another, meaning that the exportfs
resources could run on any of the nfs server instances within the cluster.  On
failover, we'd want the exportfs resources on a failed node to be taken over
by an already existing nfs server on another node. In this instance, does
anyone know of a good way to alert the nfs clients previously connected to the
old (failed) node that they need to re-establish their locks with the new
node?  It seems like the statd info from both the failed node's nfs server and
the new node's nfs server would have to be merged or something.

any thoughts?

-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] How to tell pacemaker to process a new event during a long-running resource operation

2014-03-14 Thread David Vossel
- Original Message -
 From: Maloja01 maloj...@arcor.de
 To: Linux-HA linux-ha@lists.linux-ha.org
 Sent: Friday, March 14, 2014 5:32:34 AM
 Subject: [Linux-HA] How to tell pacemaker to process a new event during a 
 long-running resource operation
 
 Hi all,
 
 I have a resource which could in special cases have a very long-running
 start operation.

in-flight operations always have to complete before we can process a new 
transition.  The only way we can transition earlier is by killing the in-flight 
process, which results in failure recovery and possibly fencing depending on 
what operation it is.

There's really nothing that can be done to speed this up except work on 
lowering the startup time of that resource.

-- Vossel

 If I have a new event (like switching a standby node back to online)
 during the already running transition (the cluster is still in
 S_TRANSITION_ENGINE), I would like the cluster to process it as soon
 as possible and not only after the other resource has come up.
 
 Is that possible? I already tried batch-limit, but I guess that is only
 for making actions parallel within a combined transition, right?
 
 Thanks in advance
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Announcing docker resource-agent

2014-01-27 Thread David Vossel
Hey,

I've created a docker resource agent that allows docker containers to be
managed with pacemaker. The agent is up for review here:
https://github.com/ClusterLabs/resource-agents/pull/370

Docker is a relatively new and fast-moving project.  I'd be surprised if anyone
here is using it in production yet, but I'm sure some of you have investigated
how it could be used.  For review feedback, I'm not so much interested in a
code review as in a use-case analysis.  How do you use, or foresee yourself
using, docker containers in an HA environment, and does this agent work for
your use case?
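
As a rough illustration of the intended usage (pcs syntax; the parameter names
here are assumptions based on the agent under review, so check the pull
request for the actual metadata):

  pcs resource create web-container ocf:heartbeat:docker \
      image=nginx name=web-1 \
      op monitor interval=30s timeout=30s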

Thanks,
-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Announcing a new HA KVM tutorial!

2014-01-07 Thread David Vossel
- Original Message -
 From: Digimer li...@alteeve.ca
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Monday, January 6, 2014 10:19:05 AM
 Subject: [Linux-HA] Announcing a new HA KVM tutorial!
 
 Almost exactly two years ago, I released the first tutorial for building
 an HA platform for KVM VMs. In that time, I have learned a lot, created
 some tools to simplify management and refined the design to handle
 corner-cases seen in the field.
 
 Today, the culmination of that learning is summed up in the 2nd
 Edition of that tutorial, now called AN!Cluster Tutorial 2.
 
 https://alteeve.ca/w/AN!Cluster_Tutorial_2
 
 These HA KVM platforms have been in production for over two years now in
 facilities all over the world; Universities, municipal governments,
 corporate DCs, manufacturing facilities, etc. I've gotten wonderful
 feedback from users and all that real-world experience has been
 integrated into this new tutorial.
 
 As always, everything is 100% open source and free-as-in-beer!
 
 The major changes are:
 
 * SELinux and iptables are enabled and used.
 * Numerous slight changes made to the OS and cluster stack configuration
 to provide better corner-case fault handling.
 
 * Architecture refinements;
 ** Redundant PSUs, UPSes and fence methods emphasized.
 ** Monitoring multiple UPSes added via modified apcupsd
 ** Detailed monitoring of LSI-based RAID controllers and drives
 ** Discussion on hardware considerations for VM performance based on
 anticipated work loads
 
 * Naming convention changes to support the new AN!CDB dashboard[1]
 ** New alert system covered with fault and notable event alerting
 
 * Wider array of guest OSes are covered;
 ** Windows 7
 ** Windows 8
 ** Windows 2008 R2
 ** Windows 2012
 ** Solaris 11
 ** FreeBSD 9
 ** RHEL 6
 ** SLES 11
 
 Beyond that, the formatting of the tutorial itself has been slightly
 modified. I do think it is the easiest to follow tutorial I have yet
 been able to produce. I am very proud of this one! :D
 
 As always, feedback is always very much appreciated. Everything from
 typos/grammar mistakes, functional problems or anything else is very
 valuable. I take all the feedback I get and use it to helping make the
 tutorials better.
 
 Enjoy!

wow, that's a seriously awesome tutorial. Excellent work :)

-- Vossel

 
 Digimer, who can now start the next tutorial in earnest!
 
 1. https://alteeve.ca/w/AN!CDB
 
 --
 Digimer
 Papers and Projects: https://alteeve.ca/w/
 What if the cure for cancer is trapped in the mind of a person without
 access to education?
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] kamailio OCF resource agent for pacemaker

2014-01-07 Thread David Vossel
- Original Message -
 From: WENK Stefan stefan.w...@frequentis.com
 To: sr-...@lists.sip-router.org, linux-ha@lists.linux-ha.org
 Sent: Tuesday, January 7, 2014 3:12:58 AM
 Subject: [Linux-HA] kamailio OCF resource agent for pacemaker
 
 Hello,
 
 Attached you find an initial version of a kamailio OCF-compliant resource
 agent for pacemaker, which is currently running within a prototype
 laboratory on Red Hat Enterprise Linux 6.x. Please keep in mind that it has
 only survived testing in a very controlled environment; it is young and some
 issues/bugs are likely to be found.
 
 I was allowed by FREQUENTIS to provide this script to the community under the
 GPL v2 license, with the goal that putting this resource agent under the
 maintenance of the community will make it safer and better long term.

I'd hate to see this work disappear. If there is a community member who is
interested in testing this agent and working to get it pushed upstream, I'd
happily help in the review process.

To start the review process, we need someone to create a pull request against
the upstream resource-agents git repo:
https://github.com/ClusterLabs/resource-agents
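
The usual workflow is roughly as follows (a sketch, assuming a GitHub fork;
the commit message prefix follows the resource-agents convention as I
understand it):

  git clone https://github.com/<your-fork>/resource-agents.git
  cd resource-agents
  git checkout -b kamailio-agent
  cp /path/to/kamailio heartbeat/kamailio   # your agent script
  git add heartbeat/kamailio
  git commit -m "Medium: kamailio: add new resource agent"
  git push origin kamailio-agent
  # then open a pull request against ClusterLabs/resource-agents on GitHub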

Thanks,
-- Vossel

 Please note that I won't be able to respond to questions, because my mail
 account is going to be closed in a few days.
 
 Regards,
   
 Stefan Wenk
 
 Internal note: The released version corresponds with rel_0_19_0.
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] FAQ(?): What's the process list?

2014-01-02 Thread David Vossel




- Original Message -
 From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 To: linux-ha@lists.linux-ha.org
 Sent: Friday, December 27, 2013 1:29:36 AM
 Subject: [Linux-HA] FAQ(?): What's the process list?
 
 Hi!
 
 I wonder what information the following log message contains:
  corosync[20745]:  [pcmk  ] info: update_member: Node (...) process list:
  00151212 (1380882)
 
 Are there individual bits that describe some features, or how is this list
 built?

This list consists of the pacemaker components (lrmd, attrd, cib, crmd, 
stonithd, pengine).  Different bits represent different pacemaker components.

 
 Regards,
 Ulrich
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] pacemaker restarts services (on the same node) when failed node returns

2013-12-13 Thread David Vossel
- Original Message -
 From: Peto Michalak peto.micha...@gmail.com
 To: dvos...@redhat.com, linux-ha linux-ha@lists.linux-ha.org
 Sent: Wednesday, December 11, 2013 2:26:02 AM
 Subject: Re: [Linux-HA] pacemaker restarts services (on the same node) when 
 failed node returns
 
 Hi David,
 
 I've attached a crm_report which should show the restart of services, when
 failed node returns.
 
 I will go through the report as well to see if I find something there.


The constraint in the XML below is what causes the restart. You are telling
pacemaker to place PGServer on node drpg-02... When drpg-02 joins the cluster,
PGServer restarts because it is being relocated to drpg-02. This should be
expected.

  <rsc_location id="cli-prefer-PGServer" rsc="PGServer">
    <rule id="cli-prefer-rule-PGServer" score="INFINITY" boolean-op="and">
      <expression id="cli-prefer-expr-PGServer" attribute="#uname"
                  operation="eq" value="drpg-02" type="string"/>
    </rule>
  </rsc_location>
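
Constraints with a cli-prefer-* id are the ones left behind by crm resource
move/migrate. Clearing that preference once drpg-02 is where you want it
should avoid the relocation; a sketch (the exact subcommand name varies a bit
between crmsh versions):

  crm resource unmigrate PGServer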


 
 Thank you for your help.
 
 Best Regards,
 -Peter
 
  Hello,
 
  I really searched for the answer before posting : ).
 
  I have a pacemaker setup + corosync + drbd in Active/Passive mode running
  in 2 node cluster on Ubuntu 12.04.3.
 
   Everything works fine and on node failure the services are taken care of
   by the other node (THANKS guys!). The problem is that I've noticed that
   once the failed node comes back alive, pacemaker restarts the
   postgresql and virtual IP resources, and that takes around 4-7 seconds
   (but keeps them on the same node as I wanted, so what's the point? :) ).
   Is this really necessary, or have I messed up something in the
   configuration?
 
 any chance you could provide us with a crm_report during the time frame of
 this unwanted restart?
 
 -- Vossel
 
  My pacemaker config:
 
  node drpg-01 attributes standby=off
  node drpg-02 attributes standby=off
  primitive drbd_pg ocf:linbit:drbd \
  params drbd_resource=drpg \
  op monitor interval=15 \
  op start interval=0 timeout=240 \
  op stop interval=0 timeout=120
  primitive pg_fs ocf:heartbeat:Filesystem \
  params device=/dev/drbd/by-res/drpg directory=/db/pgdata
  options=noatime,nodiratime fstype=xfs \
  op start interval=0 timeout=60 \
  op stop interval=0 timeout=120
  primitive pg_lsb lsb:postgresql \
  op monitor interval=30 timeout=60 \
  op start interval=0 timeout=60 \
  op stop interval=0 timeout=60
  primitive pg_vip ocf:heartbeat:IPaddr2 \
  params ip=10.34.2.60 iflabel=pgvip \
  op monitor interval=5
  group PGServer pg_fs pg_lsb pg_vip
  ms ms_drbd_pg drbd_pg \
  meta master-max=1 master-node-max=1 clone-max=2
  clone-node-max=1 notify=true
  colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
  order ord_pg inf: ms_drbd_pg:promote PGServer:start
  property $id=cib-bootstrap-options \
  dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
  cluster-infrastructure=openais \
  expected-quorum-votes=2 \
  no-quorum-policy=ignore \
  pe-warn-series-max=1000 \
  pe-input-series-max=1000 \
  pe-error-series-max=1000 \
  default-resource-stickiness=1000 \
  cluster-recheck-interval=5min \
  stonith-enabled=false \
  last-lrm-refresh=1385646505
 
  Thank you.
 
  Best Regards,
  -Peter
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problem not in our membership

2013-12-10 Thread David Vossel
- Original Message -
 From: Moullé Alain alain.mou...@bull.net
 To: linux-ha@lists.linux-ha.org
 Sent: Tuesday, December 10, 2013 1:50:34 AM
 Subject: Re: [Linux-HA] Problem not in our membership
 
 Hi
 
 Sorry to ask again about this problem, does somebody have the answer?

Well, it certainly looks like it will fix the error you're seeing.  There's 
only one way to know for sure. Give it a try :)  I remember running into things 
similar to that a couple of years ago.  I don't know if that patch was the only 
one involved.

-- Vossel

 
 Thanks
 Alain
 
 Le 06/12/2013 08:57, Moullé Alain a écrit :
  Hi,
  I've found a thread talking about this problem on 1.1.7, but in the
  end, is the patch
  https://github.com/ClusterLabs/pacemaker/commit/03f6105592281901cc10550b8ad19af4beb5f72f
 
  sufficient and correct to solve the problem?
  Thanks
  Alain
 
  Le 03/12/2013 10:15, Moullé Alain a écrit :
  Hi,
 
  with :   pacemaker-1.1.7-6   corosync-1.4.1-15
 
  On crm migrate , I'm randomly facing this problem :
 
  ... node1 daemon warning cib  warning: cib_peer_callback: Discarding
  cib_apply_diff message (342) from node2: not in our membership
 
  whereas the node2 is healthy and always member of the cluster.
 
  Is-it a known problem ?
  Is there already a patch ?
 
  Thanks
  Alain
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] pacemaker restarts services (on the same node) when failed node returns

2013-12-09 Thread David Vossel
- Original Message -
 From: Peto Michalak peto.micha...@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Monday, December 9, 2013 1:14:23 PM
 Subject: [Linux-HA] pacemaker restarts services (on the same node) when   
 failed node returns
 
 Hello,
 
 I really searched for the answer before posting : ).
 
 I have a pacemaker setup + corosync + drbd in Active/Passive mode running
 in 2 node cluster on Ubuntu 12.04.3.
 
 Everything works fine and on node failure the services are taken care of by
 the other node (THANKS guys!). The problem is that I've noticed that
 once the failed node comes back alive, pacemaker restarts the
 postgresql and virtual IP resources, and that takes around 4-7 seconds (but
 keeps them on the same node as I wanted, so what's the point? :) ). Is this
 really necessary, or have I messed up something in the configuration?

any chance you could provide us with a crm_report during the time frame of this 
unwanted restart?

-- Vossel

 My pacemaker config:
 
 node drpg-01 attributes standby=off
 node drpg-02 attributes standby=off
 primitive drbd_pg ocf:linbit:drbd \
 params drbd_resource=drpg \
 op monitor interval=15 \
 op start interval=0 timeout=240 \
 op stop interval=0 timeout=120
 primitive pg_fs ocf:heartbeat:Filesystem \
 params device=/dev/drbd/by-res/drpg directory=/db/pgdata
 options=noatime,nodiratime fstype=xfs \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=120
 primitive pg_lsb lsb:postgresql \
 op monitor interval=30 timeout=60 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60
 primitive pg_vip ocf:heartbeat:IPaddr2 \
 params ip=10.34.2.60 iflabel=pgvip \
 op monitor interval=5
 group PGServer pg_fs pg_lsb pg_vip
 ms ms_drbd_pg drbd_pg \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true
 colocation col_pg_drbd inf: PGServer ms_drbd_pg:Master
 order ord_pg inf: ms_drbd_pg:promote PGServer:start
 property $id=cib-bootstrap-options \
 dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 no-quorum-policy=ignore \
 pe-warn-series-max=1000 \
 pe-input-series-max=1000 \
 pe-error-series-max=1000 \
 default-resource-stickiness=1000 \
 cluster-recheck-interval=5min \
 stonith-enabled=false \
 last-lrm-refresh=1385646505
 
 Thank you.
 
 Best Regards,
 -Peter
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM RA

2013-12-04 Thread David Vossel




- Original Message -
 From: Lars Marowsky-Bree l...@suse.com
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Wednesday, December 4, 2013 3:49:17 AM
 Subject: Re: [Linux-HA] Antw: Re: SLES11 SP2 HAE: problematic change for LVM 
 RA
 
 On 2013-12-04T10:25:58, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 wrote:
 
   You thought it was working, but in fact it wasn't. ;-)
  working meaning the resource started.
  not working meaning the resource does not start
  
  You see I have minimal requirements ;-)
 
 I'm sorry; we couldn't possibly test all misconfigurations. So this
 slipped through, we didn't expect someone to set that for a
 non-clustered VG previously.

Updates have been made to the LVM agent to allow exclusive activation without 
clvmd.

http://www.davidvossel.com/wiki/index.php?title=HA_LVM
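
A rough sketch of what tag-based exclusive activation looks like (crm shell
syntax; the volume group name and tag below are placeholders, and the exact
lvm.conf requirements are described on that page):

  # /etc/lvm/lvm.conf: volume_list must be set and must NOT contain
  # the tag the cluster uses, e.g.
  volume_list = [ "rootvg" ]

  # cluster resource:
  primitive vg_data ocf:heartbeat:LVM \
      params volgrpname=datavg exclusive=true tag=pacemaker \
      op monitor interval=30s timeout=30s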

-- Vossel

 
   You could argue that it never should have worked. Anyway: If you want
   to activate a VG on exactly one node you should not need cLVM; only if
   you man to activate the VG on multiple nodes (as for a cluster file
   system)...
   
   You don't need cLVM to activate a VG on exactly one node. Correct. And
   you don't. The cluster stack will never activate a resource twice.
  
  Occasionally two safety lines are better than one. We HAD filesystem
  corruptions due to the cluster doing things it shouldn't do.
 
 And that's perfectly fine. All you need to do to activate this is
 vgchange -c y on the specific volume group, and the exclusive=true
 flag will work just fine.
 
   If you don't want that to happen, exclusive=true is not what you want to
   set.
  That makes sense, but what I don't like is that I have to mess with local
  lvm.conf files...
 
 You don't. Just drop exclusive=true, or set the clustered flag on the
 VG.
 
 You only have to change anything in the lvm.conf if you want to use tags
 for exclusivity protection (I defer to the LVM RA help for how to use
 that, I've never tried it).
 
 
 Regards,
 Lars
 
 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Antw: Re: establishing a new resource-agent package provider

2013-07-30 Thread David Vossel




- Original Message -
 From: Andrew Beekhof and...@beekhof.net
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Tuesday, July 30, 2013 7:46:25 AM
 Subject: Re: [Linux-HA] Antw: Re: establishing a new resource-agent package   
 provider
 
 
 On 30/07/2013, at 4:21 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 wrote:
 
  David Vossel dvos...@redhat.com schrieb am 30.07.2013 um 01:20 in
  Nachricht
  1719265415.18819216.1375140025306.javamail.r...@redhat.com:
  
  [...]
  How does this compare to the Red Hat fence/resource-agent packages? I'm
  very happy to see heartbeat and it's inherent confusion go away, so I
  am fundamentally for this. I only question core and how it will relate
  to those fence and resource agents.
  
  core would only be related to the ocf standard.  I don't think this
  should
  have any relation to the fence agents.
  [...]
  
  I wonder: ocf:base:... or ocf:standard:... instaed of ocf:core:...
  
  My personal associations are a bit like this:
  core == essential
  base == basic functions

 many are not basic
 
  standard == somewhat standardized
 
 nor are they a standard (although they do conform to one)... they're just the
 ones that the people upstream ships.
 I like this one the least.

yeah, standard sounds confusing to me.  Splitting agents between core and base 
is going to be difficult as well.  If we did something like that, I'd probably 
want to do 'core' and 'extended'... where core supported agents are ones the 
community takes ownership of, and extended agents are agents that exist in the 
project, but are only maintained by a subset of the community.  I don't really 
want to do something like this though.

 
 common perhaps?

perhaps, but I still prefer 'core' over 'common'

These are the 'core' resource agents that the ocf community supports. Agents 
outside of the 'core' provider are supported by different projects and subsets 
of the community (like linbit and the drbd agent). To me 'common' refers to 
something that is shared... like a library or something. That probably isn't 
what we're going for. 

-- Vossel

 I don't much care beyond saying that continuing to call them heartbeat is a
 continuing source of confusion to people just arriving to our set of
 projects.
 Calling them heartbeat made sense originally, but now it's a historical
 anachronism, IMHO.
  
  Regards,
  Ulrich
  
  
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] establishing a new resource-agent package provider

2013-07-29 Thread David Vossel
hey,

Historically the ocf resource agents have been shipped under the 'heartbeat'
provider alias.  Now that pacemaker exists, the legacy name heartbeat is
slightly confusing since it refers to another project.  We should change this.

How would you all feel about moving all the 'heartbeat' provider agents into a
new provider called 'core', and then for legacy purposes creating a 'heartbeat'
symlink that points to the 'core' directory so no one's configuration breaks?
Eventually, some day, we could move in the direction of deprecating the use of
the 'heartbeat' provider entirely.
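
Concretely, that would mean something like the following on an installed
system (the path is the standard OCF resource directory):

  # agents would live under /usr/lib/ocf/resource.d/core/
  # keep the legacy provider name resolving via a symlink
  ln -s core /usr/lib/ocf/resource.d/heartbeat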

good plan? any thoughts?

-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] establishing a new resource-agent package provider

2013-07-29 Thread David Vossel
- Original Message -
 From: Digimer li...@alteeve.ca
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Cc: David Vossel dvos...@redhat.com
 Sent: Monday, July 29, 2013 5:21:00 PM
 Subject: Re: [Linux-HA] establishing a new resource-agent package provider
 
 On 29/07/13 18:19, David Vossel wrote:
  hey,
 
  Historically the ocf resource agents have been shipped under the
  'heartbeat' provider alias.  Now that pacemaker exists, the legacy name
  heartbeat is slightly confusing since it refers to another project.  We
  should change this.
 
  How would you all feel about moving all the 'heartbeat' provider agents
  into a new provider called 'core', and then for legacy purposes create a
  'heartbeat' symlink that points to the 'core' directory so no one's
  configuration breaks... Eventually some day we could move in the direction
   of deprecating the use of the 'heartbeat' provider entirely.
 
  good plan? any thoughts?
 
  -- Vossel
 
 How does this compare to the Red Hat fence/resource-agent packages? I'm
  very happy to see heartbeat and its inherent confusion go away, so I
 am fundamentally for this. I only question core and how it will relate
 to those fence and resource agents.

core would only be related to the ocf standard.  I don't think this should 
have any relation to the fence agents.

-- Vossel

 
 --
 Digimer
 Papers and Projects: https://alteeve.ca/w/
 What if the cure for cancer is trapped in the mind of a person without
 access to education?
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] PCS and ping resources?

2013-07-01 Thread David Vossel
- Original Message -
 From: Jakob Curdes j...@info-systems.de
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Sunday, June 30, 2013 6:04:58 AM
 Subject: [Linux-HA] PCS and ping resources?
 
 Hello, I have configured a cluster on CentOS 6.x using PCS. All fine,
 but I am missing the information on how to create ping primitives and use
 them to ensure connectivity for the active machine.
 With crmsh this was done by configuring an ocf:pacemaker:ping primitive and
 a clone; I could not figure out how to do this with PCS.

$ pcs resource help
.
.
.
create resource id class:provider:type|type [resource options]
   [op operation action operation options [operation action
   operation options]...] [meta meta options...] [--clone|--master]
Create specified resource.  If --clone is used a clone resource is
created (with options specified by --cloneopt clone_option=value),
if --master is specified a master/slave resource is created.
.
.
.

I'm guessing the following would work.

$ pcs resource create pingrsc ping --clone
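
A fuller invocation along the same lines might be (values are illustrative;
host_list, dampen and multiplier are the usual ocf:pacemaker:ping parameters):

  $ pcs resource create pingrsc ocf:pacemaker:ping \
        host_list=192.168.1.1 dampen=5s multiplier=1000 --clone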


-- Vossel

 I could not
 find any document describing this. Did I miss something?
 
 Regards,
 Jakob Curdes
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-06-18 Thread David Vossel




- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: linux-ha@lists.linux-ha.org
 Sent: Tuesday, June 18, 2013 4:30:30 AM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Mon, Jun 17, 2013 at 05:53:57PM -0400, David Vossel wrote:
   The plan to set 'volume_list=[@*]' in lvm.conf and override the tags
   during
   the activation using vgchange -ay --config 'tags{ mytag{} }' vg0 does
   not
   work.
   
   As a similar alternative, I am forcing the volume_list in lvm.conf to be
   initialized and not contain the tag the cluster is using for exclusive
   activation, and then overriding the volume_list during the activation to
   allow volume groups with the cluster tag to be activated.  This is a very
   similar approach to what Lars original proposed.
   
   I have finished my initial work on the new set of patches. They can be
   found
   in this pull request.
   https://github.com/ClusterLabs/resource-agents/pull/252
  
  This patch is going in tomorrow. Speak up now if you have any reservations
  concerning it.
 
 Many thanks for all the work,
 as tomorrow has passed,
 I just merged it.
 
 Added a tag parameter description.
 
 I'm not sure why we would need to strip the tag on stop though?

I see stripping the tag on stop as cleaning up the magic the LVM agent is doing
behind the scenes.  The tag is only needed to allow activation; activation is
still prevented outside of cluster management with or without the tag being
present on the VG.

 
 Or why we would override a different tag on start.
 
 As the cluster tag is not supposed to change,
 we could just require the admin to set it once.

Yep, we could do this.  It does put another step in the admin's hands that we
could otherwise automate, though.

 
 Has the side-effect that an admin can revoke the cluster rights
 by simply re-tagging with something != the cluster tag.
 
 Do we still want the single-lv activation feature as well?

I gave this a lot of thought.  Single-LV activation adds more complexity to
this whole exclusive-activation-with-tags scheme. Rather than complicate the
situation any further, I am in favor of not pulling single-LV activation
support into the heartbeat agent.  This agent is already complex enough to
manage as it is.

-- Vossel

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-06-17 Thread David Vossel




- Original Message -
 From: David Vossel dvos...@redhat.com
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Tuesday, June 4, 2013 4:41:06 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 - Original Message -
  From: David Vossel dvos...@redhat.com
  To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
  Sent: Monday, June 3, 2013 10:50:01 AM
  Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
  
  - Original Message -
   From: Lars Ellenberg lars.ellenb...@linbit.com
   To: linux-ha@lists.linux-ha.org
   Sent: Tuesday, May 21, 2013 5:58:05 PM
   Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
   
   On Tue, May 21, 2013 at 05:52:39PM -0400, David Vossel wrote:
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: Brassow Jonathan jbras...@redhat.com
 Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org,
 Lars
 Marowsky-Bree l...@suse.com, Fabio M. Di
 Nitto fdini...@redhat.com
 Sent: Monday, May 20, 2013 3:50:49 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Fri, May 17, 2013 at 02:00:48PM -0500, Brassow Jonathan wrote:
  
  On May 17, 2013, at 10:14 AM, Lars Ellenberg wrote:
  
   On Thu, May 16, 2013 at 10:42:30AM -0400, David Vossel wrote:
   
   The use of 'auto_activation_volume_list' depends on updates
   to
   the
   LVM
   initscripts - ensuring that they use '-aay' in order to
   activate
   logical
   volumes.  That has been checked in upstream.  I'm sure it
   will
   go
   into
   RHEL7 and I think (but would need to check on) RHEL6.
   
   Only that this is upstream here, so it better work with
   debian oldstale, gentoo or archlinux as well ;-)
   
   
   Would this be good enough:
   
   vgchange --addtag pacemaker $VG
   and NOT mention the pacemaker tag anywhere in lvm.conf ...
   then, in the agent start action,
   vgchange -ay --config tags { pacemaker {} } $VG
   
   (or have the to be used tag as an additional parameter)
   
   No retagging necessary.
   
   How far back do the lvm tools understand the --config ...
   option?
  
  --config option goes back years and years - not sure of the exact
  date,
  but
  could probably tell with 'git bisect' if you wanted me to.
  
  The above would not quite be sufficient.
  You would still have to change the 'volume_list' field in lvm.conf
  (and
  update the initrd).
 
 You have to do that anyways if you want to make use of tags in this
 way?
 
  What you are proposing would simplify things in that you would not
  need different 'volume_list's on each machine - you could copy
  configs
  between machines.
 
 I thought volume_list = [ ... , @* ] in lvm.conf,
 assuming that works on all relevant distributions as well,
 and a command line --config tag would also propagate into that @*.
 It did so for me.
 
  But yes, volume_list = [ ... , pacemaker ] would be fine as well.

wait, did we just go around in a circle.  If we add pacemaker to the
volume list, and use that in every cluster node's config, then we've
by-passed the exclusive activation part have we not?!
   
   No.  I suggested to NOT set that pacemaker tag in the config
   (lvm.conf),
   but only ever explicitly set that tag from the command line as used from
   the resource agent ( --config tags { pacemaker {} } )
   
   That would also mean to either override volume_list with the same
   command line, or to have the tag mentioned in the volume_list in
   lvm.conf (but not set it in the tags {} section).
   
Also, we're not happy with the auto_activate list because it won't
work with old distros?!  It's a new feature, why do we have to work
with old distros that don't support it?
   
   You are right, we only have to make sure we don't break existing setup
   by rolling out a new version of the RA.  So if the resource agent
   won't accidentally use a code path where support of a new feature
   (of LVM) would be required, that's good enough compatibility.
   
   Still it won't hurt to pick the most compatible implementation
   of several possible equivalent ones (RA-feature wise).
   
   I think the proposed --config tags { pacemaker {} }
   is simpler (no retagging, no re-writing of lvm meta data),
   and will work for any setup that knows about tags.
  
  I've had a good talk with Jonathan about the --config tags { pacemaker {}
  }
  approach.  This was originally complicated for us because we were using the
  --config option for a device filter during activation in certain
  situations... using the --config option twice caused problems which made
  adding the tag in the config difficult.
  
  We've

Re: [Linux-HA] Antw: ocf HA_RSCTMP directory location

2013-06-14 Thread David Vossel
- Original Message -
 From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Friday, June 14, 2013 1:34:58 AM
 Subject: [Linux-HA] Antw:  ocf HA_RSCTMP directory location
 
 Hi!
 
 I think the location of the temporary directory is not that important,

It is, because pacemaker has to make sure the directory exists every time it
starts up; otherwise agents will fail.

 because you don't exchange data between RAs. For the design, I think it's
 sufficient if an RA that actually uses the temporary directory does check
  its existence (for validate-all, maybe). But even that is not necessary,
  because if the RA cannot write its PID where it wanted to, the start
  operation should fail, and the user will notice the problem.
 
 Amazingly I had seen a situation with samba recently, where smbd start exited
 OK, but did not start, because the PID file (an obsolete one) already
 existed. I had to remove the PID file manually. Clearly a bug in samba.

If the temp directories don't get cleaned out on restart, this is a possibility.

 
 I think RAs should not rely on the fact that temp directories are clean when
 a resource is going to be started.

The resource tmp directory has to get cleaned out on startup; if it doesn't, I
don't think there is a good way for resource agents to distinguish a stale pid
file from a current one.  Nearly all the agents depend on this tmp directory
being reinitialized.  If we decided not to depend on this logic, every agent
would have to be altered to account for it, which would add a layer of
complexity to the agents that should otherwise be unnecessary.
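
As a rough illustration of the extra logic every agent would otherwise need
(a sketch only; the pid-file name is made up, and even this check can be fooled
if the pid has been recycled by an unrelated process):

  pidfile="$HA_RSCTMP/myservice.pid"
  if [ -f "$pidfile" ] && ! kill -0 "$(cat "$pidfile")" 2>/dev/null; then
      rm -f "$pidfile"   # assume the pid is stale, left over from before a reboot
  fi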

-- Vossel

 
 Regards,
 Ulrich
 
  David Vossel dvos...@redhat.com schrieb am 13.06.2013 um 23:59 in
  Nachricht
 194863966.11352160.1371160774187.javamail.r...@redhat.com:
  Hey,
  
  Andrew and I have been running into some inconsistencies between
  resource-agent packages that we need to get cleared up.
  
  There's an ocf variable, HA_RSCTMP, used in many of the resource agents
  that
   represents a place the agents can store their PID files and other
  temporary
  data.  This data needs to live under some directory in /var/run as that
  directory is typically cleared on startup.  This is important to prevent
  stale PID files and other transient data from being persistent across
  restarts.
  
  Anyway.  Here's the problem.
  
  Pacemaker thinks that data should live in '/var/run/heartbeat/rsctmp', but
  not all the resource-agent packages are consistent with that.  For example,
  Suse's resource-agent package sets HA_RSCTMP to '/var/run/resource-agents'
  (
  looking at this rpm,
   http://download.opensuse.org/distribution/11.4/repo/oss/suse/x86_64/resource-agents-1.0.3-9.12.1.x86_64.rpm )
  
  We need to come to some sort of agreement because ultimately Pacemaker
  needs
  to make sure this directory exists on startup, whatever it is.  If
  pacemaker
  doesn't create the right directory, it's possible the resource agents won't
  be able to access it since /var/run is re-initialized on startup.
  
  so,
  HA_RSCTMP = /var/run/heartbeat/rsctmp
  or
  HA_RSCTMP = /var/run/resource-agents
  
  thoughts?
  
  -- Vossel
  
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] ocf HA_RSCTMP directory location

2013-06-13 Thread David Vossel
Hey,

Andrew and I have been running into some inconsistencies between resource-agent 
packages that we need to get cleared up.

There's an ocf variable, HA_RSCTMP, used in many of the resource agents that
represents a place the agents can store their PID files and other temporary
data.  This data needs to live under some directory in /var/run, as that
directory is typically cleared on startup.  This is important to prevent stale
PID files and other transient data from persisting across restarts.

Anyway.  Here's the problem.

Pacemaker thinks that data should live in '/var/run/heartbeat/rsctmp', but not 
all the resource-agent packages are consistent with that.  For example, Suse's 
resource-agent package sets HA_RSCTMP to '/var/run/resource-agents' ( looking 
at this rpm, 
http://download.opensuse.org/distribution/11.4/repo/oss/suse/x86_64/resource-agents-1.0.3-9.12.1.x86_64.rpm
 )

We need to come to some sort of agreement because ultimately Pacemaker needs to 
make sure this directory exists on startup, whatever it is.  If pacemaker 
doesn't create the right directory, it's possible the resource agents won't be 
able to access it since /var/run is re-initialized on startup.
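
Whichever path wins, the effect pacemaker has to guarantee at startup is
roughly this (illustrative only; the real code lives in pacemaker's own
startup path):

  HA_RSCTMP=/var/run/heartbeat/rsctmp   # or /var/run/resource-agents
  mkdir -p "$HA_RSCTMP"                 # /var/run is typically wiped at boot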

so,
HA_RSCTMP = /var/run/heartbeat/rsctmp
or
HA_RSCTMP = /var/run/resource-agents

thoughts?

-- Vossel

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: Last call: removal of ocf:heartbeat:drbd in favor of ocf:linbit:drbd

2013-06-12 Thread David Vossel




- Original Message -
 From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Wednesday, June 12, 2013 1:00:45 AM
 Subject: [Linux-HA] Antw: Last call: removal of ocf:heartbeat:drbd in favor 
 of ocf:linbit:drbd
 
 Besides merge or remove you could move it to ocf:unsupported:drbd ;-)

I'd like it to disappear entirely. The drbd agent is supported, just not by the
heartbeat provider.  If there weren't already a duplicate agent, moving it to an
'unsupported' provider would make sense.

-- Vossel

  David Vossel dvos...@redhat.com schrieb am 11.06.2013 um 17:15 in
  Nachricht
 624463553.10357145.1370963709261.javamail.r...@redhat.com:
  Hey
  
  We need to get rid of this heartbeat drbd agent. It is outdated and linbit
  isn't supporting it.  Instead linbit is shipping their own supported drbd
  agent in their own package using the linbit ocf provider.  Any distro that
  is
  still using the heartbeat:drbd agent exclusively is old enough that a
  rebase
  of the resource-agents package is unlikely.
  
  Unless someone steps forward with a good argument against the removal of
  the
  heartbeat:drbd agent, the following pull request is going to be merged a
  week
  from today. (Tuesday 18th)
  
  https://github.com/ClusterLabs/resource-agents/pull/244
  
  thanks,
  -- Vossel
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-12 Thread David Vossel




- Original Message -
 From: John M john332...@gmail.com
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Wednesday, June 12, 2013 11:49:21 AM
 Subject: Re: [Linux-HA] Master/Slave status check using crm_mon
 
 Dear All,
 
   I will try to setup pacemaker cluster in the coming weeks. Before that I
 have to complete the configuration using heartbeat 2.1.4.
   I would really appreciate if you could suggest the configuration for
 Master/Slave scenario mentioned in my previous mail.


http://clusterlabs.org/doc/

Look through Clusters from Scratch, and read about multi-state resources in
Pacemaker Explained.

-- Vossel

   Thanks in advance.
 
 BR,
 Mark
 
 On Tuesday, June 11, 2013, Lars Marowsky-Bree l...@suse.com wrote:
  On 2013-06-11T15:05:11, John M john332...@gmail.com wrote:
 
  Unfortunately I cannot install pacemaker :(
 
  I just installed heartbeat 2.1.4 and in crm_mon I am getting
  Master/Slave status.
 
  You seriously need to upgrade. Heartbeat 2.1.4 is ages old and has many,
  many known bugs. You'll not be able to secure community aid for that
  version any more.
 
 
  Regards,
  Lars
 
  --
  Architect Storage/HA
  SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
 Imendörffer, HRB 21284 (AG Nürnberg)
  Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] Last call: removal of ocf:heartbeat:drbd in favor of ocf:linbit:drbd

2013-06-11 Thread David Vossel
Hey

We need to get rid of this heartbeat drbd agent. It is outdated and linbit 
isn't supporting it.  Instead linbit is shipping their own supported drbd agent 
in their own package using the linbit ocf provider.  Any distro that is still 
using the heartbeat:drbd agent exclusively is old enough that a rebase of the 
resource-agents package is unlikely.
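
For anyone still referencing the old agent, the switch in an existing
configuration is a one-line change (sketch in crmsh syntax; the resource name
and drbd resource are made up, and the agent is normally wrapped in an
ms/master resource, omitted here):

  # before: old, unmaintained agent
  primitive drbd0 ocf:heartbeat:drbd params drbd_resource=r0
  # after: agent shipped and supported by linbit
  primitive drbd0 ocf:linbit:drbd params drbd_resource=r0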

Unless someone steps forward with a good argument against the removal of the 
heartbeat:drbd agent, the following pull request is going to be merged a week 
from today. (Tuesday 18th)

https://github.com/ClusterLabs/resource-agents/pull/244

thanks,
-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-06-03 Thread David Vossel


- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: linux-ha@lists.linux-ha.org
 Sent: Tuesday, May 21, 2013 5:58:05 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Tue, May 21, 2013 at 05:52:39PM -0400, David Vossel wrote:
  - Original Message -
   From: Lars Ellenberg lars.ellenb...@linbit.com
   To: Brassow Jonathan jbras...@redhat.com
   Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars
   Marowsky-Bree l...@suse.com, Fabio M. Di
   Nitto fdini...@redhat.com
   Sent: Monday, May 20, 2013 3:50:49 PM
   Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
   
   On Fri, May 17, 2013 at 02:00:48PM -0500, Brassow Jonathan wrote:

On May 17, 2013, at 10:14 AM, Lars Ellenberg wrote:

 On Thu, May 16, 2013 at 10:42:30AM -0400, David Vossel wrote:
 
 The use of 'auto_activation_volume_list' depends on updates to
 the
 LVM
 initscripts - ensuring that they use '-aay' in order to
 activate
 logical
 volumes.  That has been checked in upstream.  I'm sure it will
 go
 into
 RHEL7 and I think (but would need to check on) RHEL6.
 
 Only that this is upstream here, so it better work with
 debian oldstale, gentoo or archlinux as well ;-)
 
 
 Would this be good enough:
 
 vgchange --addtag pacemaker $VG
 and NOT mention the pacemaker tag anywhere in lvm.conf ...
 then, in the agent start action,
 vgchange -ay --config tags { pacemaker {} } $VG
 
 (or have the to be used tag as an additional parameter)
 
 No retagging necessary.
 
 How far back do the lvm tools understand the --config ... option?

--config option goes back years and years - not sure of the exact date,
but
could probably tell with 'git bisect' if you wanted me to.

The above would not quite be sufficient.
You would still have to change the 'volume_list' field in lvm.conf (and
update the initrd).
   
   You have to do that anyways if you want to make use of tags in this way?
   
What you are proposing would simplify things in that you would not
need different 'volume_list's on each machine - you could copy configs
between machines.
   
   I thought volume_list = [ ... , @* ] in lvm.conf,
   assuming that works on all relevant distributions as well,
   and a command line --config tag would also propagate into that @*.
   It did so for me.
   
    But yes, volume_list = [ ... , pacemaker ] would be fine as well.
  
  wait, did we just go around in a circle.  If we add pacemaker to the
  volume list, and use that in every cluster node's config, then we've
  by-passed the exclusive activation part have we not?!
 
 No.  I suggested to NOT set that pacemaker tag in the config (lvm.conf),
 but only ever explicitly set that tag from the command line as used from
 the resource agent ( --config tags { pacemaker {} } )
 
 That would also mean to either override volume_list with the same
 command line, or to have the tag mentioned in the volume_list in
 lvm.conf (but not set it in the tags {} section).
 
  Also, we're not happy with the auto_activate list because it won't
  work with old distros?!  It's a new feature, why do we have to work
  with old distros that don't support it?
 
 You are right, we only have to make sure we don't break existing setup
 by rolling out a new version of the RA.  So if the resource agent
 won't accidentally use a code path where support of a new feature
 (of LVM) would be required, that's good enough compatibility.
 
 Still it won't hurt to pick the most compatible implementation
 of several possible equivalent ones (RA-feature wise).
 
 I think the proposed --config tags { pacemaker {} }
 is simpler (no retagging, no re-writing of lvm meta data),
 and will work for any setup that knows about tags.

I've had a good talk with Jonathan about the --config tags { pacemaker {} } 
approach.  This was originally complicated for us because we were using the 
--config option for a device filter during activation in certain situations... 
using the --config option twice caused problems which made adding the tag in 
the config difficult.

We've worked through those situations and it looks like it is actually safe to 
strip out the conflicting --config usage required for the resilient device 
filtering on activation.

The path forward is this.

1. Now that I know certain things are safe to remove, I'm going to re-evaluate
my current patches and attempt to greatly reduce the number of changes to the
original LVM agent.  All of the cluster-membership inspection and resilient
activation checking can be thrown out.

2. Next I'm going to introduce the proposed --config tags feature as a separate 
patch that enables exclusive activation functionality without clvmd.
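
A minimal sketch of what that tag-based activation looks like, assuming a
cluster-wide tag named 'pacemaker' and a VG named vg0 (the agent parameter
names and exact lvm.conf wording may end up different):

  # lvm.conf on every node; the '...' stands for whatever local VGs the node
  # needs at boot, and the cluster tag is deliberately NOT defined in tags{}
  activation { volume_list = [ ..., "@*" ] }

  # one-time: mark the VG as cluster-owned
  vgchange --addtag pacemaker vg0

  # agent start: define the tag only for this command so "@*" matches
  vgchange -ay --config 'tags { pacemaker {} }' vg0

  # agent stop: plain deactivation; boot-time activation stays blocked
  vgchange -an vg0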

The final result here is going to be a much less scary looking set of changes 
than what I currently have up

Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-24 Thread David Vossel
- Original Message -
 From: Lars Marowsky-Bree l...@suse.com
 To: linux-ha@lists.linux-ha.org
 Cc: Jonathan Brassow jbras...@redhat.com, Fabio M. Di Nitto 
 fdini...@redhat.com
 Sent: Friday, May 24, 2013 3:35:29 AM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On 2013-05-15T13:50:45, Lars Ellenberg lars.ellenb...@linbit.com wrote:
 
 Are we, in this discussion, perhaps losing the focus on the base
 submission of the code merge?
 
 Can we separate that (IMHO rather worthwhile) patch set from the
 exclusive activation part?

I don't think we are losing focus; I think we're really close to finalizing the
last bits here.

If no one objects to the idea you proposed about using a single tag for the
entire cluster to ensure exclusive activation, the patches will remain similar
to what they are now; I'll just strip out a bunch of unnecessary stuff.

I'm waiting to hear more feedback from Jonathan about this new direction before 
I act on anything.

-- Vossel

 (Which I happen to have no strong opinion on,
 unless it is already shipped - in which case we need to support it.)


 
 Regards,
 Lars
 
 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] vm live migration without shared storage

2013-05-23 Thread David Vossel
Hey,

I've been testing libvirt live migration without shared storage in Fedora 19 
alpha.  Specifically this feature, 
https://fedoraproject.org/wiki/Features/Virt_Storage_Migration.  They've done a 
lot of work to make it a much more solid option than it has been in the past.  
I had to work through a few bugs with the guys over there, but it's working 
great for me now.

So, do we want to use this in HA? It would be trivial to add this to the 
VirtualDomain resource agent, but does it make sense to do this?  Migration 
time, depending on network speed and hardware, is much longer than the shared 
storage option (minutes vs. seconds).   I don't mind adding this support to the 
agent, but I wanted to get people's feedback and make sure this is something we 
want before making that effort.
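
For context, the libvirt side of this is roughly the following (a sketch; the
guest and destination host are made up, and the VirtualDomain agent would issue
the equivalent internally):

  # live-migrate 'guest1' to node2, copying its disk images as part of the move
  virsh migrate --live --copy-storage-all guest1 qemu+ssh://node2/system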

-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] vm live migration without shared storage

2013-05-23 Thread David Vossel
- Original Message -
 From: Greg Woods wo...@ucar.edu
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Thursday, May 23, 2013 2:45:16 PM
 Subject: Re: [Linux-HA] vm live migration without shared storage
 
 On Thu, 2013-05-23 at 15:00 -0400, David Vossel wrote:
   Migration time, depending on network speed and hardware, is much longer
   than the shared storage option (minutes vs. seconds).
 
 
 This is just one data point (of course), but for the vast majority of
 services that I run, if the live migration time is as long as it takes
 to shut down a VM and boot it on another server, then there isn't much
 of an advantage to doing the live migration. Especially if we're talking
 about an option that is a long way from being battle-tested, and
 critical services such as DNS and authentication. Most of these critical
 services do not use long-lived connections.
 
 I can see a few VMs that exist to provide ssh logins where a
 minutes-long live migration would be clearly preferable to a shut down
 and reboot, but in most cases, if it's as slow as rebooting, it isn't
 going to be any advantage to me.
 
 It will be interesting though to see how many applications people come
 up with where a minutes-long live migration is preferable to shutdown
 and reboot.

The actual migration takes a while, but the transition between running on the
source and running on the destination should be very fast.  The source stays
running while the disk is being copied to the destination; once the copy is
complete, it's like flipping a switch... at least that's my understanding.

-- Vossel

 
 --Greg
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-21 Thread David Vossel
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: Brassow Jonathan jbras...@redhat.com
 Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars 
 Marowsky-Bree l...@suse.com, Fabio M. Di
 Nitto fdini...@redhat.com
 Sent: Monday, May 20, 2013 3:50:49 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Fri, May 17, 2013 at 02:00:48PM -0500, Brassow Jonathan wrote:
  
  On May 17, 2013, at 10:14 AM, Lars Ellenberg wrote:
  
   On Thu, May 16, 2013 at 10:42:30AM -0400, David Vossel wrote:
   
   The use of 'auto_activation_volume_list' depends on updates to the
   LVM
   initscripts - ensuring that they use '-aay' in order to activate
   logical
   volumes.  That has been checked in upstream.  I'm sure it will go
   into
   RHEL7 and I think (but would need to check on) RHEL6.
   
   Only that this is upstream here, so it better work with
   debian oldstale, gentoo or archlinux as well ;-)
   
   
   Would this be good enough:
   
   vgchange --addtag pacemaker $VG
   and NOT mention the pacemaker tag anywhere in lvm.conf ...
   then, in the agent start action,
   vgchange -ay --config tags { pacemaker {} } $VG
   
   (or have the to be used tag as an additional parameter)
   
   No retagging necessary.
   
   How far back do the lvm tools understand the --config ... option?
  
  --config option goes back years and years - not sure of the exact date, but
  could probably tell with 'git bisect' if you wanted me to.
  
  The above would not quite be sufficient.
  You would still have to change the 'volume_list' field in lvm.conf (and
  update the initrd).
 
 You have to do that anyways if you want to make use of tags in this way?
 
  What you are proposing would simplify things in that you would not
  need different 'volume_list's on each machine - you could copy configs
  between machines.
 
 I thought volume_list = [ ... , @* ] in lvm.conf,
 assuming that works on all relevant distributions as well,
 and a command line --config tag would also propagate into that @*.
 It did so for me.
 
  But yes, volume_list = [ ... , pacemaker ] would be fine as well.

wait, did we just go around in a circle?  If we add pacemaker to the volume
list, and use that in every cluster node's config, then we've bypassed the
exclusive activation part, have we not?!

Also, we're not happy with the auto_activate list because it won't work with 
old distros?!  It's a new feature, why do we have to work with old distros that 
don't support it?

-- Vossel

 
   Lars
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-21 Thread David Vossel
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: linux-ha@lists.linux-ha.org
 Sent: Tuesday, May 21, 2013 5:58:05 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Tue, May 21, 2013 at 05:52:39PM -0400, David Vossel wrote:
  - Original Message -
   From: Lars Ellenberg lars.ellenb...@linbit.com
   To: Brassow Jonathan jbras...@redhat.com
   Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars
   Marowsky-Bree l...@suse.com, Fabio M. Di
   Nitto fdini...@redhat.com
   Sent: Monday, May 20, 2013 3:50:49 PM
   Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
   
   On Fri, May 17, 2013 at 02:00:48PM -0500, Brassow Jonathan wrote:

On May 17, 2013, at 10:14 AM, Lars Ellenberg wrote:

 On Thu, May 16, 2013 at 10:42:30AM -0400, David Vossel wrote:
 
 The use of 'auto_activation_volume_list' depends on updates to
 the
 LVM
 initscripts - ensuring that they use '-aay' in order to
 activate
 logical
 volumes.  That has been checked in upstream.  I'm sure it will
 go
 into
 RHEL7 and I think (but would need to check on) RHEL6.
 
 Only that this is upstream here, so it better work with
 debian oldstale, gentoo or archlinux as well ;-)
 
 
 Would this be good enough:
 
 vgchange --addtag pacemaker $VG
 and NOT mention the pacemaker tag anywhere in lvm.conf ...
 then, in the agent start action,
 vgchange -ay --config tags { pacemaker {} } $VG
 
 (or have the to be used tag as an additional parameter)
 
 No retagging necessary.
 
 How far back do the lvm tools understand the --config ... option?

--config option goes back years and years - not sure of the exact date,
but
could probably tell with 'git bisect' if you wanted me to.

The above would not quite be sufficient.
You would still have to change the 'volume_list' field in lvm.conf (and
update the initrd).
   
   You have to do that anyways if you want to make use of tags in this way?
   
What you are proposing would simplify things in that you would not
need different 'volume_list's on each machine - you could copy configs
between machines.
   
   I thought volume_list = [ ... , @* ] in lvm.conf,
   assuming that works on all relevant distributions as well,
   and a command line --config tag would also propagate into that @*.
   It did so for me.
   
    But yes, volume_list = [ ... , pacemaker ] would be fine as well.
  
  wait, did we just go around in a circle.  If we add pacemaker to the
  volume list, and use that in every cluster node's config, then we've
  by-passed the exclusive activation part have we not?!
 
 No.  I suggested to NOT set that pacemaker tag in the config (lvm.conf),
 but only ever explicitly set that tag from the command line as used from
 the resource agent ( --config tags { pacemaker {} } )
 
 That would also mean to either override volume_list with the same
 command line, or to have the tag mentioned in the volume_list in
 lvm.conf (but not set it in the tags {} section).
 
  Also, we're not happy with the auto_activate list because it won't
  work with old distros?!  It's a new feature, why do we have to work
  with old distros that don't support it?
 
 You are right, we only have to make sure we don't break existing setup
 by rolling out a new version of the RA.  So if the resource agent
 won't accidentally use a code path where support of a new feature
 (of LVM) would be required, that's good enough compatibility.
 
 Still it won't hurt to pick the most compatible implementation
 of several possible equivalent ones (RA-feature wise).
 
 I think the proposed --config tags { pacemaker {} }
 is simpler (no retagging, no re-writing of lvm meta data),
 and will work for any setup that knows about tags.
 
 (and I still think the RA should not try to second guess pacemaker and
 double check the membership...  which in the case of this cluster wide
 tag would not make sense anymore, anyways)
 
 Of course this cluster wide tag pacemaker could be made configurable
 itself, so not all clusters in the world using this feature would use
 the same tag.
 
 primitive ... params exclusive=1
   (implicitly does the right thing,
and internally guesses whatever that may be,
several times scanning and parsing LVM meta data just for that)
 
 becomes
 
 primitive ... params tag=weather-forecast-scratch-01
   (explicitly knows to use the _tag variants of start/monitor/stop,
no parsing and guessing, not try and retry, but just do-it-or-fail)
 
 Does that make sense at all?

yes, we are on the same page now.  I am in favor of this approach.

I am still confused about Jonathan's comment, though, from when you proposed
this solution...
The above would not quite be sufficient.  You would still have to change the 
'volume_list' field in lvm.conf (and update the initrd).

Why would

Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-16 Thread David Vossel
- Original Message -
 From: Brassow Jonathan jbras...@redhat.com
 To: David Vossel dvos...@redhat.com
 Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars 
 Marowsky-Bree l...@suse.com, Fabio M. Di
 Nitto fdini...@redhat.com, Jonathan Brassow jbras...@redhat.com
 Sent: Thursday, May 16, 2013 9:32:38 AM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 
 On May 16, 2013, at 9:08 AM, David Vossel wrote:
 
  - Original Message -
  From: Brassow Jonathan jbras...@redhat.com
  To: David Vossel dvos...@redhat.com
  Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars
  Marowsky-Bree l...@suse.com, Fabio M. Di
  Nitto fdini...@redhat.com, Jonathan Brassow jbras...@redhat.com
  Sent: Thursday, May 16, 2013 8:37:08 AM
  Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
  
  
  On May 15, 2013, at 7:04 PM, David Vossel wrote:
  
  
  
  
  
  - Original Message -
  From: Brassow Jonathan jbras...@redhat.com
  To: David Vossel dvos...@redhat.com
  Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars
  Marowsky-Bree l...@suse.com, Fabio M. Di
  Nitto fdini...@redhat.com
  Sent: Tuesday, May 14, 2013 5:01:02 PM
  Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
  
  
  On May 14, 2013, at 10:36 AM, David Vossel wrote:
  
  - Original Message -
  From: Lars Ellenberg lars.ellenb...@linbit.com
  To: Lars Marowsky-Bree l...@suse.com
  Cc: Fabio M. Di Nitto fdini...@redhat.com, General Linux-HA
  mailing
  list linux-ha@lists.linux-ha.org,
  Jonathan Brassow jbras...@redhat.com
  Sent: Tuesday, May 14, 2013 9:50:43 AM
  Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
  
  On Tue, May 14, 2013 at 04:06:09PM +0200, Lars Marowsky-Bree wrote:
  On 2013-05-14T09:54:55, David Vossel dvos...@redhat.com wrote:
  
  Here's what it comes down to.  You aren't guaranteed exclusive
  activation just because pacemaker is in control. There are scenarios
  with SAN disks where the node starts up and can potentially attempt
  to
  activate a volume before pacemaker has initialized.
  
  Yeah, from what I've read in the code, the tagged activation would
  also
  prevent a manual (or on-boot) vg/lv activation (because it seems lvm
  itself will refuse). That seems like a good idea to me. Unless I'm
  wrong, that concept seems sound, barring bugs that need fixing.
  
  Sure.
  
  And I'm not at all oposed to using tags.
  I want to get rid of the layer violation,
  which is the one Bad Thing I'm complaining about.
  
  Also, note that on stop, this strips all tags, leaving it untagged.
  On the next cluster boot, if that was really the concern,
  all nodes would grab and activate the VG, as it is untagged...
  
  That's not how it works.  You have to take ownership of the volume
  before
  you can activate it.  Untagged does not mean a node can activate it
  without first explicitly setting the tag.
  
  Ok, so I'm coming into this late.  Sorry about that.
  
  David has this right.  Tagging in conjunction with the 'volume_list'
  setting
  in lvm.conf is what is used to restrict VG/LV activation.  As he
  mentioned,
  you don't want a machine to boot up and start doing a resync on a mirror
  while user I/O is happening on the node where the service is active.  In
  that scenario, even if the LV is not mounted, there will be corruption.
  The
  LV must not be allowed activation in the first place.
  
  I think the HA scripts written for rgmanager could be considerably
  reduced
  in
  size.  We probably don't need the matrix of different methods (cLVM vs
  Tagging.  VG vs LV).  Many of these came about as customers asked for
  them
  and we didn't want to compromise backwards compatibility.  If we are
  switching, now's the time for clean-up.  In fact, LVM has something new
  in
  lvm.conf: 'auto_activation_volume_list'.  If the list is defined and a
  VG/LV
  is in the list, it will be automatically activated on boot; otherwise,
  it
  will not.  That means, forget tagging and forget cLVM.  Make users
  change
  'auto_activation_volume_list' to include only VGs that are not
  controlled
  by
  pacemaker.  The HA script should then make sure that
  'auto_activation_volume_list' is defined and does not contain the VG/LV
  that
  is being controlled by pacemaker.  It would be necessary to check that
  the
  lvm.conf copy in the initrd is properly set.
  
  The use of 'auto_activation_volume_list' depends on updates to the LVM
  initscripts - ensuring that they use '-aay' in order to activate logical
  volumes.  That has been checked in upstream.  I'm sure it will go into
  RHEL7
  and I think (but would need to check on) RHEL6.
  
  The 'auto_activation_volume_list' doesn't seem like it exactly what we
  want
  here though.  It kind of works for what we are wanting to achieve but as
  a
  side effect, and I'm not sure it would work for everyone's deployment.
  For example, is there a way to set

Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-15 Thread David Vossel
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Cc: Jonathan Brassow jbras...@redhat.com, Lars Marowsky-Bree 
 l...@suse.com, Fabio M. Di Nitto
 fdini...@redhat.com
 Sent: Wednesday, May 15, 2013 6:50:45 AM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Tue, May 14, 2013 at 11:36:54AM -0400, David Vossel wrote:
  - Original Message -
   From: Lars Ellenberg lars.ellenb...@linbit.com
   To: Lars Marowsky-Bree l...@suse.com
   Cc: Fabio M. Di Nitto fdini...@redhat.com, General Linux-HA mailing
   list linux-ha@lists.linux-ha.org,
   Jonathan Brassow jbras...@redhat.com
   Sent: Tuesday, May 14, 2013 9:50:43 AM
   Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
   
   On Tue, May 14, 2013 at 04:06:09PM +0200, Lars Marowsky-Bree wrote:
On 2013-05-14T09:54:55, David Vossel dvos...@redhat.com wrote:

 Here's what it comes down to.  You aren't guaranteed exclusive
 activation just because pacemaker is in control. There are scenarios
 with SAN disks where the node starts up and can potentially attempt
 to
 activate a volume before pacemaker has initialized.

Yeah, from what I've read in the code, the tagged activation would also
prevent a manual (or on-boot) vg/lv activation (because it seems lvm
itself will refuse). That seems like a good idea to me. Unless I'm
wrong, that concept seems sound, barring bugs that need fixing.
   
   Sure.
   
    And I'm not at all opposed to using tags.
   I want to get rid of the layer violation,
   which is the one Bad Thing I'm complaining about.
   
   Also, note that on stop, this strips all tags, leaving it untagged.
   On the next cluster boot, if that was really the concern,
   all nodes would grab and activate the VG, as it is untagged...
  
  That's not how it works.  You have to take ownership of the volume
  before you can activate it.  Untagged does not mean a node can
  activate it without first explicitly setting the tag.
 
 The scenario is this (please correct me):
 
 There are a number of hosts that can see the PVs for this VG.
 
 Some of them may *not* be part of the pacemaker cluster.

I don't expect this to be the case.  If the node isn't a member of the cluster 
and it can see the PVs, then it is likely just starting up after being fenced 
and will rejoin the cluster shortly.

 
 But *ALL* of them have their lvm.conf contain the equivalent of
   global { locking_type = 1 }
   tags { hosttags = 1 }
   activation { volume_list = [ @* ] }
 
 If any node is able to see the PVs, but has volume_list undefined,
 vgchange -ay would activate it anyways.
 So we are back at Don't do that.

Ha, well if people want to shoot themselves in the foot, we can't help that 
either. The point of the feature is to give people a path to ensure exclusive 
activation without using clvmd+dlm.  The preferred and safest way is still to 
use clvmd.

 
  I've tested this specific scenario and was unable to activate the
  volume group manually without grabbing the tag first.  Have you tested
  this and found something contrary to my results?  This is how the
  feature is supposed to work.
 
 See above ;-) no hosttags =1, no volume_list, no checking against it.
 
 Granted, the lvm.conf would be prepared at deployment time,
  so let's assume it is set up ok on all hosts across the site anyways.
 
 Still, I don't see what we gain by
   check tag against my name
    if not me, check tag against membership list
  not present
  strip all tags and add my name

This is a redundancy check at the resource level to attempt to prevent data
corruption.  The 'not me, and not in the member list' case likely means the
owner was part of the member list at one point and has been fenced.  We feel
safe taking the tags in that case. If the owner is still a member of the
cluster, then something is wrong that the admin should investigate. I'm
guessing this could mean fencing failed and the admin unblocked the cluster in
a weird state.

If I were writing this feature for the first time I probably wouldn't have
thought to add this check, but I don't see any harm in leaving it, as it
appears to serve a purpose.

-- Vossel

  try vgchange -ay

 instead of doing just
  strip all tags and add my name
  try vgchange -ay
 
  In what scenario (apart from Pacemaker not being able to count to 1)
 would the more elaborate version protect us better than the simple one,
 and against what?
 
 
 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

[Linux-HA] Announcing Pacemaker Remote - extending high availability outside the cluster stack

2013-05-15 Thread David Vossel
Hi,

I'm excited to announce that the initial development phase of the Pacemaker
Remote daemon is complete and ready for testing in Pacemaker v1.1.10rc2.

Below is the first draft of a deployment guide that outlines the initially
supported Pacemaker Remote use-cases and provides walk-through examples.  Note,
however, that Fedora 18 does not have the pacemaker-remote subpackage rpm
available, even though I do reference it in the documentation (this will be
changed in a future draft).  You'll have to use the 1.1.10rc2 tag on github for
now.

Documentation: 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Remote/index.html

For those of you unfamiliar with Pacemaker Remote, the pacemaker_remote service
is a new daemon introduced in Pacemaker v1.1.10 that allows nodes not running
the cluster stack (pacemaker+corosync) to integrate into the cluster and have
the cluster manage their resources just as if they were real cluster nodes.
This means that Pacemaker clusters are now capable of both launching virtual
environments (KVM/LXC) and managing the resources that live within those
virtual environments, without requiring the virtual environments to run
pacemaker or corosync.
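
As a concrete taste, a guest node can be set up along these lines (a sketch
using pcs; the resource and guest names are made up, and the exact options are
spelled out in the documentation linked above):

  # manage the KVM guest itself and tell pacemaker it is also a remote node
  pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" \
      config="/etc/libvirt/qemu/guest1.xml" meta remote-node=guest1
  # once the guest (running pacemaker_remote) is up, resources can be placed
  # on it like on any other node
  pcs resource create webserver apache
  pcs constraint location webserver prefers guest1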

Usage of the pacemaker_remote daemon is currently limited to virtual guests
such as KVM and Linux Containers, but several enhancements covering additional
use-cases are in the works.  These planned enhancements include the following.

- Libvirt Sandbox Support
Once the libvirt-sandbox project is integrated with pacemaker_remote, we will
gain the ability to perform per-resource Linux container isolation with very
little performance impact.  This functionality will allow resources living on a
single node to be isolated from one another.  At that point, CPU and memory
limits could be set per resource dynamically, using nothing but the cluster
config.

- Bare-metal Support
The pacemaker_remote daemon already has the ability to run on bare-metal 
hardware nodes, but the policy engine logic for integrating bare-metal nodes is 
not complete.  There are some complications involved with understanding a 
bare-metal node's state that virtual nodes don't have.  Once this logic is 
complete, pacemaker will be able to integrate bare-metal nodes the same way it
integrates virtual remote-nodes today. Some special considerations for fencing
will need to be addressed.

- Virtual Remote-node Migration Support
Pacemaker's policy engine is limited in its ability to perform live migrations 
of KVM resources when resource dependencies are involved.  This limitation 
affects how resources living within a KVM remote-node are handled when a live 
migration takes place.  Currently when a live migration is performed on a KVM 
remote-node, all the resources within that remote-node have to be stopped 
before the migration takes place and started once again after migration has 
finished.  This policy engine limitation is fully explained in this bug report, 
http://bugs.clusterlabs.org/show_bug.cgi?id=5055#c3 

--
David Vossel dvos...@redhat.com
irc: dvossel on irc.freenode.net
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-15 Thread David Vossel




- Original Message -
 From: Brassow Jonathan jbras...@redhat.com
 To: David Vossel dvos...@redhat.com
 Cc: General Linux-HA mailing list linux-ha@lists.linux-ha.org, Lars 
 Marowsky-Bree l...@suse.com, Fabio M. Di
 Nitto fdini...@redhat.com
 Sent: Tuesday, May 14, 2013 5:01:02 PM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 
 On May 14, 2013, at 10:36 AM, David Vossel wrote:
 
  - Original Message -
  From: Lars Ellenberg lars.ellenb...@linbit.com
  To: Lars Marowsky-Bree l...@suse.com
  Cc: Fabio M. Di Nitto fdini...@redhat.com, General Linux-HA mailing
  list linux-ha@lists.linux-ha.org,
  Jonathan Brassow jbras...@redhat.com
  Sent: Tuesday, May 14, 2013 9:50:43 AM
  Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
  
  On Tue, May 14, 2013 at 04:06:09PM +0200, Lars Marowsky-Bree wrote:
  On 2013-05-14T09:54:55, David Vossel dvos...@redhat.com wrote:
  
  Here's what it comes down to.  You aren't guaranteed exclusive
  activation just because pacemaker is in control. There are scenarios
  with SAN disks where the node starts up and can potentially attempt to
  activate a volume before pacemaker has initialized.
  
  Yeah, from what I've read in the code, the tagged activation would also
  prevent a manual (or on-boot) vg/lv activation (because it seems lvm
  itself will refuse). That seems like a good idea to me. Unless I'm
  wrong, that concept seems sound, barring bugs that need fixing.
  
  Sure.
  
   And I'm not at all opposed to using tags.
  I want to get rid of the layer violation,
  which is the one Bad Thing I'm complaining about.
  
  Also, note that on stop, this strips all tags, leaving it untagged.
  On the next cluster boot, if that was really the concern,
  all nodes would grab and activate the VG, as it is untagged...
  
  That's not how it works.  You have to take ownership of the volume before
  you can activate it.  Untagged does not mean a node can activate it
  without first explicitly setting the tag.
 
 Ok, so I'm coming into this late.  Sorry about that.
 
 David has this right.  Tagging in conjunction with the 'volume_list' setting
 in lvm.conf is what is used to restrict VG/LV activation.  As he mentioned,
 you don't want a machine to boot up and start doing a resync on a mirror
 while user I/O is happening on the node where the service is active.  In
 that scenario, even if the LV is not mounted, there will be corruption.  The
 LV must not be allowed activation in the first place.
 
 I think the HA scripts written for rgmanager could be considerably reduced in
 size.  We probably don't need the matrix of different methods (cLVM vs
 Tagging.  VG vs LV).  Many of these came about as customers asked for them
 and we didn't want to compromise backwards compatibility.  If we are
 switching, now's the time for clean-up.  In fact, LVM has something new in
 lvm.conf: 'auto_activation_volume_list'.  If the list is defined and a VG/LV
 is in the list, it will be automatically activated on boot; otherwise, it
 will not.  That means, forget tagging and forget cLVM.  Make users change
 'auto_activation_volume_list' to include only VGs that are not controlled by
 pacemaker.  The HA script should then make sure that
 'auto_activation_volume_list' is defined and does not contain the VG/LV that
 is being controlled by pacemaker.  It would be necessary to check that the
 lvm.conf copy in the initrd is properly set.
 
 The use of 'auto_activation_volume_list' depends on updates to the LVM
 initscripts - ensuring that they use '-aay' in order to activate logical
 volumes.  That has been checked in upstream.  I'm sure it will go into RHEL7
 and I think (but would need to check on) RHEL6.

The 'auto_activation_volume_list' doesn't seem like it is exactly what we want
here, though.  It kind of works for what we are trying to achieve, but only as
a side effect, and I'm not sure it would work for everyone's deployment.  For
example, is there a way to set 'auto_activation_volume_list' to empty and still
be able to ensure that no volume groups are activated at startup?

What I'd really like to see is some sort of 'allow/deny' filter just for 
startup.  Then we could do something like this.

# start by denying everything on startup
auto_activation_deny_list=[ @* ]
# If we need to allow some vg on startup, we can explicitly enable them here.
auto_activation_allow_list=[ vg1, vg2 ]

Is something like the above possible yet?  Using a method like this, we lose 
the added security that the tags give us outside of the cluster management.  I 
trust pacemaker though :)

-- Vossel

 
 brassow
 
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-14 Thread David Vossel
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: linux-ha@lists.linux-ha.org
 Cc: David Vossel dvos...@redhat.com, Fabio M. Di Nitto 
 fdini...@redhat.com, Andrew Beekhof
 and...@beekhof.net, Lars Marowsky-Bree l...@suse.com, Lon Hohberger 
 l...@redhat.com, Jonathan Brassow
 jbras...@redhat.com, Dejan Muhamedagic deja...@fastmail.fm
 Sent: Tuesday, May 14, 2013 6:22:08 AM
 Subject: LVM Resource agent, exclusive activation
 
 
 This is about pull request
 https://github.com/ClusterLabs/resource-agents/pull/222
 Merge redhat lvm.sh feature set into heartbeat LVM agent
 
 Apologies to the CC for list duplicates.  Cc list was made by looking at
 the comments in the pull request, and some previous off-list thread.
 
 Even though this is about resource agent feature development,
 and thus actually a topic for the -dev list,
 I wanted to give this the maybe wider audience of the users list,
 to encourage feedback from people who actually *use* this feature
 with rgmanager, or intend to use it once it is in the pacemaker RA.
 
 
 
 Here is my perception of this pull request, as such very subjective, and
 I may have gotten some intentions or facts wrong, so please correct me,
 or add whatever I may have missed.
 
 
 Appart from a larger restructuring of the code, this introduces the
 feature of exclusive activation of LVM volume groups.
 
 From the commit message:
 
   This patch leaves the original LVM heartbeat functionality
   intact while adding these additional features from the redhat agent.
 
   1. Exclusive activation using volume group tags. This feature
   allows a volume group to live on shared storage within the cluster
   without requiring the use of cLVM for metadata locking.
 
   2. individual logical volume activation for local and cluster
   volume groups by using the new 'lvname' option.
 
   3. Better setup validation when the 'exclusive' option is enabled.
   This patch validates that when exclusive activation is enabled, either
   a cluster volume group is in use with cLVM, or the tags variant is
    configured correctly. These new checks also make it impossible to
   enable the exclusive activation for cloned resources.
 
 
 That sounds great. Why even discuss it, of course we want that.
 
 But I feel it does not do what it advertises.
 Rather I think it gives a false sense of exclusivity
 that is actually not met.
 
 (point 2., individual LV activation is ok with me, I think;
  my difficulties are with the exclusive by tagging thingy)
 
 So what does it do.
 
 To activate a VG exclusively, it uses LVM tags (see the LVM
 documentation about these).
 
 Any VG or LV can be tagged with a number of tags.
 Here, only one tag is used (and any other tags will be stripped!).
 
 I try to contrast current behaviour and exclusive behaviour:
 
 start:
 non-exclusive:
   just (try to) activate the VG
 exclusive by tag:
    check if the VG is currently tagged with my node name
   if not, is it tagged at all?
 if tagged, and that happens to be a node name that
  is in the current corosync membership:
 FAIL activation
   else, it is tagged, but that is not a node name,
  or not currently in the membership:
 strip any and all tags, then proceed
    if not FAILed because already tagged by another member,
   re-tag with *my* nodename
   activate it.
 
 Also it does double check the ownership in
 monitor:
 non-exclusive:
 I think due to the high timeout potential under load
   when using any LVM commands, this just checks for the presence
   of the /dev/$VGNAME directory nowadays, which is lightweight,
   and usually good enough (as the services *using* the LVs are
   monitored anyways).
 exclusive by tag:
 it does the above, then, if active, double checks
   that the current node name is also the current tag value,
   and if not (tries to) deactivate (which will usually fail,
   as it can only succeed if it is unused), and returns failure
   to Pacemaker, which will then do its recovery cycle.
 
   By default, Pacemaker would stop all depending resources,
   stop this one, and restart the whole stack.
 
   Which will, in a real split brain situation just
 make sure that nodes will keep stealing it from each other;
 it does not prevent corruption in any way.
 
 In a non-split-brain case, this situation can not happen
 anyways.  Unless two nodes raced to activate it,
 when it was untagged.
 Oops, so it does not prevent that either.
 
 For completeness, on
 stop:
non-exclusive:
just deactivate the VG
exclusive by tag:
double check I am the tag owner
then strip that tag (so no tag remains, the VG becomes untagged)
and deactivate.
 
 So the resource agent tries

Re: [Linux-HA] LVM Resource agent, exclusive activation

2013-05-14 Thread David Vossel
- Original Message -
 From: Lars Ellenberg lars.ellenb...@linbit.com
 To: Lars Marowsky-Bree l...@suse.com
 Cc: Fabio M. Di Nitto fdini...@redhat.com, General Linux-HA mailing 
 list linux-ha@lists.linux-ha.org,
 Jonathan Brassow jbras...@redhat.com
 Sent: Tuesday, May 14, 2013 9:50:43 AM
 Subject: Re: [Linux-HA] LVM Resource agent, exclusive activation
 
 On Tue, May 14, 2013 at 04:06:09PM +0200, Lars Marowsky-Bree wrote:
  On 2013-05-14T09:54:55, David Vossel dvos...@redhat.com wrote:
  
   Here's what it comes down to.  You aren't guaranteed exclusive
   activation just because pacemaker is in control. There are scenarios
   with SAN disks where the node starts up and can potentially attempt to
   activate a volume before pacemaker has initialized.
  
  Yeah, from what I've read in the code, the tagged activation would also
  prevent a manual (or on-boot) vg/lv activation (because it seems lvm
  itself will refuse). That seems like a good idea to me. Unless I'm
  wrong, that concept seems sound, barring bugs that need fixing.
 
 Sure.
 
  And I'm not at all opposed to using tags.
 I want to get rid of the layer violation,
 which is the one Bad Thing I'm complaining about.
 
 Also, note that on stop, this strips all tags, leaving it untagged.
 On the next cluster boot, if that was really the concern,
 all nodes would grab and activate the VG, as it is untagged...

That's not how it works.  You have to take ownership of the volume before you 
can activate it.  Untagged does not mean a node can activate it without first 
explicitly setting the tag.
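
To spell out my understanding of the mechanism, a rough sketch of the sequence 
(not the agent's actual code; it assumes volume_list in lvm.conf has been 
restricted to this node's tag, e.g. volume_list = [ "@<nodename>" ]):

    # With volume_list restricted to this node's tag, a plain
    # 'vgchange -ay vg1' is refused until the VG carries that tag.
    vgchange --addtag "$(uname -n)" vg1   # take ownership first
    vgchange -ay vg1                      # now activation is permitted

    # ...and on stop:
    vgchange -an vg1
    vgchange --deltag "$(uname -n)" vg1   # release ownership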

 
 So no, in the current form,
 it just *pretends* to protect against a number of things,
 but actually does not.
 
 And that is the other, even worse, Bad Thing.
 
  That's similar to what cLVM2 does and protects against, but without
  needing the cLVM2/DLM bits; that has, uhm, advantages too.
  
  In short, I'm in favor of this feature. (Clearly, lge has pointed out
  one or two issues that need fixing, that doesn't detract from the
  idea.)
 
 But that would be implemented simply by using tags, and on
 start:
   re-tag with my nodename
   activate
 
 That way, it is always tagged, so no stupid initrd, udev or boot script,
 not even a tired admin, will accidentally activate it.

 No need for anything else,
 no callout to membership necessary.
 All that smoke and mirrors adds complexity, and does not buy us anything,
 but a false sense of what that could possibly protect us against.
 
 If it was tagged with an other node name that is in the membership,
 then pacemaker would know about it, too, and had made sure it is not
 activated there.
 
 If that other node was not in the membership,
 we would re-tag and activate anyways.
 
 So why not just do that,
 document that it is done this way,
 and not pretend it would do more than that.
 It does not.

I've tested this specific scenario and was unable to activate the volume group 
manually without grabbing the tag first.  Have you tested this and found 
something contrary to my results?  This is how the feature is supposed to work.

-- Vossel 

   Lars
 
 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Q: limiting parallel execution of resource actions

2013-04-15 Thread David Vossel




- Original Message -
 From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 To: linux-ha@lists.linux-ha.org
 Sent: Monday, April 15, 2013 1:39:34 AM
 Subject: [Linux-HA] Q: limiting parallel execution of resource actions
 
 Hi!
 
 I have a question: if I want to limit parallel execution of some resources,
 how can I configure that?
 
 Background: Some resources may hit some OS bug when demanding high I/O (e.g.
 starting several Xen-VMs residing on an OCFS2 filesystem that is mirrored by
 cLVM). The I/O performance will drop approximately to zero when cLVM is also
 mirroring. That's not because of the I/O channel being saturated, but
 because of terrible programming regarding cLVM... anyway:
 
 Would it work to define an advisory ordering of resources (in addition to a
 different mandatory ordering), so that if crm would schedule all those resources
 at once (in parallel), it would actually schedule them sequentially?
 
 If I stop one resource, will the resources ordered after it also be stopped
 due to the advisory ordering?
 
 Maybe a new mechanism is needed to restrict parallelism based on resources
 (i.e. a new type of constraint).

Hey,

The 'batch-limit' cluster option might help.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_available_cluster_options
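
For example (a sketch; the value 2 is arbitrary, and the right number depends
on how much parallel I/O your nodes can actually take):

    crm configure property batch-limit=2     # crm shell
    pcs property set batch-limit=2           # pcs, if you use that instead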

-- Vossel

 Regards,
 Ulrich
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] IPaddr2 support of ipv6

2013-04-02 Thread David Vossel
- Original Message -
 From: Keisuke MORI keisuke.mori...@gmail.com
 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
 Sent: Tuesday, April 2, 2013 1:41:21 AM
 Subject: Re: [Linux-HA] IPaddr2 support of ipv6
 
 Hi,
 
 2013/3/29 David Vossel dvos...@redhat.com:
  Hi,
 
  It looks like ipv6 support got added to the IPaddr2 agent last year.  I'm
  curious why the metadata only advertises that the 'ip' option should be an
  IPv4 address.
 
ip (required): The IPv4 address to be configured in dotted quad
 notation, for example 192.168.1.1.
 
  Is this just an oversight?  If so this patch would probably help.
  https://github.com/davidvossel/resource-agents/commit/07be0019a50b96743536ab50727b56d9175bf95f
 
 Ah, yes that's just an oversight. Thank you for pointing out.
 Would you submit your patch as a pull request?

:) done, https://github.com/ClusterLabs/resource-agents/pull/219

 Thanks,
 --
 Keisuke MORI
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Getting Unknown Error for HA + Asterisk

2013-04-02 Thread David Vossel
: David Vossel dvos...@redhat.com
  Subject: Re: [Linux-HA] Getting Unknown Error
  To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
  Message-ID: 2142020906.59260.1364830216244.javamail.r...@redhat.com
  Content-Type: text/plain; charset=utf-8
 
  - Original Message -
   From: Ahmed Munir ahmedmunir...@gmail.com
   To: linux-ha@lists.linux-ha.org
   Sent: Friday, March 29, 2013 11:26:50 AM
   Subject: [Linux-HA] Getting Unknown Error
  
   Hi,
  
   I recently configured Linux HA for Asterisk service (using Asterisk
   resource agent downloaded from link:
  
  https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/asterisk
   ).
   As per configuration it is working good but when I include
  monitor_sipuri=
   sip:42@10.3.152.103  parameter in primitive section it is giving me an
   errors like listed below;
  
   root@asterisk2 ~ crm_mon -1
  
   
  
   Last updated: Thu Mar 28 06:09:54 2013
  
   Stack: Heartbeat
  
   Current DC: asterisk2 (b966dfa2-5973-4dfc-96ba-b2d38319c174) - partition
   with quorum
  
   Version: 1.0.12-unknown
  
   2 Nodes configured, unknown expected votes
  
   1 Resources configured.
  
   
  
  
  
   Online: [ asterisk1 asterisk2 ]
  
  
  
   Resource Group: group_1
  
asterisk_2 (lsb:asterisk): Started asterisk1
 
  Do you have two asterisk instances in the cluster, a LSB and OCF one?! I'm
  confused by this.
 
  
IPaddr_10_3_152_103(ocf::heartbeat:IPaddr):Started
   asterisk1
  
  
  
   Failed actions:
  
   p_asterisk_start_0 (node=asterisk1, call=64, rc=1, status=complete):
   unknown error
  
   p_asterisk_start_0 (node=asterisk2, call=20, rc=1, status=complete):
   unknown error
  
  
   I tested the 'sipsak' tool on cli, it is executing without any issue i.e.
   returning 200 OK but when I remove this param monitor_sipuri I'm not
   getting the errors.
 
  Did you use the exact same SIP URI and test it on the box asterisk is
  running on? Look through the log output.  Perhaps the resource agent is
  outputting some information that could give you a clue as to what is going
  on.
 
  A trick I'd use is running wireshark on the box the asterisk resource is
  starting on, watch the OPTION request come in and see how it is responded
  to for the resource agent to rule out any problems there.
 
  -- Vossel
 
   Listing down the configuration below which I configured;
  
node $id=887bae58-1eb6-47d1-b539-d12a2ed3d836 asterisk1
   node $id=b966dfa2-5973-4dfc-96ba-b2d38319c174 asterisk2
   primitive IPaddr_10_3_152_103 ocf:heartbeat:IPaddr \
   op monitor interval=5s timeout=20s \
   params ip=10.3.152.103
   primitive p_asterisk ocf:heartbeat:asterisk \
   op monitor interval=10s \
   params realtime=true
   group group_1 p_asterisk IPaddr_10_3_152_103 \
   meta target-role=Started
   location rsc_location_group_1 group_1 \
   rule $id=preferred_location_group_1 100: #uname eq asterisk1
   colocation asterisk-with-ip inf: p_asterisk IPaddr_10_3_152_103
   property $id=cib-bootstrap-options \
   symmetric-cluster=true \
   no-quorum-policy=stop \
   default-resource-stickiness=0 \
   stonith-enabled=false \
   stonith-action=reboot \
   startup-fencing=true \
   stop-orphan-resources=true \
   stop-orphan-actions=true \
   remove-after-stop=false \
   default-action-timeout=120s \
   is-managed-default=true \
   cluster-delay=60s \
   pe-error-series-max=-1 \
   pe-warn-series-max=-1 \
   pe-input-series-max=-1 \
   dc-version=1.0.12-unknown \
   cluster-infrastructure=Heartbeat
  
   And the status I'm getting is listed below;
  
   root@asterisk1 ~ crm_mon -1
   
   Last updated: Fri Mar 29 12:25:10 2013
   Stack: Heartbeat
   Current DC: asterisk1 (887bae58-1eb6-47d1-b539-d12a2ed3d836) - partition
   with quorum
   Version: 1.0.12-unknown
   2 Nodes configured, unknown expected votes
   1 Resources configured.
   
  
   Online: [ asterisk1 asterisk2 ]
  
Resource Group: group_1
p_asterisk (ocf::heartbeat:asterisk):  Started asterisk1
IPaddr_10_3_152_103(ocf::heartbeat:IPaddr):Started
   asterisk1
  
  
   Please advise to overcome this issue.
  
   --
   Regards,
  
   Ahmed Munir Chohan
   ___
   Linux-HA mailing list
   Linux-HA@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha
   See also: http://linux-ha.org/ReportingProblems
  
 
 
  --
 
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
  End of Linux-HA Digest, Vol 113, Issue 1

Re: [Linux-HA] Getting Unknown Error

2013-04-01 Thread David Vossel
- Original Message -
 From: Ahmed Munir ahmedmunir...@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Friday, March 29, 2013 11:26:50 AM
 Subject: [Linux-HA] Getting Unknown Error
 
 Hi,
 
 I recently configured Linux HA for Asterisk service (using Asterisk
 resource agent downloaded from link:
 https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/asterisk
 ).
 As configured it works fine, but when I include the monitor_sipuri=
 sip:42@10.3.152.103 parameter in the primitive section I get errors
 like those listed below;

 root@asterisk2 ~ crm_mon -1
 
 
 
 Last updated: Thu Mar 28 06:09:54 2013
 
 Stack: Heartbeat
 
 Current DC: asterisk2 (b966dfa2-5973-4dfc-96ba-b2d38319c174) - partition
 with quorum
 
 Version: 1.0.12-unknown
 
 2 Nodes configured, unknown expected votes
 
 1 Resources configured.
 
 
 
 
 
 Online: [ asterisk1 asterisk2 ]
 
 
 
 Resource Group: group_1
 
  asterisk_2 (lsb:asterisk): Started asterisk1

Do you have two asterisk instances in the cluster, an LSB one and an OCF one?! I'm 
confused by this.

 
  IPaddr_10_3_152_103(ocf::heartbeat:IPaddr):Started
 asterisk1
 
 
 
 Failed actions:
 
 p_asterisk_start_0 (node=asterisk1, call=64, rc=1, status=complete):
 unknown error
 
 p_asterisk_start_0 (node=asterisk2, call=20, rc=1, status=complete):
 unknown error
 
 
 I tested the 'sipsak' tool on the CLI and it executes without any issue,
 i.e. it returns 200 OK, but when I remove the monitor_sipuri param I don't
 get the errors.

Did you use the exact same SIP URI and test it on the box asterisk is running 
on? Look through the log output.  Perhaps the resource agent is outputting some 
information that could give you a clue as to what is going on.

A trick I'd use is to run wireshark on the box the asterisk resource is 
starting on, watch the OPTIONS request come in, and see how it is responded to, 
to rule out any problems on that side for the resource agent.
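
If it helps, this is the kind of sanity check I mean, run on the node the
resource is starting on (assuming sipsak is installed there and the URI matches
the monitor_sipuri parameter exactly):

    # Send a SIP OPTIONS probe to the monitored URI; a 200 OK here combined
    # with a start failure in pacemaker points at an environment difference
    # (user, VIP not up yet, firewall, etc.) rather than at asterisk itself.
    sipsak -vv -s sip:42@10.3.152.103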

-- Vossel

 Listing down the configuration below which I configured;
 
  node $id=887bae58-1eb6-47d1-b539-d12a2ed3d836 asterisk1
 node $id=b966dfa2-5973-4dfc-96ba-b2d38319c174 asterisk2
 primitive IPaddr_10_3_152_103 ocf:heartbeat:IPaddr \
 op monitor interval=5s timeout=20s \
 params ip=10.3.152.103
 primitive p_asterisk ocf:heartbeat:asterisk \
 op monitor interval=10s \
 params realtime=true
 group group_1 p_asterisk IPaddr_10_3_152_103 \
 meta target-role=Started
 location rsc_location_group_1 group_1 \
 rule $id=preferred_location_group_1 100: #uname eq asterisk1
 colocation asterisk-with-ip inf: p_asterisk IPaddr_10_3_152_103
 property $id=cib-bootstrap-options \
 symmetric-cluster=true \
 no-quorum-policy=stop \
 default-resource-stickiness=0 \
 stonith-enabled=false \
 stonith-action=reboot \
 startup-fencing=true \
 stop-orphan-resources=true \
 stop-orphan-actions=true \
 remove-after-stop=false \
 default-action-timeout=120s \
 is-managed-default=true \
 cluster-delay=60s \
 pe-error-series-max=-1 \
 pe-warn-series-max=-1 \
 pe-input-series-max=-1 \
 dc-version=1.0.12-unknown \
 cluster-infrastructure=Heartbeat
 
 And the status I'm getting is listed below;
 
 root@asterisk1 ~ crm_mon -1
 
 Last updated: Fri Mar 29 12:25:10 2013
 Stack: Heartbeat
 Current DC: asterisk1 (887bae58-1eb6-47d1-b539-d12a2ed3d836) - partition
 with quorum
 Version: 1.0.12-unknown
 2 Nodes configured, unknown expected votes
 1 Resources configured.
 
 
 Online: [ asterisk1 asterisk2 ]
 
  Resource Group: group_1
  p_asterisk (ocf::heartbeat:asterisk):  Started asterisk1
  IPaddr_10_3_152_103(ocf::heartbeat:IPaddr):Started
 asterisk1
 
 
 Please advise how to overcome this issue.
 
 --
 Regards,
 
 Ahmed Munir Chohan
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] IPaddr2 support of ipv6

2013-03-28 Thread David Vossel
Hi,

It looks like ipv6 support got added to the IPaddr2 agent last year.  I'm 
curious why the metadata only advertises that the 'ip' option should be an IPv4 
address.

  ip (required): The IPv4 address to be configured in dotted quad notation, 
for example 192.168.1.1.

Is this just an oversight?  If so this patch would probably help. 
https://github.com/davidvossel/resource-agents/commit/07be0019a50b96743536ab50727b56d9175bf95f
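
For the archives, a quick sketch of what an IPv6 address on IPaddr2 can look
like in crm shell syntax (the address, prefix length and nic are placeholders):

    primitive ip6_test ocf:heartbeat:IPaddr2 \
            params ip="2001:db8::10" cidr_netmask="64" nic="eth0" \
            op monitor interval="10s"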

-- Vossel
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Linux HA for Asterisk

2013-03-21 Thread David Vossel
- Original Message -
 From: Ahmed Munir ahmedmunir...@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Tuesday, March 19, 2013 11:03:38 AM
 Subject: Re: [Linux-HA] Linux HA for Asterisk
 
 Thanks David,
 
 BTW, is there any document/material available on Internet for
 configuring
 it and testing the Asterisk OCF resource?
 
 Please advise.

I'm sure there is, but I am unfamiliar with the agent other than knowing it 
exists.  If you don't find any help here, perhaps try the asterisk-users 
mailing list (subscribe here, http://www.asterisk.org/community/discuss) 

-- Vossel

 Date: Tue, 19 Mar 2013 10:49:56 -0400 (EDT)
  From: David Vossel dvos...@redhat.com
  Subject: Re: [Linux-HA] Linux HA for Asterisk
  To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
  Message-ID:
  506620979.929845.1363704596090.javamail.r...@redhat.com
  Content-Type: text/plain; charset=utf-8
 
 
 
  - Original Message -
   From: Ahmed Munir ahmedmunir...@gmail.com
   To: linux-ha@lists.linux-ha.org
   Sent: Monday, March 18, 2013 12:31:17 PM
   Subject: [Linux-HA] Linux HA for Asterisk
  
   Hi all,
  
   The version for heartbeat I installed on two CentOS 5.9 boxes is
   3.0.3-2
   and it is for Asterisk fail over. As per default/standard
   configuration if
   I stop heartbeat service from either one machine, the virtual IP
   automatically assign to another machine and this setup is working
   good but
   it only applies on the system level. Whereas I'm looking forward
   for
   service level i.e. if Asterisk service isn't running on serverA,
   IP
   address
   automatically assign to serverB.
  
   Please advise to accomplish this case (service level failover) as
there
   are no such OCF resources exist for Asterisk.
 
  Have you seen this?
 
 
  https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/asterisk
 
  -- Vossel
 
   Listing down the standard configuration for Linux HA below;
  
    <nodes>
      <node id="887bae58-1eb6-47d1-b539-d12a2ed3d836" uname="asterisk1"
            type="normal"/>
      <node id="b966dfa2-5973-4dfc-96ba-b2d38319c174" uname="asterisk2"
            type="normal"/>
    </nodes>
    <resources>
      <group id="group_1">
        <primitive class="ocf" id="IPaddr_10_3_152_103"
                   provider="heartbeat" type="IPaddr">
          <operations>
            <op id="IPaddr_10_3_152_103_mon" interval="5s" name="monitor"
                timeout="5s"/>
          </operations>
          <instance_attributes id="IPaddr_10_3_152_103_inst_attr">
            <attributes>
              <nvpair id="IPaddr_10_3_152_103_attr_0" name="ip"
                      value="10.3.152.103"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="lsb" id="asterisk_2" provider="heartbeat"
                   type="asterisk">
          <operations>
            <op id="asterisk_2_mon" interval="120s" name="monitor"
                timeout="60s"/>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="rsc_location_group_1" rsc="group_1">
        <rule id="preferred_location_group_1" score="100">
          <expression attribute="#uname"
                      id="preferred_location_group_1_expr" operation="eq"
                      value="asterisk1"/>
        </rule>
      </rsc_location>
    </constraints>
    </configuration>
  
  
   --
   Regards,
  
   Ahmed Munir Chohan
   ___
   Linux-HA mailing list
   Linux-HA@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha
   See also: http://linux-ha.org/ReportingProblems
  
 
 
 --
 Regards,
 
 Ahmed Munir Chohan
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Linux HA for Asterisk

2013-03-19 Thread David Vossel


- Original Message -
 From: Ahmed Munir ahmedmunir...@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Monday, March 18, 2013 12:31:17 PM
 Subject: [Linux-HA] Linux HA for Asterisk
 
 Hi all,
 
  The heartbeat version I installed on two CentOS 5.9 boxes is 3.0.3-2,
  and it is for Asterisk failover. With the default/standard
  configuration, if I stop the heartbeat service on either machine, the
  virtual IP is automatically assigned to the other machine, and this
  setup works well, but it only applies at the system level. What I'm
  looking for is service-level failover, i.e. if the Asterisk service
  isn't running on serverA, the IP address is automatically assigned to
  serverB.
 
  Please advise how to accomplish this (service-level failover), as no
  such OCF resource seems to exist for Asterisk.

Have you seen this?

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/asterisk
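
A rough sketch of what using it instead of the lsb script could look like in
crm shell syntax (a sketch only; IPaddr_10_3_152_103 refers to your existing IP
primitive, and the operation values are placeholders - see the agent's metadata
for the full parameter list):

    primitive p_asterisk ocf:heartbeat:asterisk \
            op monitor interval="10s" timeout="30s"
    group group_1 p_asterisk IPaddr_10_3_152_103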

-- Vossel

 Listing down the standard configuration for Linux HA below;
 
  <nodes>
    <node id="887bae58-1eb6-47d1-b539-d12a2ed3d836" uname="asterisk1"
          type="normal"/>
    <node id="b966dfa2-5973-4dfc-96ba-b2d38319c174" uname="asterisk2"
          type="normal"/>
  </nodes>
  <resources>
    <group id="group_1">
      <primitive class="ocf" id="IPaddr_10_3_152_103" provider="heartbeat"
                 type="IPaddr">
        <operations>
          <op id="IPaddr_10_3_152_103_mon" interval="5s" name="monitor"
              timeout="5s"/>
        </operations>
        <instance_attributes id="IPaddr_10_3_152_103_inst_attr">
          <attributes>
            <nvpair id="IPaddr_10_3_152_103_attr_0" name="ip"
                    value="10.3.152.103"/>
          </attributes>
        </instance_attributes>
      </primitive>
      <primitive class="lsb" id="asterisk_2" provider="heartbeat"
                 type="asterisk">
        <operations>
          <op id="asterisk_2_mon" interval="120s" name="monitor"
              timeout="60s"/>
        </operations>
      </primitive>
    </group>
  </resources>
  <constraints>
    <rsc_location id="rsc_location_group_1" rsc="group_1">
      <rule id="preferred_location_group_1" score="100">
        <expression attribute="#uname" id="preferred_location_group_1_expr"
                    operation="eq" value="asterisk1"/>
      </rule>
    </rsc_location>
  </constraints>
  </configuration>
 
 
 --
 Regards,
 
 Ahmed Munir Chohan
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Master/Slave - Master node not monitored after a failure

2013-02-04 Thread David Vossel


- Original Message -
 From: radurad radu@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Monday, February 4, 2013 1:51:49 AM
 Subject: Re: [Linux-HA] Master/Slave - Master node not monitored after a 
 failure
 
 
 Hi,
 
 I've installed from rpm's as it was faster (from sources I had to
 install a lot of devel packages and got stuck at libcpg).
 The issue is solved; the master is being monitored after any number
 of failures. But there is a new issue I'm facing now (if I'm not able
 to have it fixed I'll probably make a new post on the forum - if one
 is not already created): after a couple of failures and restarts, at
 the next failure mysql is not started anymore; in the logs I get the
 message MySQL is not running, but the start/restart doesn't happen
 (I made sure that failcount is 0, as I have it reset from time to
 time).

I haven't encountered anything like that. If you can gather the log and pengine 
cluster data using crm_report we should be able to help figure out what is 
going on.
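
Something along these lines should collect it (a sketch; adjust the time
window so it covers at least one failed restart, then attach the resulting
tarball):

    crm_report -f "2013-02-04 01:00:00" -t "2013-02-04 03:00:00" /tmp/mysql-failure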

-- Vossel

 
 Thanks again,
 Radu Rad.
 
 
 David Vossel wrote:
  
  
  
  - Original Message -
  From: radurad radu@gmail.com
  To: linux-ha@lists.linux-ha.org
  Sent: Wednesday, January 30, 2013 5:10:00 AM
  Subject: Re: [Linux-HA] Master/Slave - Master node not monitored
  after a
  failure
  
  
  Hi,
  
  Thank you for clarifying this.
  On CentOS 6 the latest pacemaker build is 1.1.7 (which i'm using
  now), do
  you see a problem if I'm installing from sources so that I'll have
  the 1.1.8
  pacemaker version?
  
  The only thing I can think of is that you might have to get a new
  version
  of libqb in order to use 1.1.8.  We already have a rhel 6 based
  package
  you can use if you want.
  
  http://clusterlabs.org/rpm-next/
  
  -- Vossel
  
  Best Regards,
  Radu Rad.
  
  
  
  David Vossel wrote:
   
   
   
   - Original Message -
   From: radurad radu@gmail.com
   To: linux-ha@lists.linux-ha.org
   Sent: Thursday, January 24, 2013 6:07:38 AM
   Subject: [Linux-HA] Master/Slave - Master node not monitored
   after
   a
   failure
   
   
   Hi,
   
   Using following installation under CentOS
   
   corosync-1.4.1-7.el6_3.1.x86_64
   resource-agents-3.9.2-12.el6.x86_64
   
   and having the following configuration for a Master/Slave mysql
   
   primitive mysqld ocf:heartbeat:mysql \
   params binary=/usr/bin/mysqld_safe
   config=/etc/my.cnf
   socket=/var/lib/mysql/mysql.sock datadir=/var/lib/mysql
   user=mysql
   replication_user=root replication_passwd=testtest \
   op monitor interval=5s role=Slave timeout=31s \
   op monitor interval=6s role=Master timeout=30s
   ms ms_mysql mysqld \
   meta master-max=1 master-node-max=1 clone-max=2
   clone-node-max=1 notify=true
   property $id=cib-bootstrap-options \
  
  dc-version=1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
   \
   cluster-infrastructure=openais \
   expected-quorum-votes=2 \
   no-quorum-policy=ignore \
   stonith-enabled=false \
   last-lrm-refresh=1359026356 \
   start-failure-is-fatal=false \
   cluster-recheck-interval=60s
   rsc_defaults $id=rsc-options \
   failure-timeout=50s
   
   Having only one node online (the Master; with a slave online
   the
   problem
   also occurs, but for simplification I've left only the Master
   online)
   
    I run into the below problem:
   - Stopping once the mysql process results in corosync
   restarting
   the
   mysql
   again and promoting it to Master.
   - Stopping again the mysql process results in nothing; the
   failure
   is
   not
   detected, corosync takes no action and still sees the node as
   Master
   and the
   mysql running.
   - The operation monitor is not running after the first failure,
   as
   there are
   not entries in log of type:  INFO: MySQL monitor succeeded
   (master).
   - Changing something in configuration results in corosync
   detecting
   immediately that mysql is not running and promotes it. Also the
   operation
   monitor will run until the first failure and which the same
   problem
   occurs.
   
   If you need more information let me know. I could attach the
   log
   in
   the
   messages files also.
   
   Hey,
   
   This is a known bug and has been resolved in pacemaker 1.1.8.
   
   Here's the related issue. The commits are listed in the
   comments.
   http://bugs.clusterlabs.org/show_bug.cgi?id=5072
   
   
   -- Vossel
   
   Thanks for now,
   Radu.
   
   --
   View this message in context:
  
  http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34939865.html
   Sent from the Linux-HA mailing list archive at Nabble.com.
   
   ___
   Linux-HA mailing list
   Linux-HA@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha
   See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Master/Slave - Master node not monitored after a failure

2013-01-30 Thread David Vossel


- Original Message -
 From: radurad radu@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Wednesday, January 30, 2013 5:10:00 AM
 Subject: Re: [Linux-HA] Master/Slave - Master node not monitored after a 
 failure
 
 
 Hi,
 
 Thank you for clarifying this.
 On CentOS 6 the latest pacemaker build is 1.1.7 (which i'm using
 now), do
 you see a problem if I'm installing from sources so that I'll have
 the 1.1.8
 pacemaker version?

The only thing I can think of is that you might have to get a new version of 
libqb in order to use 1.1.8.  We already have a rhel 6 based package you can 
use if you want.

http://clusterlabs.org/rpm-next/

-- Vossel

 Best Regards,
 Radu Rad.
 
 
 
 David Vossel wrote:
  
  
  
  - Original Message -
  From: radurad radu@gmail.com
  To: linux-ha@lists.linux-ha.org
  Sent: Thursday, January 24, 2013 6:07:38 AM
  Subject: [Linux-HA] Master/Slave - Master node not monitored after
  a
  failure
  
  
  Hi,
  
  Using following installation under CentOS
  
  corosync-1.4.1-7.el6_3.1.x86_64
  resource-agents-3.9.2-12.el6.x86_64
  
  and having the following configuration for a Master/Slave mysql
  
  primitive mysqld ocf:heartbeat:mysql \
  params binary=/usr/bin/mysqld_safe config=/etc/my.cnf
  socket=/var/lib/mysql/mysql.sock datadir=/var/lib/mysql
  user=mysql
  replication_user=root replication_passwd=testtest \
  op monitor interval=5s role=Slave timeout=31s \
  op monitor interval=6s role=Master timeout=30s
  ms ms_mysql mysqld \
  meta master-max=1 master-node-max=1 clone-max=2
  clone-node-max=1 notify=true
  property $id=cib-bootstrap-options \
  dc-version=1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
  \
  cluster-infrastructure=openais \
  expected-quorum-votes=2 \
  no-quorum-policy=ignore \
  stonith-enabled=false \
  last-lrm-refresh=1359026356 \
  start-failure-is-fatal=false \
  cluster-recheck-interval=60s
  rsc_defaults $id=rsc-options \
  failure-timeout=50s
  
  Having only one node online (the Master; with a slave online the
  problem
  also occurs, but for simplification I've left only the Master
  online)
  
  I run into the below problem:
  - Stopping once the mysql process results in corosync restarting
  the
  mysql
  again and promoting it to Master.
  - Stopping again the mysql process results in nothing; the failure
  is
  not
  detected, corosync takes no action and still sees the node as
  Master
  and the
  mysql running.
  - The operation monitor is not running after the first failure, as
  there are
  not entries in log of type:  INFO: MySQL monitor succeeded
  (master).
  - Changing something in configuration results in corosync
  detecting
  immediately that mysql is not running and promotes it. Also the
  operation
  monitor will run until the first failure and which the same
  problem
  occurs.
  
  If you need more information let me know. I could attach the log
  in
  the
  messages files also.
  
  Hey,
  
  This is a known bug and has been resolved in pacemaker 1.1.8.
  
  Here's the related issue. The commits are listed in the comments.
  http://bugs.clusterlabs.org/show_bug.cgi?id=5072
  
  
  -- Vossel
  
  Thanks for now,
  Radu.
  
  --
  View this message in context:
  http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34939865.html
  Sent from the Linux-HA mailing list archive at Nabble.com.
  
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  
  
 
 --
 View this message in context:
 http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34962132.html
 Sent from the Linux-HA mailing list archive at Nabble.com.
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Master/Slave - Master node not monitored after a failure

2013-01-29 Thread David Vossel


- Original Message -
 From: radurad radu@gmail.com
 To: linux-ha@lists.linux-ha.org
 Sent: Thursday, January 24, 2013 6:07:38 AM
 Subject: [Linux-HA] Master/Slave - Master node not monitored after a failure
 
 
 Hi,
 
 Using following installation under CentOS
 
 corosync-1.4.1-7.el6_3.1.x86_64
 resource-agents-3.9.2-12.el6.x86_64
 
 and having the following configuration for a Master/Slave mysql
 
 primitive mysqld ocf:heartbeat:mysql \
 params binary=/usr/bin/mysqld_safe config=/etc/my.cnf
 socket=/var/lib/mysql/mysql.sock datadir=/var/lib/mysql
 user=mysql
 replication_user=root replication_passwd=testtest \
 op monitor interval=5s role=Slave timeout=31s \
 op monitor interval=6s role=Master timeout=30s
 ms ms_mysql mysqld \
 meta master-max=1 master-node-max=1 clone-max=2
 clone-node-max=1 notify=true
 property $id=cib-bootstrap-options \
 dc-version=1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
 \
 cluster-infrastructure=openais \
 expected-quorum-votes=2 \
 no-quorum-policy=ignore \
 stonith-enabled=false \
 last-lrm-refresh=1359026356 \
 start-failure-is-fatal=false \
 cluster-recheck-interval=60s
 rsc_defaults $id=rsc-options \
 failure-timeout=50s
 
 Having only one node online (the Master; with a slave online the
 problem
 also occurs, but for simplification I've left only the Master online)
 
 I run into the below problem:
 - Stopping once the mysql process results in corosync restarting the
 mysql
 again and promoting it to Master.
 - Stopping again the mysql process results in nothing; the failure is
 not
 detected, corosync takes no action and still sees the node as Master
 and the
 mysql running.
 - The operation monitor is not running after the first failure, as
 there are
 no entries in the log of type:  INFO: MySQL monitor succeeded (master).
 - Changing something in configuration results in corosync detecting
 immediately that mysql is not running and promotes it. Also the
 operation
 monitor will run until the first failure and which the same problem
 occurs.
 
 If you need more information let me know. I could attach the log in
 the
 messages files also.

Hey,

This is a known bug and has been resolved in pacemaker 1.1.8.

Here's the related issue. The commits are listed in the comments.
http://bugs.clusterlabs.org/show_bug.cgi?id=5072


-- Vossel

 Thanks for now,
 Radu.
 
 --
 View this message in context:
 http://old.nabble.com/Master-Slave---Master-node-not-monitored-after-a-failure-tp34939865p34939865.html
 Sent from the Linux-HA mailing list archive at Nabble.com.
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

2012-11-30 Thread David Vossel
- Original Message -
 From: Hermes Flying flyingher...@yahoo.com
 To: linux-ha@lists.linux-ha.org
 Sent: Friday, November 30, 2012 4:04:34 PM
 Subject: [Linux-HA] Some help on understanding how HA issues are addressed
 by pacemaker
 
 Hi,
 I am looking into using your facilities to have high availability on
 my system. I am trying to figure out some things. I hope you guys
 could help me.
 I am interested in knowing how pacemaker migrates a VIP and how a
 split-brain situation is addressed by your facilities.
 To be specific: I am interested in the following setup:
 
 2 linux machines. Each machine runs a load balancer and a Tomcat
 instance.
 If I understand correctly pacemaker will be responsible to assign the
 main VIP to one of the nodes.
 
 My questions are:
 1)Will pacemaker monitor/restart the load balancers on each machine
 in case of crash?
 2) How does pacemaker decide to migrate the VIP to the other node?
 3) Do the pacemakers in each machine communicate? If yes how do you
 handle network failure? Could I end up with split-brain?
 4) Generally how is split-brain addressed using pacemaker?
 5) Could pacemaker monitor Tomcat?
 
 As you can see I am interested in maintaining quorum in a two-node
 configuration. If you can help me with this info to find a proper
 direction it would be much appreciated!
 
 Thank you


Hey, 

You may or may not have looked at this already, but this is a good place to 
start, http://clusterlabs.org/doc/

Read chapter one of this document.
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html

Running through this 2 node cluster exercise will likely answer many of your 
questions.
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html
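
To preview the part that usually surprises people: in a two-node cluster
neither side can have a quorum majority after a split, so the usual pattern
with this corosync/pacemaker generation is to relax quorum and rely on fencing
to resolve split-brain. A sketch of the relevant properties in crm shell
syntax (a real deployment also needs a working STONITH device configured):

    crm configure property no-quorum-policy=ignore
    crm configure property stonith-enabled=true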


-- Vossel
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems