Re: [Linux-ha-dev] Dovecot OCF Resource Agent
Hi Jeroen,

On Fri, Jul 22, 2011 at 10:51:56AM +0200, jer...@intuxicated.org wrote:
> On Fri, 15 Apr 2011 14:45:59 +0200, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:
>> On 04/15/2011 01:19 PM, Andrew Beekhof wrote:
>>> On Fri, Apr 15, 2011 at 12:53 PM, Raoul Bhatia [IPAX] r.bha...@ipax.at wrote:
>>>> On 04/15/2011 11:10 AM, jer...@intuxicated.org wrote:
>>>>> Yes, it does the same thing but contains some additional features, like logging into a mailbox.
>>>> First of all, I do not know how the others think about an OCF RA implemented in C. I'll suggest waiting for comments from Dejan or fghass.
>>> The IPv6addr agent was written in C too. The OCF standard does not dictate the language to be used - it's really a matter of whether C is the best tool for this job.
>> Thank you, Andrew! Jeroen, can you please create a GitHub fork off https://github.com/ClusterLabs/ (it's really easy!) and add your resource agent in the same fashion as IPv6addr.c [1]?
>> Thanks, Raoul
>> [1] https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPv6addr.c
>
> Hi, I finally found some time to get the code on GitHub: https://github.com/perrit/dovecot-ocf-resource-agent
> As you can see, it's kind of hard to merge the code in the same way as IPv6addr.c, as it currently spans multiple files. Would you like me to just put it in a directory? Maybe it's a good idea to split the dovecot part and the mailbox-login part, so that the mailbox-login resource agent becomes more like the ping resource agent?
> Regards, Jeroen

I really hate to say it, since you obviously invested quite a bit of time to put together this agent, but C is arguably not the best suited programming language for resource agents. I guess that's why all init scripts are, well, shell scripts - and all but one of our OCF resource agents. The code is around 4 kloc, which is as big as some of our subsystems. That's a lot of code to read and maintain. Was there a good reason to choose C for the implementation?

Cheers, Dejan
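Since the thread turns on whether C or shell is the better fit for a resource agent, here is a rough sketch of what the usual shell-based OCF agent looks like. This is illustrative only and is not Jeroen's Dovecot agent; the dovecot binary, the PID-file path and the ocf-shellfuncs location are assumptions for the sake of the example.

    #!/bin/sh
    # Minimal OCF resource agent sketch (illustrative; paths are assumptions).
    : ${OCF_ROOT:=/usr/lib/ocf}
    . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

    meta_data() {
    cat <<EOF
    <?xml version="1.0"?>
    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
    <resource-agent name="dovecot-sketch" version="0.1">
      <version>1.0</version>
      <longdesc lang="en">Illustrative skeleton of a shell OCF resource agent.</longdesc>
      <shortdesc lang="en">dovecot sketch</shortdesc>
      <parameters/>
      <actions>
        <action name="start"     timeout="20s"/>
        <action name="stop"      timeout="20s"/>
        <action name="monitor"   timeout="20s" interval="10s"/>
        <action name="meta-data" timeout="5s"/>
      </actions>
    </resource-agent>
    EOF
    }

    PIDFILE=/var/run/dovecot/master.pid    # assumed location

    dovecot_monitor() {
        # Running means: PID file exists and the process answers signal 0.
        [ -f "$PIDFILE" ] && kill -0 "`cat $PIDFILE`" 2>/dev/null && return $OCF_SUCCESS
        return $OCF_NOT_RUNNING
    }

    dovecot_start() {
        dovecot_monitor && return $OCF_SUCCESS
        /usr/sbin/dovecot || return $OCF_ERR_GENERIC
        # Do not report success until the monitor sees the daemon.
        while ! dovecot_monitor; do sleep 1; done
        return $OCF_SUCCESS
    }

    dovecot_stop() {
        dovecot_monitor || return $OCF_SUCCESS
        kill "`cat $PIDFILE`" || return $OCF_ERR_GENERIC
        while dovecot_monitor; do sleep 1; done
        return $OCF_SUCCESS
    }

    case "$1" in
        meta-data)    meta_data; exit $OCF_SUCCESS;;
        start)        dovecot_start;;
        stop)         dovecot_stop;;
        monitor)      dovecot_monitor;;
        validate-all) exit $OCF_SUCCESS;;
        *)            exit $OCF_ERR_UNIMPLEMENTED;;
    esac
    exit $?

The mailbox-login probe Jeroen mentions is exactly the part that does not fit this shape well, which is presumably why the thread suggests splitting it into a separate, ping-like agent.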
Re: [Linux-HA] location and orders : Question about a behavior ...
Hi,

On Tue, Aug 2, 2011 at 6:06 PM, alain.mou...@bull.net wrote:
> Hi,
> I have this simple configuration of locations and orders between resources group-1, group-2 and clone-1 (on a two-node HA cluster with Pacemaker-1.1.2-7 / corosync-1.2.3-21):
>
>     location loc1-group-1 group-1 +100: node2
>     location loc1-group-2 group-2 +100: node3
>     order order-group-1 inf: group-1 clone-1
>     order order-group-2 inf: group-2 clone-1
>     property $id="cib-bootstrap-options" \
>         dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="true" \
>         no-quorum-policy="ignore" \
>         default-resource-stickiness="5000"

I use it as:

    rsc_defaults $id="rsc-options" \
        resource-stickiness="1000"

Instead of:

    property $id="cib-bootstrap-options" \
        default-resource-stickiness="5000"

And the behavior is the expected one: no failback.

HTH,
Dan

> (and no current cli- preferences)
>
> When I stop node2, group-1 is migrated to node3 as expected. But when node2 is up again and I start Pacemaker on node2 again, group-1 automatically comes back to node2, and I wonder why. I have another, similar configuration with the same location constraints and the same default-resource-stickiness value, but without an order constraint with a clone resource, and there the group does not come back automatically. I don't understand why this order constraint would change the behavior.
>
> Thanks for your help,
> Alain Moullé

--
Dan Frincu
CCNA, RHCE
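For reference, Dan's suggestion amounts to moving the stickiness from the cluster property into a resource default. A hedged sketch of the crm shell steps follows; the one-shot invocation may differ slightly between crmsh versions, and the value 1000 is simply the one Dan uses.

    # Set stickiness as a resource default (ends up as the rsc_defaults
    # section Dan shows above):
    crm configure rsc_defaults resource-stickiness="1000"

    # Then remove default-resource-stickiness from cib-bootstrap-options
    # (e.g. via "crm configure edit") and verify the result:
    crm configure show | grep -A1 rsc_defaults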
Re: [Linux-HA] location and orders : Question about a behavior ...
Hi,

Thanks. I don't think the 1000 or 5000 value makes any difference, so is it the rsc_defaults (rsc-options) form that could make it work? But do you also have the order constraint with a clone? Because on another of my configurations I also have

    property $id="cib-bootstrap-options" \
        default-resource-stickiness="5000"

and the resource does not fail back automatically... so... could somebody explain?

Thanks,
Alain
Re: [Linux-HA] location and orders : Question about a behavior ...
Hi,

On Wed, Aug 3, 2011 at 2:22 PM, alain.mou...@bull.net wrote:
> Hi, thanks. I don't think the 1000 or 5000 value makes any difference,

The values make little difference; it's about having a higher score atm.

> so is it the rsc_defaults form that could make it work?

Yes, I believe so.

> But do you also have the order constraint with a clone?

No.

> Because on another of my configurations I also have
>     property $id="cib-bootstrap-options" \
>         default-resource-stickiness="5000"
> and the resource does not fail back automatically... so... could somebody explain?

Try the following (crm_verify prints to stderr, hence the redirect):

    crm_verify -L 2>&1 | grep stick

and see what scores (weights) are given to the resources. Based on these weights it might make more sense.

HTH,
Dan

--
Dan Frincu
CCNA, RHCE
Re: [Linux-HA] location and orders : Question about a behavior ...
Hi,

Knowing that res1-[1-3] are in group-1 and res2-[1-3] are in group-2, crm_verify -L 2>&1 | grep stick displays:

    debug: unpack_config: Default stickiness: 5000
    debug: common_apply_stickiness: Resource clone-1:0: preferring current location (node=node2, weight=1)
    debug: common_apply_stickiness: Resource res1-1: preferring current location (node=node2, weight=5000)
    debug: common_apply_stickiness: Resource res1-2: preferring current location (node=node2, weight=5000)
    debug: common_apply_stickiness: Resource res1-3: preferring current location (node=node2, weight=5000)
    debug: common_apply_stickiness: Resource clone-1:1: preferring current location (node=node3, weight=1)
    debug: common_apply_stickiness: Resource res2-1: preferring current location (node=node3, weight=5000)
    debug: common_apply_stickiness: Resource res2-2: preferring current location (node=node3, weight=5000)
    debug: common_apply_stickiness: Resource res2-3: preferring current location (node=node3, weight=5000)

but I don't know what conclusions to draw from this information...

Alain
Re: [Linux-HA] The active trap of the SNMP is delayed.
Hi Hideo,

On 08/02/11 09:14, renayama19661...@ybb.ne.jp wrote:
> Hi Yan,
> I confirmed that with the patch applied the trap is definitely transmitted.

OK, thanks!

> We request that the patch be applied to the pacemaker-mgmt for both Pacemaker 1.0 and Pacemaker 1.1.

Pushed. Since we don't have a separate branch, you might need to back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with pacemaker-1.0.x.

> * After this correction, we hope that a new version of pacemaker-mgmt for Pacemaker 1.0 is released (pacemaker-mgmt-2.1.0?).

We'll probably tag a new version in the near future.

Regards,
Gao,Yan

--
Gao,Yan y...@suse.com
Software Engineer
China Server Team, SUSE.
Re: [Linux-HA] location and orders : Question about a behavior ...
Hi,

On Wed, Aug 3, 2011 at 3:00 PM, alain.mou...@bull.net wrote:
> Hi,
> Knowing that res1-[1-3] are in group-1 and res2-[1-3] are in group-2, crm_verify -L 2>&1 | grep stick displays:
> [...]
> but I don't know what conclusions to draw from this information...

Well, this isn't the only way to obtain information on score allocation; there is also ptest -saL and crm_verify, and adding -V's to each increases the verbosity of the output. Anyway, you may have a case where the score for a group on a node is higher than the default stickiness value, and therefore the failback occurs. Use this script to get a better idea of what scores are assigned to resources, and then see what's causing this behavior:

http://hg.clusterlabs.org/pacemaker/1.1/raw-file/01e86afaaa6d/extra/showscores.sh

Regards,
Dan

--
Dan Frincu
CCNA, RHCE
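Pulling the suggestions from this thread together, a small hedged sketch of inspecting the scores from the shell; option spellings vary between Pacemaker versions, and newer releases replace ptest with crm_simulate.

    # crm_verify logs to stderr, hence the redirect:
    crm_verify -LV 2>&1 | grep -i stick

    # Allocation scores computed by the policy engine:
    ptest -L -s 2>&1 | grep -i score

    # The showscores.sh helper Dan links tabulates scores per resource and node:
    wget http://hg.clusterlabs.org/pacemaker/1.1/raw-file/01e86afaaa6d/extra/showscores.sh
    sh showscores.sh

The comparison to make is between the stickiness-derived scores (5000 per group member in Alain's output) and whatever the location and order constraints contribute for the group on its preferred node.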
[Linux-HA] About cli-preference
Hi,

When we do a crm resource migrate resource-name, crm adds a cli-preference to the configuration for this resource. I wonder if there is a way to tell Pacemaker that, once the resource is running on the new target node, it could automatically remove the cli-preference, or, for example, remove it after a given lifetime? (Knowing that we have a configuration with resource-stickiness etc. which avoids the automatic failback of the resource when the cli-preference is removed.)

Thanks,
Alain Moullé
Re: [Linux-HA] About cli-preference
Hi,

On Wed, Aug 3, 2011 at 5:23 PM, alain.mou...@bull.net wrote:
> Hi,
> When we do a crm resource migrate resource-name, crm adds a cli-preference to the configuration for this resource. I wonder if there is a way to tell Pacemaker that, once the resource is running on the new target node, it could automatically remove the cli-preference, or, for example, remove it after a given lifetime?

# crm resource help migrate

    Migrate a resource to a different node. If node is left out, the
    resource is migrated by creating a constraint which prevents it from
    running on the current node. Additionally, you may specify a
    lifetime for the constraint---once it expires, the location
    constraint will no longer be active.

    Usage: ... migrate <rsc> [<node>] [<lifetime>] ...

You can specify a lifetime.

HTH,
Dan

> (Knowing that we have a configuration with resource-stickiness etc. which avoids the automatic failback of the resource when the cli-preference is removed.)
> Thanks, Alain Moullé

--
Dan Frincu
CCNA, RHCE
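Worth adding that the lifetime is given as an ISO 8601 duration, and that the constraint can also be dropped by hand once the move is done. The resource and node names below are examples, not taken from Alain's cluster.

    # Move the resource and let the cli- constraint expire after ten minutes:
    crm resource migrate group-1 node3 PT10M

    # Or remove the constraint manually after the migration has completed:
    crm resource unmigrate group-1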
[Linux-HA] Heartbeat Restart is not same as Stop and Start
Hi,

Our system setup:
Heartbeat 3.0.3
DRBD (to manage the file system; it is one of the resources managed by the CRM)
Red Hat Linux
Pacemaker

We have built an application on top of Linux-HA that lets users configure the cluster by giving the IP addresses of the nodes and perform operations like Restart System, Change host names, Resolve split-brain scenario, etc.

In our application, we ran into a problem when we do a heartbeat restart for some operation and the user then does a Restart System, which internally runs the command shutdown -r now. I believe this is due to the heartbeat LSB script, and I have explained the scenario below.

Problem: in the heartbeat LSB script, restart neither removes nor touches the heartbeat lock file.

On heartbeat start, the LSB script starts heartbeat and touches the /var/lock/subsys/heartbeat lock file.
On heartbeat stop, the LSB script stops heartbeat and removes the lock file at /var/lock/subsys/heartbeat.
On heartbeat restart, the LSB script stops heartbeat and starts heartbeat, but does NOT remove or touch the lock file.

We call heartbeat restart instead of heartbeat start from our script because we are not sure whether heartbeat is already running or not. So when heartbeat restart is called while heartbeat is NOT running, the LSB script tries to stop it, finds it is not running, and just starts heartbeat, BUT after starting, the lock file is not touched (because of the restart branch in the heartbeat LSB script). So now heartbeat is running on the system (you can verify this by looking for the heartbeat process or with the heartbeat status command), but there is no /var/lock/subsys/heartbeat lock file.

This lock file is used by the init scripts at shutdown to know which processes have to be stopped when the system shuts down (shutdown -r now). When we run shutdown -r now, the system thinks heartbeat is not running (because there is no lock file) and does not stop heartbeat properly. When the node comes back up, heartbeat is started, but its state is not correct (because it was not stopped properly). Due to this, the node identifies itself as Primary even though the erstwhile Secondary node has become Primary in the meantime, and this causes a split-brain.

So I believe heartbeat restart should do exactly what heartbeat stop followed by heartbeat start does, which is not the case now. Can you please let me know if my understanding is correct and whether this is a bug in the heartbeat LSB script? Thanks for looking into it.

I have included the relevant code from the heartbeat LSB script below.

File: /etc/init.d/heartbeat

    start)
        RunStartStop pre-start
        StartHA
        RC=$?
        echo
        if [ $RC -eq 0 ]
        then
            [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
            touch $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-start $RC
        ;;

    stop)
        RunStartStop pre-stop
        StopHA
        RC=$?
        echo
        if [ $RC -eq 0 ]
        then
            rm -f $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-stop $RC
        ;;

    restart)
        sleeptime=`ha_parameter deadtime`
        StopHA
        echo
        echo -n "Waiting to allow resource takeover to complete:"
        sleep $sleeptime
        sleep 10 # allow resource takeover to complete (hopefully).
        echo_success
        echo
        StartHA
        echo
        ;;
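One possible fix, sketched under the assumption that the rest of the script is exactly as quoted above, is to have the restart branch call back into the script's own start and stop cases so the lock-file handling is never bypassed:

    restart)
        sleeptime=`ha_parameter deadtime`
        # Re-enter this script so that the stop) and start) branches above
        # handle $LOCKDIR/$SUBSYS; calling StopHA/StartHA directly skips them.
        $0 stop
        echo -n "Waiting to allow resource takeover to complete:"
        sleep $sleeptime
        sleep 10 # allow resource takeover to complete (hopefully).
        echo_success
        echo
        $0 start
        ;;

With this shape, restart on a node where heartbeat is not running still ends with the lock file in place, so shutdown -r now stops heartbeat cleanly.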
Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start
Permission problem, perhaps? I'm not really sure what you're doing, but the fact that you have users configuring the cluster (why do you do this, btw?) may be pointing to a permission issue.

-mgb

On 11-08-03 06:57 PM, Rahul Kanna wrote:
> [...]
Re: [Linux-HA] The active trap of the SNMP is delayed.
Hi Yan,

> Pushed. Since we don't have a separate branch, you might need to back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with pacemaker-1.0.x.

Thanks!! However, we need a release of pacemaker-mgmt for Pacemaker 1.0. Would it be possible for you to apply the patch to the pacemaker-mgmt-2.0.0 repository and make a release?

* http://hg.clusterlabs.org/pacemaker/pygui/rev/18332eae086e

>> * After this correction, we hope that a new version of pacemaker-mgmt for Pacemaker 1.0 is released (pacemaker-mgmt-2.1.0?).
> We'll probably tag a new version in the near future.

OK.

Best Regards,
Hideo Yamauchi.

--- On Wed, 2011/8/3, Gao,Yan y...@novell.com wrote:
> [...]