Re: [Linux-ha-dev] Uniqueness of OCF Parameters
On Wed, Jun 15, 2011 at 04:07:27PM +0200, Florian Haas wrote: On 2011-06-15 15:50, Alan Robertson wrote: On 06/14/2011 06:03 AM, Florian Haas wrote: On 2011-06-14 13:08, Dejan Muhamedagic wrote: Hi Alan, On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote: On 06/13/2011 04:12 AM, Simon Talbot wrote:

A couple of observations (I am sure there are more) on the uniqueness flag for OCF script parameters:

Would it be wise for the index parameter of the SFEX OCF script to have its unique flag set to 1, so that the crm tool (and others) would warn if one inadvertently tried to create two SFEX resource primitives with the same index?

Also, an example of the opposite: the Stonith/IPMI script has parameters such as interface, username and password with their unique flags set to 1, causing erroneous warnings if you use the same interface, username or password for multiple IPMI stonith primitives, which of course is often the case in large clusters.

When we designed it, we intended that unique applies to the complete set of parameters - not to individual parameters. It's like a multi-part unique key. It takes all 3 to create a unique instance (for the example you gave).

That makes sense.

Does it really? Then what would be the point of having some params that are unique, and some that are not? Or would the tuple of _all_ parameters marked as unique be considered unique?

I don't know what you think I said, but a multi-part key to a database is a tuple which consists of all marked parameters.

You just said what I said in a different way. So we agree.

Jfyi, I was asking a question, not stating an opinion. Hence the use of a question mark. ;-)

So then, if uniqueness should be enforced for a unique key that is comprised of _all_ the parameters marked unique in a parameter set, then what would be the correct way to express required uniqueness of _individual_ parameters?
In other words, if I have foo and bar marked unique, then one resource with foo=1 and bar=2, and another with foo=1 and bar=3, does not violate the uniqueness constraint. What if I want both foo and bar to be unique in and of themselves, so that any duplicate use of foo=1 should be treated as a uniqueness violation? With the current unique=true/false, you cannot express that.

Depending on what we chose the meaning to be, parameters marked unique=true would be required to either be all _independently_ unique, or be unique as a tuple. If we want to be able to express both, we need a different markup. Of course, we could move the markup out of the parameter description into additional markup that spells them out, like <unique params="foo,bar"/> <unique params="bla"/>. But keeping unique=0 as the current non-unique meaning, unique=<small-integer-or-even-named-label-who-cares> would name the scope for this uniqueness requirement, where parameters marked with the same label would form a unique tuple. That enables us to mark multiple tuples, and individual parameters, at the same time.

Question is: do we really want or need that?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
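To make the proposed semantics concrete, here is a minimal, hypothetical sketch (Python, not part of any OCF tooling; the function name and data layout are invented for illustration) of how a frontend could check Lars's labeled-scope idea: parameters sharing a non-zero label form one unique tuple, and a label used by a single parameter makes that parameter individually unique.

```python
from collections import defaultdict

def find_violations(param_meta, instances):
    """param_meta: {param_name: unique_label} (0 = not unique).
    instances: list of {param_name: value} dicts, one per resource.
    Returns a list of (label, value_tuple) pairs seen more than once."""
    violations = []
    # Group parameter names by their uniqueness label.
    groups = defaultdict(list)
    for name, label in param_meta.items():
        if label != 0:
            groups[label].append(name)
    # Within each label, the tuple of values must be unique.
    for label, names in groups.items():
        seen = set()
        for inst in instances:
            key = tuple(inst.get(n) for n in sorted(names))
            if key in seen:
                violations.append((label, key))
            seen.add(key)
    return violations

# foo and bar form one unique tuple (label 1); baz is unique by itself.
meta = {"foo": 1, "bar": 1, "baz": 2}
a = {"foo": "1", "bar": "2", "baz": "x"}
b = {"foo": "1", "bar": "3", "baz": "y"}  # same foo, different bar: OK
c = {"foo": "1", "bar": "2", "baz": "z"}  # duplicates (foo, bar) of a
print(find_violations(meta, [a, b, c]))  # → [(1, ('2', '1'))]
```

Note that with this rule, reusing foo=1 alone (as in b) is not a violation, which is exactly the distinction the current boolean flag cannot express.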
Re: [Linux-ha-dev] Uniqueness of OCF Parameters
On 2011-06-16 09:03, Lars Ellenberg wrote:

With the current unique=true/false, you cannot express that.

Thanks. You learn something every day. :)

Depending on what we chose the meaning to be, parameters marked unique=true would be required to either be all _independently_ unique, or be unique as a tuple. If we want to be able to express both, we need a different markup. Of course, we could move the markup out of the parameter description into additional markup that spells them out, like <unique params="foo,bar"/> <unique params="bla"/>. But keeping unique=0 as the current non-unique meaning, unique=<small-integer-or-even-named-label-who-cares> would name the scope for this uniqueness requirement, where parameters marked with the same label would form a unique tuple. That enables us to mark multiple tuples, and individual parameters, at the same time. Question is: do we really want or need that?

That is a discussion for the updated OCF RA spec discussion, really. And the driver of that discussion is currently submerged. :)

Florian
Re: [Linux-ha-dev] Uniqueness of OCF Parameters
On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: On 2011-06-16 09:03, Lars Ellenberg wrote:

With the current unique=true/false, you cannot express that.

Thanks. You learn something every day. :)

Sorry that I left off the "As you are well aware of" introductory phrase. ;-) I just summarized the problem:

Depending on what we chose the meaning to be, parameters marked unique=true would be required to either be all _independently_ unique, or be unique as a tuple.

And made a suggestion how to solve it:

If we want to be able to express both, we need a different markup. Of course, we could move the markup out of the parameter description into additional markup that spells them out, like <unique params="foo,bar"/> <unique params="bla"/>. But keeping unique=0 as the current non-unique meaning, unique=<small-integer-or-even-named-label-who-cares> would name the scope for this uniqueness requirement, where parameters marked with the same label would form a unique tuple. That enables us to mark multiple tuples, and individual parameters, at the same time.

If we really think it _is_ a problem.

Question is: do we really want or need that?

That is a discussion for the updated OCF RA spec discussion, really. And the driver of that discussion is currently submerged. :)

I guess this was @LMB? Hey there ... do you read? :)

As to "stood the test of time": well, no. Not these resource agent parameter hints. Not yet. The unique and type hints have been mostly ignored until now; the type hints are still wrong for some resource agents (last time I checked), and also mostly ignored, and the unique hint has only just started to trigger a warning in the crm shell. So, because these hints have been ignored so far, they have not been tested, not even by time...

These hints are also not enforced by the CIB (which does not know about them anyway); they are only hints to frontends. And because some frontends have now started to at least consider these hints, we are having this discussion now...
Lars
Re: [Linux-ha-dev] Uniqueness of OCF Parameters
On 2011-06-16 10:51, Lars Ellenberg wrote: On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: On 2011-06-16 09:03, Lars Ellenberg wrote:

With the current unique=true/false, you cannot express that.

Thanks. You learn something every day. :)

Sorry that I left off the "As you are well aware of" introductory phrase. ;-)

In case that wasn't clear earlier, I was very much not aware of this. I wasn't being ironic, for a change. :)

Question is: do we really want or need that?

That is a discussion for the updated OCF RA spec discussion, really. And the driver of that discussion is currently submerged. :)

I guess this was @LMB? Hey there ... do you read? :)

He is on a diving vacation in Croatia. Not only was I not being ironic; I referred to his literal submersion.

Cheers,
Florian
[Linux-ha-dev] [Linux-HA] Announcement for Heartbeat 3.0.5
About two months ago, while dealing with a bug report from a paying customer, I fixed some long-standing bugs in the heartbeat communication layer that caused heartbeat to segfault, among other bad behaviour. These bugs were triggered by misbehaving API clients and by massive packet loss on the communication links, and have been present basically since inception.

The changelog does not really look spectacular, but these fixes should very much improve the robustness of the heartbeat communication stack if you experience massive packet loss on all channels, for whatever reason. As they affect the heartbeat messaging core, they are relevant for both Pacemaker and haresources style clusters.

Changelog:
- do not request retransmission of lost messages from dead members
- fix segfault due to recursion in api_remove_client_pid
- properly clean up pending delayed rexmit requests before reset of seqtrack
- create HA_RSCTMP on start, if necessary
- improve detection of pacemaker clusters in init script

Tarball: http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/STABLE-3.0.5.tar.bz2

Enjoy!

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
[Linux-ha-dev] Re: Patch to ocf:heartbeat:IPaddr2 to check if link status is up
Dejan Muhamedagic wrote on 2011-06-15 15:40:21: On Tue, Jun 14, 2011 at 07:15:21PM +0200, alexander.kra...@basf.com wrote: Dejan Muhamedagic wrote on 2011-06-08 18:32:16: Hi Alexander, On Mon, Jun 06, 2011 at 05:42:30PM +0200, alexander.kra...@basf.com wrote: Dejan Muhamedagic wrote on 2011-04-04 14:35:34: On Fri, Mar 18, 2011 at 04:15:16PM +0100, alexander.kra...@basf.com wrote: Hi, Dejan Muhamedagic wrote on 2011-03-18 14:31:08: Hi, On Wed, Mar 16, 2011 at 04:58:25PM +0100, Corvus Corax wrote:

IPaddr2 puts the interface up on start and down on stop, but it is not able to detect an UP or DOWN change in status or monitor. Therefore an "ifconfig <interface> down" from a third program or a careless administrator would drop the link without pacemaker noticing!

Hmm, "careless administrator" is somewhat of a paradox, right? Really, what was your motivation for this? It makes me wonder, since this RA has existed for many years and so far nobody bothered to test this.

Hm, maybe the idea behind it is not totally new. Remember this thread: http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-February/018184.html

I would go with the remarks of LMB, that this is something closer to pingd than to IPaddr2. Isn't the real intention of both posts that you want to know if your network interface is vital?

Yes. You may use pingd for that, but someone may be concerned about pinging the right remote device (also, a default gateway might not be a very static thing in a modern network). What I currently imagine is an agent (let's call it ethmonitor) that monitors a network interface with a combination of the fine methods that Robert Euhus posted in his patch. Then you could define some rules in the CIB for how to react to the event of a failed network interface. Sure, this assumes that you do your heartbeats over more than one interface. It would check:

1. Is the interface link up?
2. Does the RX counter of the interface increase during a certain amount of time?
3. Do I have some other nodes in my ARP cache which I could arping?
4. Maybe retry all checks to overcome short outages.

If all questions are answered with NO, the interface is dead.

I would add my vote for such a feature.

Just took a look at the thread you referenced above. Unfortunately, the author didn't get back with the new code after review and a short discussion.

Now I took the code from Robert in the above referenced thread and put it into a completely new RA. It is based very much on the existing pingd agent, but implements the monitoring as discussed above.

Great!

Please let me know what you think about it.

Does it work? :)

Yes, it does. For me, in my test environment. :-) I did review your comments and attached a new version of the agent (as it is not in the repository for diffs). Some comments on your comments below.

Regards, Alex

See below for a few comments. Cheers, Dejan

Cheers, Alex

#!/bin/sh
#
# OCF Resource Agent compliant script.
# Monitor the vitality of a local network interface.
#
# Based on the work by Robert Euhus and Lars Marowsky-Brée.
#
# Transferred from IPaddr2 into ethmonitor by Alexander Krauth
#
# Copyright (c) 2011 Robert Euhus, Alexander Krauth, Lars Marowsky-Brée
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
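The four checks Alex lists reduce to a small decision rule. Here is a hypothetical Python sketch of that rule (the real agent is a shell RA; the `probe` callback and `rx_bytes` helper are illustrative names, not the agent's actual interface); the interface is declared dead only when every question is answered with NO on every retry:

```python
def rx_bytes(iface):
    """Read the interface RX counter; Linux exposes per-interface
    counters under /sys/class/net/<iface>/statistics/."""
    with open("/sys/class/net/%s/statistics/rx_bytes" % iface) as f:
        return int(f.read())

def interface_dead(probe, retries=2):
    """probe() returns three booleans: (link_up, rx_increased, arping_ok).
    Any single YES means the interface is considered alive; the retries
    smooth over short outages, as in check 4 above."""
    for _ in range(retries + 1):
        link_up, rx_increased, arping_ok = probe()
        if link_up or rx_increased or arping_ok:
            return False
    return True
```

In practice the probe would compare two `rx_bytes()` samples taken a few seconds apart, check the link state, and arping known neighbors from the ARP cache.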
Re: [Linux-ha-dev] Re: Patch to ocf:heartbeat:IPaddr2 to check if link status is up
Hi,

I pushed the RA to the repository. I just changed the meta-data a bit to mention the default for the name parameter, and removed the check for probe before check_binary (it is not necessary in this case).

Many thanks for the contribution!

Cheers, Dejan

On Thu, Jun 16, 2011 at 04:34:54PM +0200, alexander.kra...@basf.com wrote:
[...]
Re: [Linux-ha-dev] Uniqueness of OCF Parameters
On 06/16/2011 02:51 AM, Lars Ellenberg wrote: On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote: On 2011-06-16 09:03, Lars Ellenberg wrote:

With the current unique=true/false, you cannot express that.

Thanks. You learn something every day. :)

Sorry that I left off the "As you are well aware of" introductory phrase. ;-) I just summarized the problem:

Depending on what we chose the meaning to be, parameters marked unique=true would be required to either be all _independently_ unique, or be unique as a tuple.

And made a suggestion how to solve it:

If we want to be able to express both, we need a different markup. Of course, we could move the markup out of the parameter description into additional markup that spells them out, like <unique params="foo,bar"/> <unique params="bla"/>. But keeping unique=0 as the current non-unique meaning, unique=<small-integer-or-even-named-label-who-cares> would name the scope for this uniqueness requirement, where parameters marked with the same label would form a unique tuple. That enables us to mark multiple tuples, and individual parameters, at the same time.

If we really think it _is_ a problem.

If one wanted to, one could say unique="1,3" or unique="1" unique="3". Then parameters which share the same uniqueness list are part of the same uniqueness grouping. Since RAs today normally say unique="1", if one excluded the unique group 0 from being unique, then this could be done in a completely upwards-compatible way for nearly all resources.

--
Alan Robertson al...@unix.sh

"Openness is the foundation and preservative of friendship... Let me claim from you at all times your undisguised opinions." - William Wilberforce
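Alan's comma-list variant is easy to parse. A hypothetical sketch (Python, names invented for illustration) of how a frontend could build the uniqueness groupings, treating group 0 as "not unique" so that today's unique="1" agents keep working unchanged:

```python
from collections import defaultdict

def parse_unique(attr):
    """Turn a unique attribute like "1,3" into a set of group labels,
    dropping group 0 (kept as 'not unique' for upward compatibility)."""
    return {g.strip() for g in str(attr).split(",")} - {"0"}

def groupings(params):
    """params: {param_name: unique_attr}. Returns {label: set_of_names};
    parameters sharing a label form one uniqueness grouping."""
    groups = defaultdict(set)
    for name, attr in params.items():
        for label in parse_unique(attr):
            groups[label].add(name)
    return dict(groups)

# foo belongs to groups 1 and 3; bar only to group 1; baz is not unique.
print(groupings({"foo": "1,3", "bar": "1", "baz": "0"}))
```

A frontend would then enforce tuple uniqueness per grouping, exactly as with Lars's labeled-scope proposal; the comma list just lets one parameter participate in several groupings at once.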
[Linux-ha-dev] Prototypic declaration is insufficient.
Hi all,

Because there is no prototype declaration at the top of this source file in glue, I cannot compile it.

diff -r 7d9a54d5da6c main.c
--- a/main.c	Fri Jun 17 18:34:21 2011 +0900
+++ b/main.c	Fri Jun 17 18:34:55 2011 +0900
@@ -78,6 +78,7 @@
 void log_buf(int severity, char *buf);
 void log_msg(int severity, const char * fmt, ...)G_GNUC_PRINTF(2,3);
 void trans_log(int priority, const char * fmt, ...)G_GNUC_PRINTF(2,3);
+void setup_cl_log(void);
 static int pil_loglevel_to_syslog_severity[] = { /* Indices: none=0, PIL_FATAL=1, PIL_CRIT=2, PIL_WARN=3,

Best Regards,
Hideo Yamauch.
[Linux-HA] Always Get a Billion Failed Actions
crm_mon on my system displays a lot of failed actions, I guess because the init script for the resource is not fully LSB-compliant? In any case, the resources seem to work okay and fail over okay. How can I get rid of all those failed actions? crm_mon output follows...

Last updated: Thu Jun 16 03:32:32 2011
Stack: Heartbeat
Current DC: ha07b.mydomain.com (6080642c-bad3-4bb8-80ba-db6b1f7a0735) - partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
3 Nodes configured, unknown expected votes
4 Resources configured.

Online: [ ha07c.mydomain.com ha07b.mydomain.com ha07a.mydomain.com ]

 Resource Group: g_clust04
     p_fs_clust04  (ocf::heartbeat:Filesystem):  Started ha07a.mydomain.com
     p_vip_clust04 (ocf::heartbeat:IPaddr2):     Started ha07a.mydomain.com
     p_mysql_001   (lsb:mysql_001):  Started ha07a.mydomain.com
     p_mysql_230   (lsb:mysql_230):  Started ha07a.mydomain.com
     p_mysql_231   (lsb:mysql_231):  Started ha07a.mydomain.com
     p_mysql_232   (lsb:mysql_232):  Started ha07a.mydomain.com
     p_mysql_233   (lsb:mysql_233):  Started ha07a.mydomain.com
     p_mysql_234   (lsb:mysql_234):  Started ha07a.mydomain.com
     p_mysql_235   (lsb:mysql_235):  Started ha07a.mydomain.com
     p_mysql_236   (lsb:mysql_236):  Started ha07a.mydomain.com
     p_mysql_237   (lsb:mysql_237):  Started ha07a.mydomain.com
     p_mysql_238   (lsb:mysql_238):  Started ha07a.mydomain.com
     p_mysql_239   (lsb:mysql_239):  Started ha07a.mydomain.com
     p_mysql_240   (lsb:mysql_240):  Started ha07a.mydomain.com
     p_mysql_241   (lsb:mysql_241):  Started ha07a.mydomain.com
     p_mysql_242   (lsb:mysql_242):  Started ha07a.mydomain.com
     p_mysql_243   (lsb:mysql_243):  Started ha07a.mydomain.com
     p_mysql_244   (lsb:mysql_244):  Started ha07a.mydomain.com
     p_mysql_245   (lsb:mysql_245):  Started ha07a.mydomain.com
     p_mysql_246   (lsb:mysql_246):  Started ha07a.mydomain.com
     p_mysql_247   (lsb:mysql_247):  Started ha07a.mydomain.com
     p_mysql_248   (lsb:mysql_248):  Started ha07a.mydomain.com
     p_mysql_249   (lsb:mysql_249):  Started ha07a.mydomain.com
     p_mysql_250   (lsb:mysql_250):  Started ha07a.mydomain.com
     p_mysql_251   (lsb:mysql_251):  Started ha07a.mydomain.com
     p_mysql_252   (lsb:mysql_252):  Started ha07a.mydomain.com
     p_mysql_253   (lsb:mysql_253):  Started ha07a.mydomain.com
     p_mysql_254   (lsb:mysql_254):  Started ha07a.mydomain.com
     p_mysql_255   (lsb:mysql_255):  Started ha07a.mydomain.com
     p_mysql_256   (lsb:mysql_256):  Started ha07a.mydomain.com
     p_mysql_257   (lsb:mysql_257):  Started ha07a.mydomain.com
     p_mysql_258   (lsb:mysql_258):  Started ha07a.mydomain.com
     p_mysql_259   (lsb:mysql_259):  Started ha07a.mydomain.com
     p_mysql_260   (lsb:mysql_260):  Started ha07a.mydomain.com
     p_mysql_261   (lsb:mysql_261):  Started ha07a.mydomain.com
     p_mysql_262   (lsb:mysql_262):  Started ha07a.mydomain.com
     p_mysql_263   (lsb:mysql_263):  Started ha07a.mydomain.com
     p_mysql_264   (lsb:mysql_264):  Started ha07a.mydomain.com
     p_mysql_265   (lsb:mysql_265):  Started ha07a.mydomain.com
     p_mysql_266   (lsb:mysql_266):  Started ha07a.mydomain.com
     p_mysql_267   (lsb:mysql_267):  Started ha07a.mydomain.com
     p_mysql_268   (lsb:mysql_268):  Started ha07a.mydomain.com
     p_mysql_269   (lsb:mysql_269):  Started ha07a.mydomain.com
     p_mysql_270   (lsb:mysql_270):  Started ha07a.mydomain.com
     p_mysql_271   (lsb:mysql_271):  Started ha07a.mydomain.com
     p_mysql_272   (lsb:mysql_272):  Started ha07a.mydomain.com
     p_mysql_273   (lsb:mysql_273):  Started ha07a.mydomain.com
     p_mysql_274   (lsb:mysql_274):  Started ha07a.mydomain.com
     p_mysql_275   (lsb:mysql_275):  Started ha07a.mydomain.com
     p_mysql_276   (lsb:mysql_276):  Started ha07a.mydomain.com
     p_mysql_277   (lsb:mysql_277):  Started ha07a.mydomain.com
     p_mysql_009   (lsb:mysql_009):  Started ha07a.mydomain.com
     p_mysql_021   (lsb:mysql_021):  Started ha07a.mydomain.com
     p_mysql_052   (lsb:mysql_052):  Started ha07a.mydomain.com
     p_mysql_138   (lsb:mysql_138):  Started ha07a.mydomain.com
     p_mysql_278   (lsb:mysql_278):  Started ha07a.mydomain.com
     p_mysql_279   (lsb:mysql_279):  Started ha07a.mydomain.com
     p_mysql_280
[Linux-HA] heartbeat step down after split brain scenario
I have a two-node cluster using heartbeat and haproxy. Unfortunately it is impossible to provide redundant heartbeat paths between the two nodes at different sites, so it is possible for a failure to cause split brain. To evaluate the impact, I tried disconnecting the two nodes, and I found that both become active and both try to keep the VIPs after the link is restored. Is this avoidable using the auto_failback option?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] resource agents 3.9.1 final release
Hi everybody,

The current resource agent repository [1] has been tagged v3.9.1. Tarballs are also available [2].

This is the very first release of the common resource agent repository. It is a big milestone towards eliminating duplication of effort, with the goal of improving the overall quality and user experience. There is still a long way to go, but the first stone has been laid down.

Highlights for the LHA resource agents set:
- lxc, symlink: new resource agents
- db2: major rewrite and support for master/slave mode of operation
- exportfs: backup/restore of rmtab is back
- mysql: multiple improvements for master/slave and replication
- ocft: new tests for pgsql, postfix, and iscsi
- CTDB: minor bug fixes
- pgsql: improve configuration check and probe handling

Highlights for the rgmanager resource agents set:
- oracledb: use shutdown immediate
- tomcat5: fix generated XML
- nfsclient: fix client name mismatch
- halvm: fix mirror dev failure
- nfs: fix selinux integration

Several changes have been made to the build system and the spec file to accommodate both projects' needs. The most noticeable change is the option to select all, linux-ha, or rgmanager resource agents at configuration time, which will also set the default for the spec file. Also, several improvements have been made to correctly build srpms/rpms on different distributions in different versions.

The full list of changes is available in the ChangeLog file for users, and in an auto-generated git-to-changelog file called ChangeLog.devel.

NOTE: About the 3.9.x version (particularly for linux-ha folks): this version was chosen simply because the rgmanager set was already at 3.1.x. In order to make it easier for distribution, and to keep package upgrades linear, we decided to bump the number higher than both projects. There is no other special meaning associated with it.

Many thanks to everybody who helped with this release, in particular to the numerous contributors.
Without you, the release would certainly not be possible.

Cheers,
The RAS Tribe

[1] https://github.com/ClusterLabs/resource-agents/tarball/v3.9.1
[2] https://fedorahosted.org/releases/r/e/resource-agents/

PS: I am absolutely sure that URL [2] might give some people a fit, but we are still working to get a common release area.
Re: [Linux-HA] resource agents 3.9.1 final release
On Thu, Jun 16, 2011 at 3:13 PM, Fabio M. Di Nitto wrote:

Highlights for the rgmanager resource agents set:
- oracledb: use shutdown immediate

Hello,

From oracledb.sh.in I can see this is actually not a configurable parameter, so I cannot choose between immediate and abort, and I think it is not the best change.

    faction Stopping Oracle Database: stop_db immediate
    if [ $? -ne 0 ]; then
        faction Stopping Oracle Database (hard): stop_db abort || return 1
    fi

There are situations where a problem could leave a DB stuck on shutdown immediate, preventing completion of the command itself, so you will never reach the error-code condition that tries the abort option...

And also:

SHUTDOWN IMMEDIATE: No new connections are allowed, nor are new transactions allowed to be started, after the statement is issued. Any uncommitted transactions are rolled back. (If long uncommitted transactions exist, this method of shutdown might not complete quickly, despite its name.) Oracle does not wait for users currently connected to the database to disconnect. Oracle implicitly rolls back active transactions and disconnects all connected users.

It is true that with shutdown abort you still have to roll back, during the subsequent crash recovery in the startup phase, but I'd prefer to do that on the node I'm going to land on, and not on the node I'm leaving (possibly because of a problem). In my opinion, the only situation where immediate is better is planned maintenance.

Just my opinion. Keep up the good work!

Gianluca
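Gianluca's point is essentially that the escalation needs a deadline: a stuck "shutdown immediate" never returns a nonzero exit code, so the abort branch is never reached. The pattern can be sketched as follows (a hypothetical Python illustration; the actual agent is shell, and the callables stand in for the real stop commands):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as StopTimeout

def stop_with_escalation(graceful_stop, forced_stop, deadline=60.0):
    """Try the graceful stop, but give up after `deadline` seconds and
    escalate to the forced stop. Unlike a plain 'graceful || forced'
    chain, a graceful stop that HANGS also triggers the escalation."""
    pool = ThreadPoolExecutor(max_workers=1)
    attempt = pool.submit(graceful_stop)
    try:
        if attempt.result(timeout=deadline):
            return "graceful"
    except StopTimeout:
        pass  # graceful stop hung past the deadline
    finally:
        pool.shutdown(wait=False)  # don't wait on a hung stop attempt
    return "forced" if forced_stop() else "failed"

# A stop that hangs longer than the deadline escalates to the forced stop:
print(stop_with_escalation(lambda: time.sleep(0.5) or True,
                           lambda: True, deadline=0.05))  # → forced
```

A real implementation would additionally kill the hung graceful-stop process rather than merely abandoning it, but the shape of the fix is the same: the immediate/abort fallback is only reachable if the first attempt is bounded in time.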
Re: [Linux-HA] heartbeat step down after split brain scenario
On 06/16/2011 04:28 AM, Jack Berg wrote:

I have a two node cluster using heartbeat and haproxy. Unfortunately it is impossible to provide redundant heartbeat paths between the two nodes at different sites so it is possible for a failure to cause split brain. To evaluate the impact I tried disconnecting the two nodes and I found that both become active and both try to keep the VIPs after the link is restored.

What do you mean by disconnecting: what's your failure scenario, and how do you expect it to be handled? Running daemons are not guaranteed (or, arguably, expected) to notice when the network cable is unplugged. You have to monitor the link and restart all processes that bind()/listen() on the interface.

If your nodes are at different sites, you also need to deal with loss of link at the switch, gateway, etc., and figure out which one is still connected to the Internet -- and gets to keep the VIP. Which, in general, can't be done from the nodes themselves.

Dima

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] resource agents 3.9.1 final release
On Jun 16, 2011, at 9:13 AM, Fabio M. Di Nitto wrote:

The current resource agent repository [1] has been tagged to v3.9.1. Tarballs are also available [2].

Are there instructions anywhere on how to make an RPM out of it, or to compile it in general? http://www.clusterlabs.org/wiki/Install#Resource_Agents is obsolete, I imagine.

Thank you,
Vadym