Re: [Linux-ha-dev] Uniqueness of OCF Parameters

2011-06-16 Thread Lars Ellenberg
On Wed, Jun 15, 2011 at 04:07:27PM +0200, Florian Haas wrote:
 On 2011-06-15 15:50, Alan Robertson wrote:
  On 06/14/2011 06:03 AM, Florian Haas wrote:
  On 2011-06-14 13:08, Dejan Muhamedagic wrote:
  Hi Alan,
 
  On Mon, Jun 13, 2011 at 10:32:02AM -0600, Alan Robertson wrote:
  On 06/13/2011 04:12 AM, Simon Talbot wrote:
   A couple of observations (I am sure there are more) on the uniqueness
   flag for OCF script parameters:

   Would it be wise for the index parameter of the SFEX OCF script
   to have its unique flag set to 1, so that the crm tool (and others)
   would warn if one inadvertently tried to create two SFEX resource
   primitives with the same index?

   Also, as an example of the opposite, the stonith/IPMI script has
   parameters such as interface, username and password with their unique
   flags set to 1, causing erroneous warnings if you use the same
   interface, username or password for multiple IPMI stonith primitives,
   which is of course often the case in large clusters.
 
  When we designed it, we intended that Unique applies to the complete set
  of parameters - not to individual parameters.  It's like a multi-part
  unique key.  It takes all 3 to create a unique instance (for the example
  you gave).
  That makes sense.
  Does it really? Then what would be the point of having some params that
  are unique, and some that are not? Or would the tuple of _all_
  parameters marked as unique be considered unique?
 
   I don't know what you think I said, but a multi-part key to a database
   is a tuple which consists of all marked parameters.  You just said what
   I said in a different way.
  
  So we agree.
 
 JFYI, I was asking a question, not stating an opinion. Hence the use of
 a question mark.

;-)

 So then, if the uniqueness should be enforced for a unique key that is
 comprised of _all_ the parameters marked unique in a parameter set, then
 what would be the correct way to express required uniqueness of
 _individual_ parameters?
 
 In other words, if I have foo and bar marked unique, then one resource
 with foo=1 and bar=2, and another with foo=1, bar=3 does not violate the
 uniqueness constraint. What if I want both foo and bar to be unique in
 and of themselves, so any duplicate use of foo=1 should be treated as a
 uniqueness violation?

With the current unique=true/false, you cannot express that.

Depending on what we choose the meaning to be,
parameters marked unique=true would be required to
  either be all _independently_ unique,
  or be unique as a tuple.

If we want to be able to express both, we need a different markup.

Of course, we could move the markup out of the parameter description
into additional markup that spells them out,
like <unique params="foo,bar"/> <unique params="bla"/>.

But keeping unique=0 as the current non-unique meaning,
unique=<small-integer-or-even-named-label-who-cares> would
name the scope for this uniqueness requirement,
where parameters marked with the same label
would form a unique tuple.
This enables us to mark multiple tuples, and individual parameters,
at the same time.

The question is: do we really want or need that?
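Sketched as resource agent metadata, the labeled-scope idea might look like this. This fragment is purely illustrative: the parameter names and the label values of the unique attribute are invented here, not part of any OCF spec.

```xml
<parameters>
  <!-- "addr" and "port" share label 1: the (addr, port) pair must be unique -->
  <parameter name="addr" unique="1">
    <content type="string"/>
  </parameter>
  <parameter name="port" unique="1">
    <content type="string"/>
  </parameter>
  <!-- "index" has a label of its own: it must be unique all by itself -->
  <parameter name="index" unique="2">
    <content type="string"/>
  </parameter>
  <!-- unique="0" keeps the current non-unique meaning -->
  <parameter name="user" unique="0">
    <content type="string"/>
  </parameter>
</parameters>
```

A frontend would then collect, per label, the value tuples of all configured resources of a given agent type and warn on duplicates.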


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Uniqueness of OCF Parameters

2011-06-16 Thread Florian Haas
On 2011-06-16 09:03, Lars Ellenberg wrote:
 With the current unique=true/false, you cannot express that.

Thanks. You learn something every day. :)

 Depending on what we choose the meaning to be,
 parameters marked unique=true would be required to
   either be all _independently_ unique,
   or be unique as a tuple.
 
 If we want to be able to express both, we need a different markup.
 
 Of course, we could move the markup out of the parameter description
 into additional markup that spells them out,
 like <unique params="foo,bar"/> <unique params="bla"/>.
 
 But keeping unique=0 as the current non-unique meaning,
 unique=<small-integer-or-even-named-label-who-cares> would
 name the scope for this uniqueness requirement,
 where parameters marked with the same label
 would form a unique tuple.
 This enables us to mark multiple tuples, and individual parameters,
 at the same time.
 
 Question is: do we really want or need that?

That is a matter for the updated OCF RA spec discussion, really. And
the driver of that discussion is currently submerged. :)

Florian





Re: [Linux-ha-dev] Uniqueness of OCF Parameters

2011-06-16 Thread Lars Ellenberg
On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote:
 On 2011-06-16 09:03, Lars Ellenberg wrote:
  With the current unique=true/false, you cannot express that.
 
 Thanks. You learn something every day. :)

Sorry that I left off the "As you are well aware of"
introductory phrase. ;-)

I just summarized the problem:

  Depending on what we choose the meaning to be,
  parameters marked unique=true would be required to
either be all _independently_ unique,
or be unique as a tuple.

And made a suggestion for how to solve it:

  If we want to be able to express both, we need a different markup.
  
  Of course, we could move the markup out of the parameter description
  into additional markup that spells them out,
  like <unique params="foo,bar"/> <unique params="bla"/>.
  
  But keeping unique=0 as the current non-unique meaning,
  unique=<small-integer-or-even-named-label-who-cares> would
  name the scope for this uniqueness requirement,
  where parameters marked with the same label
  would form a unique tuple.
  This enables us to mark multiple tuples, and individual parameters,
  at the same time.

If we really think it _is_ a problem.

  Question is: do we really want or need that?
 
 That is a discussion for the updated OCF RA spec discussion, really. And
 the driver of that discussion is currently submerged. :)

I guess this was @LMB?
Hey there ... do you read? :)

As to "stood the test of time",
well, no. Not these resource agent parameter hints.
Not yet.

Especially the unique and type hints have been mostly ignored until now.
The type hints are still wrong for some resource agents, last time I
checked, and also mostly ignored, and the unique hint is only now
starting to trigger a warning in the crm shell. So, because these hints
have been ignored so far, they have not been tested, not even by time...

These hints are also not enforced by the CIB (which does not know about
them anyway); they are only hints to some frontend.
And because some frontend has now started to at least consider these
hints, we are having this discussion now...

Lars


Re: [Linux-ha-dev] Uniqueness of OCF Parameters

2011-06-16 Thread Florian Haas
On 2011-06-16 10:51, Lars Ellenberg wrote:
 On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote:
 On 2011-06-16 09:03, Lars Ellenberg wrote:
 With the current unique=true/false, you cannot express that.

 Thanks. You learn something every day. :)
 
 Sorry that I left off the "As you are well aware of"
 introductory phrase. ;-)

In case that wasn't clear earlier, I was very much not aware of this. I
wasn't being ironic, for a change. :)

 Question is: do we really want or need that?

 That is a discussion for the updated OCF RA spec discussion, really. And
 the driver of that discussion is currently submerged. :)
 
 I guess this was @LMB?
 Hey there ... do you read? :)

He is on a diving vacation in Croatia. Not only was I not being ironic;
I referred to his literal submersion.

Cheers,
Florian





[Linux-ha-dev] [Linux-HA] Announcement for Heartbeat 3.0.5

2011-06-16 Thread Lars Ellenberg

About two months ago, while dealing with a bug report from a paying
customer, I fixed some long-standing bugs in the heartbeat communication
layer that caused heartbeat to segfault, and other bad behaviour.

These bugs were triggered by misbehaving API clients and by massive
packet loss on the communication links, respectively, and have been
present basically since inception.

The changelog does not really look spectacular, but it should very much
improve the robustness of the heartbeat communication stack if you
experience massive packet loss on all channels, for whatever reason.

As these are fixes that affect the heartbeat messaging core,
they are relevant for both Pacemaker and haresources-style clusters.

Changelog:
  - do not request retransmission of lost messages from dead members
  - fix segfault due to recursion in api_remove_client_pid
  - properly cleanup pending delayed rexmit requests before reset of seqtrack
  - create HA_RSCTMP on start, if necessary
  - improve detection of pacemaker clusters in init script

Tarball:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/STABLE-3.0.5.tar.bz2

Enjoy!

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


Re: [Linux-ha-dev] Patch to ocf:heartbeat:IPaddr2 to check if link status is up

2011-06-16 Thread alexander.krauth
Dejan Muhamedagic wrote on 15.06.2011 15:40:21:
 On Tue, Jun 14, 2011 at 07:15:21PM +0200, alexander.kra...@basf.com wrote:
  Dejan Muhamedagic wrote on 08.06.2011 18:32:16:
   Hi Alexander,
   On Mon, Jun 06, 2011 at 05:42:30PM +0200, alexander.kra...@basf.com wrote:
    Dejan Muhamedagic wrote on 04.04.2011 14:35:34:
     On Fri, Mar 18, 2011 at 04:15:16PM +0100, alexander.kra...@basf.com wrote:
      Hi,

      Dejan Muhamedagic wrote on 18.03.2011 14:31:08:
       Hi,

       On Wed, Mar 16, 2011 at 04:58:25PM +0100, Corvus Corax wrote:

        IPAddr2 puts the interface up on start and down on stop.
        But it's not able to detect an UP or DOWN change in status or monitor.

        Therefore an "ifconfig interface down" from a third program or a
        careless administrator would drop the link without pacemaker noticing!

       Hmm, a careless administrator is somewhat of a paradox, right?

       Really, what was your motivation for this? It makes me wonder,
       since this RA has existed for many years and so far nobody
       bothered to test this.
  
      Hm, maybe the idea behind this is not totally new. Remember this
      thread:

      http://lists.community.tummy.com/pipermail/linux-ha-dev/2011-February/018184.html

      I would go with the remarks of LMB, that this is something closer
      to pingd than to IPaddr2. Isn't the real intention of both posts
      that you want to know if your network interface is vital?
 
 Yes.
 
   You may use pingd for that, but someone may be concerned to ping the
   right remote device (also, a default gateway might not be a very
   static thing in a modern network).

   My idea is currently an agent (let's call it ethmonitor) that
   monitors a network interface with a combination of the fine methods
   that Robert Euhus has posted in his patch. Then you could define some
   rules in the CIB for how to react on the event of a failed network
   interface. Sure, this assumes that you do your heartbeats over more
   than one interface.

   It would check:
    1. is the interface link up?
    2. does the RX counter of the interface increase during a certain
       amount of time?
    3. do I have some other nodes in my arp-cache which I could arping?
    4. maybe retry all checks to overcome short outages
   If all questions are answered with NO, the interface is dead.

   I would add my vote for such a feature.
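The checks listed above can be sketched in shell roughly as follows. This is a sketch only, not the actual ethmonitor code: the function names, the sysfs paths, and the retry policy are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the proposed interface-vitality checks. Function names and
# sysfs paths are illustrative assumptions, not the real ethmonitor RA.

IF=${1:-eth0}

check_link() {
        # 1. is the interface link up?
        [ "$(cat /sys/class/net/$IF/carrier 2>/dev/null)" = "1" ]
}

check_rx() {
        # 2. does the RX counter increase during a certain amount of time?
        rx1=$(cat /sys/class/net/$IF/statistics/rx_packets 2>/dev/null)
        sleep 2
        rx2=$(cat /sys/class/net/$IF/statistics/rx_packets 2>/dev/null)
        [ -n "$rx1" ] && [ -n "$rx2" ] && [ "$rx2" -gt "$rx1" ]
}

check_arping() {
        # 3. arping some other node found in the ARP cache
        for ip in $(ip -4 neigh show dev "$IF" | awk '{print $1}'); do
                arping -q -c 2 -I "$IF" "$ip" >/dev/null 2>&1 && return 0
        done
        return 1
}

interface_alive() {
        # 4. retry all checks once to overcome short outages;
        #    if every question is answered with NO, the interface is dead
        for try in 1 2; do
                check_link && return 0
                check_rx && return 0
                check_arping && return 0
                sleep 1
        done
        return 1
}
```

A monitor action would then run interface_alive and map its exit status to an OCF return code (or to a node attribute that CIB rules can act on).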
 
 Just took a look at the thread you referenced above.
 Unfortunately, the author didn't get back with the new code
 after review and short discussion.
 

Now I have taken the code from Robert in the above-referenced thread and
put it into a completely new RA.
It is based very much on the existing pingd agent, but implements the
monitoring as discussed above.
   
   Great!
   
Please let me know what you think about it.
   
   Does it work? :)
  
  Yes, it does. For me, in my test environment. :-)
  I did review your comments and attached a new version of the agent
  (as it is not in the repository for diffs).
  Some comments on your comments below.

  Regards
  Alex
  
   
   See below for a few comments.
   
   Cheers,
   
   Dejan
   

Cheers,
Alex
   
#!/bin/sh
#
#       OCF Resource Agent compliant script.
#       Monitor the vitality of a local network interface.
#
#   Based on the work by Robert Euhus and Lars Marowsky-Brée.
#
#       Transferred from IPaddr2 into ethmonitor by Alexander Krauth
#
# Copyright (c) 2011 Robert Euhus, Alexander Krauth, Lars Marowsky-Brée
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.

Re: [Linux-ha-dev] Patch to ocf:heartbeat:IPaddr2 to check if link status is up

2011-06-16 Thread Dejan Muhamedagic
Hi,

I pushed the RA to the repository. Just changed the meta-data a
bit to mention the default for the name parameter and removed
the check for probe before check_binary (it is not necessary in
this case).

Many thanks for the contribution!

Cheers,

Dejan


Re: [Linux-ha-dev] Uniqueness of OCF Parameters

2011-06-16 Thread Alan Robertson
On 06/16/2011 02:51 AM, Lars Ellenberg wrote:
 On Thu, Jun 16, 2011 at 09:48:20AM +0200, Florian Haas wrote:
 On 2011-06-16 09:03, Lars Ellenberg wrote:
 With the current unique=true/false, you cannot express that.
 Thanks. You learn something every day. :)
 Sorry that I left off the "As you are well aware of"
 introductory phrase. ;-)

 I just summarized the problem:

 Depending on what we choose the meaning to be,
 parameters marked unique=true would be required to
either be all _independently_ unique,
or be unique as a tuple.
 And made a suggestion for how to solve it:

 If we want to be able to express both, we need a different markup.

 Of course, we could move the markup out of the parameter description
 into additional markup that spells them out,
 like <unique params="foo,bar"/> <unique params="bla"/>.

 But keeping unique=0 as the current non-unique meaning,
 unique=<small-integer-or-even-named-label-who-cares> would
 name the scope for this uniqueness requirement,
 where parameters marked with the same label
 would form a unique tuple.
 This enables us to mark multiple tuples, and individual parameters,
 at the same time.
 If we really think it _is_ a problem.
If one wanted to, one could say
 unique=1,3
or
 unique=1
 unique=3

Then parameters which share the same uniqueness label are part of the
same uniqueness grouping. Since RAs today normally say unique=1, if one
excluded uniqueness group 0 from being unique, then this could be done
in a completely upwards-compatible way for nearly all resources.
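Under this list-valued variant, a single parameter could participate in several uniqueness groups at once. A hypothetical metadata fragment (parameter names invented for illustration):

```xml
<!-- "addr" is in groups 1 and 3, "port" only in group 1, "index" only
     in group 3; unique="0" (or an omitted attribute) means no constraint -->
<parameter name="addr" unique="1,3"/>
<parameter name="port" unique="1"/>
<parameter name="index" unique="3"/>
```

Existing agents that already say unique="1" would keep working unchanged, which is the upwards-compatibility Alan describes.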


-- 
 Alan Robertson  al...@unix.sh

Openness is the foundation and preservative of friendship...  Let me claim 
from you at all times your undisguised opinions. - William Wilberforce



[Linux-ha-dev] Missing prototype declaration

2011-06-16 Thread renayama19661014
Hi all,

Because there is no prototype declaration at the top of this source file
in glue, I cannot compile it.

diff -r 7d9a54d5da6c main.c
--- a/main.cFri Jun 17 18:34:21 2011 +0900
+++ b/main.cFri Jun 17 18:34:55 2011 +0900
@@ -78,6 +78,7 @@
 void log_buf(int severity, char *buf);
 void log_msg(int severity, const char * fmt, ...)G_GNUC_PRINTF(2,3);
 void trans_log(int priority, const char * fmt, ...)G_GNUC_PRINTF(2,3);
+void setup_cl_log(void);
 
 static int pil_loglevel_to_syslog_severity[] = {
/* Indices: none=0, PIL_FATAL=1, PIL_CRIT=2, PIL_WARN=3,


Best Regards,
Hideo Yamauch.



[Linux-HA] Always Get a Billion Failed Actions

2011-06-16 Thread Robinson, Eric
crm_mon on my system displays a lot of failed actions, I guess because
the init script for the resource is not fully LSB compliant?
 
In any case, the resources seem to work okay and fail over okay. 
 
How can I get rid of all those failed actions?
 
crm_mon output follows...
 
 

Last updated: Thu Jun 16 03:32:32 2011
Stack: Heartbeat
Current DC: ha07b.mydomain.com (6080642c-bad3-4bb8-80ba-db6b1f7a0735) -
partition with quorum
Version: 1.0.9-89bd754939df5150de7cd76835f98fe90851b677
3 Nodes configured, unknown expected votes
4 Resources configured.

 
Online: [ ha07c.mydomain.com ha07b.mydomain.com ha07a.mydomain.com ]
 
 Resource Group: g_clust04
     p_fs_clust04   (ocf::heartbeat:Filesystem):    Started ha07a.mydomain.com
     p_vip_clust04  (ocf::heartbeat:IPaddr2):       Started ha07a.mydomain.com
     p_mysql_001    (lsb:mysql_001):    Started ha07a.mydomain.com
     p_mysql_230    (lsb:mysql_230):    Started ha07a.mydomain.com
     p_mysql_231    (lsb:mysql_231):    Started ha07a.mydomain.com
     p_mysql_232    (lsb:mysql_232):    Started ha07a.mydomain.com
     p_mysql_233    (lsb:mysql_233):    Started ha07a.mydomain.com
     p_mysql_234    (lsb:mysql_234):    Started ha07a.mydomain.com
     p_mysql_235    (lsb:mysql_235):    Started ha07a.mydomain.com
     p_mysql_236    (lsb:mysql_236):    Started ha07a.mydomain.com
     p_mysql_237    (lsb:mysql_237):    Started ha07a.mydomain.com
     p_mysql_238    (lsb:mysql_238):    Started ha07a.mydomain.com
     p_mysql_239    (lsb:mysql_239):    Started ha07a.mydomain.com
     p_mysql_240    (lsb:mysql_240):    Started ha07a.mydomain.com
     p_mysql_241    (lsb:mysql_241):    Started ha07a.mydomain.com
     p_mysql_242    (lsb:mysql_242):    Started ha07a.mydomain.com
     p_mysql_243    (lsb:mysql_243):    Started ha07a.mydomain.com
     p_mysql_244    (lsb:mysql_244):    Started ha07a.mydomain.com
     p_mysql_245    (lsb:mysql_245):    Started ha07a.mydomain.com
     p_mysql_246    (lsb:mysql_246):    Started ha07a.mydomain.com
     p_mysql_247    (lsb:mysql_247):    Started ha07a.mydomain.com
     p_mysql_248    (lsb:mysql_248):    Started ha07a.mydomain.com
     p_mysql_249    (lsb:mysql_249):    Started ha07a.mydomain.com
     p_mysql_250    (lsb:mysql_250):    Started ha07a.mydomain.com
     p_mysql_251    (lsb:mysql_251):    Started ha07a.mydomain.com
     p_mysql_252    (lsb:mysql_252):    Started ha07a.mydomain.com
     p_mysql_253    (lsb:mysql_253):    Started ha07a.mydomain.com
     p_mysql_254    (lsb:mysql_254):    Started ha07a.mydomain.com
     p_mysql_255    (lsb:mysql_255):    Started ha07a.mydomain.com
     p_mysql_256    (lsb:mysql_256):    Started ha07a.mydomain.com
     p_mysql_257    (lsb:mysql_257):    Started ha07a.mydomain.com
     p_mysql_258    (lsb:mysql_258):    Started ha07a.mydomain.com
     p_mysql_259    (lsb:mysql_259):    Started ha07a.mydomain.com
     p_mysql_260    (lsb:mysql_260):    Started ha07a.mydomain.com
     p_mysql_261    (lsb:mysql_261):    Started ha07a.mydomain.com
     p_mysql_262    (lsb:mysql_262):    Started ha07a.mydomain.com
     p_mysql_263    (lsb:mysql_263):    Started ha07a.mydomain.com
     p_mysql_264    (lsb:mysql_264):    Started ha07a.mydomain.com
     p_mysql_265    (lsb:mysql_265):    Started ha07a.mydomain.com
     p_mysql_266    (lsb:mysql_266):    Started ha07a.mydomain.com
     p_mysql_267    (lsb:mysql_267):    Started ha07a.mydomain.com
     p_mysql_268    (lsb:mysql_268):    Started ha07a.mydomain.com
     p_mysql_269    (lsb:mysql_269):    Started ha07a.mydomain.com
     p_mysql_270    (lsb:mysql_270):    Started ha07a.mydomain.com
     p_mysql_271    (lsb:mysql_271):    Started ha07a.mydomain.com
     p_mysql_272    (lsb:mysql_272):    Started ha07a.mydomain.com
     p_mysql_273    (lsb:mysql_273):    Started ha07a.mydomain.com
     p_mysql_274    (lsb:mysql_274):    Started ha07a.mydomain.com
     p_mysql_275    (lsb:mysql_275):    Started ha07a.mydomain.com
     p_mysql_276    (lsb:mysql_276):    Started ha07a.mydomain.com
     p_mysql_277    (lsb:mysql_277):    Started ha07a.mydomain.com
     p_mysql_009    (lsb:mysql_009):    Started ha07a.mydomain.com
     p_mysql_021    (lsb:mysql_021):    Started ha07a.mydomain.com
     p_mysql_052    (lsb:mysql_052):    Started ha07a.mydomain.com
     p_mysql_138    (lsb:mysql_138):    Started ha07a.mydomain.com
     p_mysql_278    (lsb:mysql_278):    Started ha07a.mydomain.com
     p_mysql_279    (lsb:mysql_279):    Started ha07a.mydomain.com
     p_mysql_280

[Linux-HA] Announcement for Heartbeat 3.0.5

2011-06-16 Thread Lars Ellenberg

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Jack Berg

I have a two-node cluster using heartbeat and haproxy. Unfortunately it is
impossible to provide redundant heartbeat paths between the two nodes at
different sites, so it is possible for a failure to cause split brain.

To evaluate the impact, I tried disconnecting the two nodes, and I found that
both become active and both try to keep the VIPs after the link is restored.

Is this avoidable using the auto_failback option?


-- 
View this message in context: 
http://old.nabble.com/heartbeat-step-down-after-split-brain-scenario-tp31858728p31858728.html
Sent from the Linux-HA mailing list archive at Nabble.com.



[Linux-HA] resource agents 3.9.1 final release

2011-06-16 Thread Fabio M. Di Nitto

Hi everybody,

The current resource agent repository [1] has been tagged to v3.9.1.
Tarballs are also available [2].

This is the very first release of the common resource agent repository.
It is a big milestone towards eliminating duplication of effort with the
goal of improving the overall quality and user experience. There is
still a long way to go but the first stone has been laid down.

Highlights for the LHA resource agents set:

- lxc, symlink: new resource agents
- db2: major rewrite and support for master/slave mode of operation
- exportfs: backup/restore of rmtab is back
- mysql: multiple improvements for master/slave and replication
- ocft: new tests for pgsql, postfix, and iscsi
- CTDB: minor bug fixes
- pgsql: improve configuration check and probe handling

Highlights for the rgmanager resource agents set:

- oracledb: use shutdown immediate
- tomcat5: fix generated XML
- nfsclient: fix client name mismatch
- halvm: fix mirror dev failure
- nfs: fix selinux integration

Several changes have been made to the build system and the spec file to
accommodate both projects' needs. The most noticeable change is the
option to select all, linux-ha, or rgmanager resource agents at
configuration time, which will also set the default for the
spec file. Also several improvements have been made to correctly build
srpm/rpms on different distributions in different versions.

The full list of changes is available in the ChangeLog file for users,
and in an auto-generated git-to-changelog file called ChangeLog.devel.

NOTE: About the 3.9.x version (particularly for linux-ha folks): This
version was chosen simply because the rgmanager set was already at
3.1.x. In order to make it easier for distribution, and to keep package
upgrades linear, we decided to bump the number higher than both
projects. There is no other special meaning associated with it.

Many thanks to everybody who helped with this release, in
particular to the numerous contributors. Without you, the release
would certainly not have been possible.

Cheers,
The RAS Tribe

[1] https://github.com/ClusterLabs/resource-agents/tarball/v3.9.1
[2] https://fedorahosted.org/releases/r/e/resource-agents/

PS: I am absolutely sure that URL [2] might give some people a fit, but
we are still working to get a common release area.


Re: [Linux-HA] resource agents 3.9.1 final release

2011-06-16 Thread Gianluca Cecchi
On Thu, Jun 16, 2011 at 3:13 PM, Fabio M. Di Nitto  wrote:

 Highlights for the rgmanager resource agents set:

 - oracledb: use shutdown immediate

Hello,
from oracledb.sh.in I can see this actually is not a configurable
parameter, so I cannot choose between immediate and abort,
and I don't think it is the best change.


faction Stopping Oracle Database: stop_db immediate
if [ $? -ne 0 ]; then
    faction Stopping Oracle Database (hard): stop_db abort || return 1
fi


There are situations where a problem could leave a DB stuck in
shutdown immediate, preventing completion of the command itself, so
you will never arrive at the error-code condition that would try the
abort option...
And also:

SHUTDOWN IMMEDIATE
No new connections are allowed, nor are new transactions allowed to be
started, after the statement is issued.
Any uncommitted transactions are rolled back. (If long uncommitted
transactions exist, this method of shutdown might not complete
quickly, despite its name.)
Oracle does not wait for users currently connected to the database to
disconnect. Oracle implicitly rolls back active transactions and
disconnects all connected users.


It is true that in case of shutdown abort you have to roll back anyway,
during the crash recovery of the following startup phase, but I'd
prefer to do that on the node where I'm going to land, and not on
the node that I'm leaving (possibly because of a problem).
In my opinion the only situation where immediate is better is
planned maintenance.

Just my opinion.
Keep on with the good job
Gianluca
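One pattern that addresses the hang concern raised here is to bound the clean shutdown in time and only then fall back to abort. The following is a sketch under stated assumptions: stop_db is assumed to behave like the helper in oracledb.sh.in (taking "immediate" or "abort" as an argument); it is not the actual script code.

```shell
#!/bin/sh
# Sketch: try "shutdown immediate" with a time bound, then fall back to
# "shutdown abort". The stop_db helper is an assumption modelled on
# oracledb.sh.in, not its real code.

SHUTDOWN_TIMEOUT=${SHUTDOWN_TIMEOUT:-60}

stop_oracle() {
        stop_db immediate &     # run the clean shutdown in the background
        pid=$!
        waited=0
        while kill -0 "$pid" 2>/dev/null; do
                if [ "$waited" -ge "$SHUTDOWN_TIMEOUT" ]; then
                        # immediate is stuck: give up on it and abort
                        kill "$pid" 2>/dev/null
                        stop_db abort || return 1
                        return 0
                fi
                sleep 1
                waited=$((waited + 1))
        done
        # immediate finished on its own: check its exit status
        wait "$pid" && return 0
        stop_db abort || return 1
}
```

This keeps the fast path (a working "shutdown immediate") while guaranteeing the stop operation terminates, so a failover cannot be blocked by a hung shutdown.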


Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Dimitri Maziuk
On 06/16/2011 04:28 AM, Jack Berg wrote:
 
 I have a two node cluster using heartbeat and haproxy. Unfortunately it is
 impossible to provide redundant heartbeat paths between the two nodes at
 different sites so it is possible for a failure to cause split brain.
 
 To evaluate the impact I tried disconnecting the two nodes and I found that
 both become active and both try to keep the VIPs after the link is restored.

What do you mean by "disconnecting": what's your failure scenario, and
how do you expect it to be handled?

Running daemons are not guaranteed (nor, arguably, expected) to notice
when the network cable is unplugged. You have to monitor the link and
restart all processes that bind()/listen() on the interface.

If your nodes are at different sites, you need to also deal with the
loss of link at the switch, gateway, etc., and figure out which one is
still connected to the Internet -- and gets to keep the VIP. Which in
general can't be done from the nodes themselves.

Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] resource agents 3.9.1 final release

2011-06-16 Thread Vadym Chepkov

On Jun 16, 2011, at 9:13 AM, Fabio M. Di Nitto wrote:

 The current resource agent repository [1] has been tagged to v3.9.1.
 Tarballs are also available [2].


Is there an instruction anywhere on how to make an rpm out of it, or how
to compile it in general?
http://www.clusterlabs.org/wiki/Install#Resource_Agents is obsolete, I imagine.

Thank you,
Vadym 