Re: [Linux-ha-dev] [Pacemaker] Fw: new important message

2016-02-18 Thread Serge Dubrouski
Got hacked?

On Thu, Feb 18, 2016, 7:53 PM Dejan Muhamedagic wrote:

> Hello!
>
>
>
> *New message, please read* http://estoncamlievler76.com/leaving.php
> 
>
>
>
> Dejan Muhamedagic
> ___
> Pacemaker mailing list: pacema...@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] support LSBs /run directory

2013-12-18 Thread Serge Dubrouski
Lars -

I'm a bit lost on path=$varrun/$path. Why do we need to add $varrun if it
isn't already present? The goal is to cover the case when /run, or any other
directory, is used instead of /var/run.




On Tue, Dec 17, 2013 at 1:48 PM, Timur I. Bakeyev ti...@com.bat.ru wrote:

 Hi, Lars!

 On Tue, Dec 17, 2013 at 1:43 PM, Lars Ellenberg lars.ellenb...@linbit.com
  wrote:

 On Tue, Dec 17, 2013 at 02:39:52AM +0100, Timur I. Bakeyev wrote:
  Hi, guys!
 
  Any reaction, please?

 Probably best to add a helper to ocf-functions, say,

 # require_run_dir mode user:group path
 require_run_dir()
 {
     local mode=$1 owner=$2 path=$3
     local varrun=@@__varrun_or_whatever_autofoo_calls_it__@@
     case $path in
     $varrun/*) : nothing ;;
     *) path=$varrun/$path ;;
     esac
     test -d $path && return 0
     [ $(id -u) = 0 ] || return 1

     # (or some helper function mkdir_p, in case we doubt -p is available...)
     mkdir -p $path && chown $owner $path && chmod $mode $path
 }


 Then use that early in the various resource agents,
 maybe where the defaults are defined.

 Yes?



 That would be even better! There are a few more RAs that would benefit from
 that.

 I'd only invert the parameters: path is mandatory, permissions are
 semi-optional, and owner is optional, since in 99% of cases we run as root.
 And I'd put some meaningful defaults for the latter two parameters:

 local path=$1 mode=$2 owner=$3
 : ${mode:=0755}
 : ${owner:=root}

 Also, as 'path' is usually smth. like '/var/run/named' or
 '/var/run/zabbix', I'm afraid that switch will do nothing in any case.

 We need a bit more magic, smth. like:

 if [ -n "${path##$varrun}" ]

 or alike.

 With best regards,
 Timur.
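
Putting the two suggestions together, a minimal standalone sketch of the helper might look like this (path-first argument order and defaults per Timur; `varrun` is hardcoded here, whereas the real helper would get it from autoconf substitution — this is an illustration, not the merged code):

```shell
#!/bin/sh
# Sketch of require_run_dir: ensure a runtime directory exists under $varrun,
# creating it with the requested owner and mode when running as root.
varrun=/var/run

require_run_dir()
{
    # path is mandatory; mode and owner fall back to sensible defaults
    local path=$1 mode=$2 owner=$3
    : ${mode:=0755}
    : ${owner:=root}

    # Prepend $varrun only for relative names; absolute paths are kept as-is
    case $path in
    "$varrun"/*) : ;;                 # already under $varrun
    /*)          : ;;                 # some other absolute path
    *)           path=$varrun/$path ;;
    esac

    test -d "$path" && return 0       # nothing to do
    [ "$(id -u)" = 0 ] || return 1    # only root may create it

    mkdir -p "$path" && chown "$owner" "$path" && chmod "$mode" "$path"
}
```

A resource agent would then call, e.g., `require_run_dir named 0770 named:named` early in start/validate, before touching its pid file.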









-- 
Serge Dubrouski.


Re: [Linux-HA] Heartbeat with Oracle's ASM

2012-11-15 Thread Serge Dubrouski
There is an RA for Oracle that can be used with Pacemaker. Generally ASM
behaves like a regular Oracle instance, so you can try it.
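
A minimal sketch of that first step, assuming the stock ocf:heartbeat:oracle agent and the crm shell; the sid, home, and user values below are purely illustrative and must match your ASM installation:

```
# Hypothetical crm configure snippet -- untested with ASM, values are examples
primitive p_asm ocf:heartbeat:oracle \
    params sid="+ASM" home="/u01/app/oracle/product/11.2.0/grid" user="grid" \
    op monitor interval="30s" timeout="60s"
```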
 On Nov 15, 2012 8:57 AM, Hill Fang hill.f...@ericsson.com wrote:

 Hi friend:

 I want to know: does Heartbeat support Oracle ASM now?




 HILL FANG
 Engineer

 Guangzhou Ericsson Communication Services Co.,Ltd.(GTC)
 SI Support
 2 /F, NO. 1025 Gaopu Road, Tianhe Software Park,Tianhe District, Guangzhou,
 510663, PR China
 Phone +86 020-85117631
 Fax +86 020-29002699
 SMS/MMS 15813329521
 hill.f...@ericsson.com
 www.ericsson.com





 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems



Re: [Linux-ha-dev] Patch for named

2012-10-03 Thread Serge Dubrouski
Look at the start function. If one sets the rootdir parameter to /, the start
function strips it and monitor fails. So the patch fixes that.
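
To make the special case concrete, here is a small standalone sketch of the pattern-building logic (variable names mirror the named RA, but this is an illustration, not the agent itself): when named is chrooted it is matched by its `-t <rootdir>` argument, and with rootdir="/" the start code strips that argument, so appending `-t /` to the pattern would make monitor fail to find the process.

```shell
#!/bin/sh
# Sketch: build the process-match pattern, skipping "-t" for rootdir "/".
OCF_RESKEY_named=${OCF_RESKEY_named:-named}

build_pattern() {
    pattern=$OCF_RESKEY_named
    # Skip the -t suffix when rootdir is unset OR is exactly "/"
    if [ -n "$OCF_RESKEY_named_rootdir" ] && \
       [ "x$OCF_RESKEY_named_rootdir" != "x/" ]; then
        pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
    fi
    echo "$pattern"
}
```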
On Oct 3, 2012 7:45 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Mon, Oct 01, 2012 at 08:29:50PM -0600, Serge Dubrouski wrote:
  Hi, Dejan -
 
  Will you apply it?

 The grep ps part I'll apply. I was just curious why the previous
 version didn't work, but I guess it's not worth the time to
 investigate.

 And I'm trying to understand this part:

  named_getpid () {
  local pattern=$OCF_RESKEY_named

 -if [ -n "$OCF_RESKEY_named_rootdir" ]; then
 +if [ -n "$OCF_RESKEY_named_rootdir" -a "x${OCF_RESKEY_named_rootdir}" != "x/" ]; then
      pattern="$pattern.*-t $OCF_RESKEY_named_rootdir"
   fi

 How would named_rootdir be set to / unless the user sets it as
 a parameter? Why would / then be treated differently?

 Cheers,

 Dejan

  On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com
 wrote:
 
  Yes it is. It also includes a fix for a small bug. So 2 lines changed.
   On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:
  
   Hi Serge,
  
   On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
Hello -
   
 Attached a short patch for the named RA to fix/improve the getpid function.
  
   Sorry for the delay. Is this the same as
   https://github.com/ClusterLabs/resource-agents/issues/134
   and
   https://github.com/ClusterLabs/resource-agents/pull/140
  
   Cheers,
  
   Dejan
  
--
Serge Dubrouski.
  
  
  
  
 
 
  --
  Serge Dubrouski.



Re: [Linux-ha-dev] Patch for named

2012-10-01 Thread Serge Dubrouski
Hi, Dejan -

Will you apply it?

On Fri, Sep 28, 2012 at 5:09 AM, Serge Dubrouski serge...@gmail.com wrote:

 Yes it is. It also includes a fix for a small bug. So 2 lines changed.
 On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
  Hello -
 
  Attached a short patch for the named RA to fix/improve the getpid function.

 Sorry for the delay. Is this the same as
 https://github.com/ClusterLabs/resource-agents/issues/134
 and
 https://github.com/ClusterLabs/resource-agents/pull/140

 Cheers,

 Dejan

  --
  Serge Dubrouski.






-- 
Serge Dubrouski.


Re: [Linux-ha-dev] Patch for named

2012-09-28 Thread Serge Dubrouski
Yes it is. It also includes a fix for a small bug. So 2 lines changed.
On Sep 28, 2012 2:54 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Sat, Sep 22, 2012 at 09:11:53AM -0600, Serge Dubrouski wrote:
  Hello -
 
  Attached a short patch for the named RA to fix/improve the getpid function.

 Sorry for the delay. Is this the same as
 https://github.com/ClusterLabs/resource-agents/issues/134
 and
 https://github.com/ClusterLabs/resource-agents/pull/140

 Cheers,

 Dejan

  --
  Serge Dubrouski.




[Linux-ha-dev] Patch for named

2012-09-22 Thread Serge Dubrouski
Hello -

Attached a short patch for the named RA to fix/improve the getpid function.

-- 
Serge Dubrouski.


named.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] add support for non-standard library locations and non-standard port

2012-09-15 Thread Serge Dubrouski
While I agree with exporting PGPORT, I'm not sure that pglibs needs to be
in the RA. Why not use /etc/ld.so.conf.d/ instead?
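
For comparison, the patch's environment handling boils down to the following standalone logic (the pglibs directory is an example value, not a default from the RA). The ld.so.conf.d alternative would instead drop a one-line file into /etc/ld.so.conf.d/ and run ldconfig, needing no RA change at all:

```shell
#!/bin/sh
# Append a library directory to LD_LIBRARY_PATH, mirroring the patch hunk.
# /opt/pgsql/lib is an illustrative path, not an RA default.
pglibs=/opt/pgsql/lib

if [ -n "$pglibs" ]; then
    if [ -n "$LD_LIBRARY_PATH" ]; then
        LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$pglibs"
    else
        LD_LIBRARY_PATH="$pglibs"
    fi
    export LD_LIBRARY_PATH
fi
```

The upside of the ld.so.conf.d route is that every process on the host resolves the libraries consistently; the LD_LIBRARY_PATH route keeps the change local to the RA's child processes.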


On Fri, Sep 14, 2012 at 10:10 AM, David Corlette dcorle...@netiq.com wrote:

 From: David Corlette dcorle...@moolap.esecurity.net

 ---
  heartbeat/pgsql |   24 
  1 files changed, 24 insertions(+), 0 deletions(-)

 diff --git a/heartbeat/pgsql b/heartbeat/pgsql
 index b57488d..9c66d56 100755
 --- a/heartbeat/pgsql
 +++ b/heartbeat/pgsql
 @@ -6,6 +6,7 @@
  # Authors:  Serge Dubrouski (serge...@gmail.com) -- original RA
  #   Florian Haas (flor...@linbit.com) -- makeover
  #   Takatoshi MATSUO (matsuo@gmail.com) -- support
 replication
 +#  David Corlette (dcorle...@netiq.com) -- add support for
 non-standard library locations and non-standard port
  #
  # Copyright:2006-2012 Serge Dubrouski serge...@gmail.com
  # and other Linux-HA contributors
 @@ -40,6 +41,7 @@ OCF_RESKEY_pgdata_default=/var/lib/pgsql/data
  OCF_RESKEY_pgdba_default=postgres
  OCF_RESKEY_pghost_default=
  OCF_RESKEY_pgport_default=5432
 +OCF_RESKEY_pglibs_default=/usr/lib
  OCF_RESKEY_start_opt_default=
  OCF_RESKEY_pgdb_default=template1
  OCF_RESKEY_logfile_default=/dev/null
 @@ -67,6 +69,7 @@ OCF_RESKEY_stop_escalate_in_slave_default=30
  : ${OCF_RESKEY_pgdba=${OCF_RESKEY_pgdba_default}}
  : ${OCF_RESKEY_pghost=${OCF_RESKEY_pghost_default}}
  : ${OCF_RESKEY_pgport=${OCF_RESKEY_pgport_default}}
 +: ${OCF_RESKEY_pglibs=${OCF_RESKEY_pglibs_default}}
  : ${OCF_RESKEY_config=${OCF_RESKEY_pgdata}/postgresql.conf}
  : ${OCF_RESKEY_start_opt=${OCF_RESKEY_start_opt_default}}
  : ${OCF_RESKEY_pgdb=${OCF_RESKEY_pgdb_default}}
 @@ -185,6 +188,14 @@ Port where PostgreSQL is listening
  content type=integer default=${OCF_RESKEY_pgport_default} /
  /parameter

 +parameter name=pglibs unique=0 required=0
 +longdesc lang=en
 +The location of the Postgres libraries.
 +/longdesc
 +shortdesc lang=enpglibs/shortdesc
 +content type=string default=${OCF_RESKEY_pglibs_default} /
 +/parameter
 +
  parameter name=monitor_user unique=0 required=0
  longdesc lang=en
  PostgreSQL user that pgsql RA will user for monitor operations. If it's
 not set
 @@ -1691,6 +1702,19 @@ else
 fi
  fi

 +if [ -n "$OCF_RESKEY_pgport" ]; then
 +   export PGPORT=$OCF_RESKEY_pgport
 +fi
 +
 +if [ -n "$OCF_RESKEY_pglibs" ]; then
 +   if [ -n "$LD_LIBRARY_PATH" ]; then
 +       export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OCF_RESKEY_pglibs
 +   else
 +       export LD_LIBRARY_PATH=$OCF_RESKEY_pglibs
 +   fi
 +fi
 +
 +
  # What kind of method was invoked?
  case $1 in
  status) if pgsql_status
 --
 1.6.0.2





-- 
Serge Dubrouski.


Re: [Linux-HA] HA samba?

2012-04-25 Thread Serge Dubrouski
On Wed, Apr 25, 2012 at 4:28 PM, Seth Galitzer sg...@ksu.edu wrote:

 On 04/25/2012 05:12 PM, Dimitri Maziuk wrote:
  On 04/25/2012 03:53 PM, Seth Galitzer wrote:
  Can anybody point me to recent docs on how to go about setting this up?
 I've found several much older posts, but not much current with any
  kind of helpful detail.
 
  If you're running active/passive DRBD, it's what the wiki page calls
  mounted on one node at a time. That one's simple: use drbdlinks to
  keep everything incl. /etc/samba on the drbd filesystem and fire up smbd
  and nmbd after drbdlinks -- pretty much like any other daemon backed by
  drbd storage.
 

 I see how that will get all the locking and user data and that should be
 easy enough to configure.  But I'm also doing ADS integration instead of
 winbind, and that also seems to be a problem as only one node can be
 joined to the AD at a time, even with a shared IP.  Any suggestions for
 that?


Currently there is no official RA for the smbd and nmbd daemons. You could
create one and include joining the domain in its start function, though I
don't see why you'd need that, because AFAIK joining the domain is a one-time
action unless you want to re-register your server in the domain.

So you can try the "anything" RA to control the smbd and nmbd daemons, or you
can use the LSB samba agent for that.

Also, if you want just Samba, you probably don't need exportfs and nfsd.
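
The LSB route could be sketched like this in the crm shell (init script names vary by distribution — smb/nmb on RedHat-style systems, samba/nmbd elsewhere — so treat these as placeholders, not tested configuration):

```
# Hypothetical crm configure snippet; "smb"/"nmb" are RedHat-style LSB names
primitive p_smb lsb:smb op monitor interval="30s"
primitive p_nmb lsb:nmb op monitor interval="30s"
group g_samba p_smb p_nmb
```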



 Thanks.
 Seth

 --
 Seth Galitzer
 Systems Coordinator
 Computing and Information Sciences
 Kansas State University
 http://www.cis.ksu.edu/~sgsax
 sg...@ksu.edu
 785-532-7790




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] Patch: pgsql streaming replication

2012-03-19 Thread Serge Dubrouski
Sorry, we'll rework the patch.
 On Mar 19, 2012 2:39 PM, Soffen, Matthew msof...@iso-ne.com wrote:

 I believe that the reason for not using #!/bin/bash is that it is NOT part of
 the default install on non-Linux systems.


 Matthew Soffen
 Principal Software Testing Coordinator
 ISO New England - http://www.iso-ne.com/



 -Original Message-
 From: linux-ha-dev-boun...@lists.linux-ha.org [mailto:
 linux-ha-dev-boun...@lists.linux-ha.org] On Behalf Of Lars Marowsky-Bree
 Sent: Monday, March 19, 2012 4:24 PM
 To: High-Availability Linux Development List
 Subject: Re: [Linux-ha-dev] Patch: pgsql streaming replication

 On 2012-03-19T11:09:16, Dejan Muhamedagic de...@suse.de wrote:

   --- a/heartbeat/pgsql
   +++ b/heartbeat/pgsql
   @@ -1,12 +1,13 @@
   -#!/bin/sh
   +#!/bin/bash
  Our policy is not to change shell. Is that absolutely necessary?

 He sends in many patches. bash is a 1MB install. I can't believe that in
 2012 we're still having this discussion ;-)



 Regards,
Lars

 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
 Imendörffer, HRB 21284 (AG Nürnberg) Experience is the name everyone gives
 to their mistakes. -- Oscar Wilde
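
The usual resolution of this policy is to rewrite bashisms into POSIX sh rather than change the shebang. Two representative rewrites (illustrative examples only, not taken from the pgsql patch):

```shell
#!/bin/sh
# Common bashisms and their POSIX-sh equivalents (examples).

value=rep_mode

# bash:     if [[ $value == rep_* ]]; then ...
# POSIX sh: use case pattern matching instead
case $value in
    rep_*) match=yes ;;
    *)     match=no ;;
esac

# bash:     arr=(a b c); second=${arr[1]}
# POSIX sh: reuse the positional parameters as a simple list
set -- a b c
second=$2
```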



Re: [Linux-ha-dev] pgsql and streaming replcation

2012-03-18 Thread Serge Dubrouski
I sent a patch to the mailing list. It's your RA, but in some places I split
long lines into several ones.

On Tue, Mar 13, 2012 at 8:41 PM, Serge Dubrouski serge...@gmail.com wrote:

 Takatoshi -

 Please give me some time to review the latest version of RA and then we'll
 submit a patch.

On Tue, Mar 13, 2012 at 5:35 AM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hello Serge

 2012/3/9 Takatoshi MATSUO matsuo@gmail.com:
  Hello Serge
 
  2012/3/8 Serge Dubrouski serge...@gmail.com:
  Hello, Takatoshi -
 
   On one hand this is a good feature to have, but on the other, if it's not
   stable, fails most of the time and brings too much complexity to the RA,
   then drop it.
  It seems like Andrew has plans to include support for stopping
  resources on demote in the future versions of Pacemaker.
 
   That's good news.
   I will remove the restarting code for the future version.
   Please accept a little complexity until that version is released.

 I removed the restarting code
 and reconsidered some of the inconsistency checks.

 In this change,
 the PGSQL.lock file is created on promote
 and removed on demote only if there is no Slave.
 I also removed the complex checks before start,
 such as checking the Timeline ID and Checkpoint,
 because PGSQL.lock can signal the inconsistency.

 I want to take a serious approach to merging now.
 Do you have any other comments?
 And what's the first step for merging?

 
  On Tue, Mar 6, 2012 at 11:43 PM, Takatoshi MATSUO 
 matsuo@gmail.com
  wrote:
 
  Hi Serge
 
  I am thinking about removing the code that restarts PostgreSQL on demote,
  because almost all restarts fail, to avoid data inconsistency.
  In addition, this check is complex.
  The PostgreSQL developers say that this occurrence is specific and
  can't be fixed immediately.
 
  What do you think ?

 Regards,
 Takatoshi MATSUO




 --
 Serge Dubrouski.




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pgsql and streaming replcation

2012-03-13 Thread Serge Dubrouski
Takatoshi -

Please give me some time to review the latest version of RA and then we'll
submit a patch.

On Tue, Mar 13, 2012 at 5:35 AM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hello Serge

 2012/3/9 Takatoshi MATSUO matsuo@gmail.com:
  Hello Serge
 
  2012/3/8 Serge Dubrouski serge...@gmail.com:
  Hello, Takatoshi -
 
   On one hand this is a good feature to have, but on the other, if it's not
   stable, fails most of the time and brings too much complexity to the RA,
   then drop it.
  It seems like Andrew has plans to include support for stopping
  resources on demote in the future versions of Pacemaker.
 
   That's good news.
   I will remove the restarting code for the future version.
   Please accept a little complexity until that version is released.

 I removed the restarting code
 and reconsidered some of the inconsistency checks.

 In this change,
 the PGSQL.lock file is created on promote
 and removed on demote only if there is no Slave.
 I also removed the complex checks before start,
 such as checking the Timeline ID and Checkpoint,
 because PGSQL.lock can signal the inconsistency.

 I want to take a serious approach to merging now.
 Do you have any other comments?
 And what's the first step for merging?

 
  On Tue, Mar 6, 2012 at 11:43 PM, Takatoshi MATSUO matsuo@gmail.com
 
  wrote:
 
  Hi Serge
 
  I am thinking about removing the code that restarts PostgreSQL on demote,
  because almost all restarts fail, to avoid data inconsistency.
  In addition, this check is complex.
  The PostgreSQL developers say that this occurrence is specific and
  can't be fixed immediately.
 
  What do you think ?

 Regards,
 Takatoshi MATSUO




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pgsql and streaming replcation

2012-02-05 Thread Serge Dubrouski
Takatoshi -

Please consider applying the attached patch to your version of the pgsql RA.
It's all cosmetics: replacing tabs with spaces and improving the English.
Thanks for implementing the recommendations I expressed earlier.


On Tue, Dec 13, 2011 at 3:37 AM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hello Serge

 2011/12/13 Serge Dubrouski serge...@gmail.com:
 
 
  On Mon, Dec 12, 2011 at 4:28 AM, Takatoshi MATSUO matsuo@gmail.com
  wrote:
 
  Hello Serge
 
  2011/12/12 Serge Dubrouski serge...@gmail.com:
  
  
   On Thu, Dec 8, 2011 at 10:04 PM, Takatoshi MATSUO 
 matsuo@gmail.com
   wrote:
  
   Hello Serge
  
   2011/12/8 Serge Dubrouski serge...@gmail.com:
   
   
On Mon, Dec 5, 2011 at 9:15 PM, Takatoshi MATSUO
matsuo@gmail.com
wrote:
   
Hello Serge
   
Serge Dubrouski serge...@gmail.com:
 Hello -

 Takatoshi MATSUO did a tremendous job on implementing support
 for
 streaming
 replication feature in pgsql RA. Also it looks like PostgeSQL
 9.1
 has
 all
 necessary interfaces to successfully implement  Pacemaker's M/S
 concept.
 So
 I think it's time to start discussion on how to merge
 Takatoshi's
 work
 into
 pgsql RA baseline. Here is the link to Takatoshi's GitHUB if
 somebody
 wants
 to test his RA:

 https://github.com/t-matsuo/

 So far I tested it for backward compatibility in a standard
 non-replication
 mode  and also tested M/S model and found no real issues. Though
 it
 definitely requires some more polishing and testing.

 Takatoshi, here are some changes that I want to discuss with
 you:

 1. Is it possible to add a check for PostgreSQL version and fail
 with
 OCF_ERR_INSTALLED when one tries to start replication on version
 less
 than
 9.1? A simple cat on PG_VERSION with some analysis would
 probably
 do.
   
I'll add a check.
  
   I added a check.
  
  
  
 https://github.com/t-matsuo/resource-agents/commit/3ab7cfdcce118043cd149b348740e50e7a946eb3
  
   
 2. I think that following lines should be moved from pgsql_start
 to
 pgsql_validate_all

  535 # Check whether tmpdir is readable by pgdba user
  536 if ! runasowner test -r $OCF_RESKEY_tmpdir; then
  537 ocf_log err Directory $OCF_RESKEY_tmpdir is not
 readable
 by
 $OCF_RESKEY_pgdba
  538 return $OCF_ERR_PERM
  539 fi
   
Thanks. I think so too.
I'll fix it.
   
  
   I fixed it and I deleted a check for tmpdir existence
   because the checking for permittion fills the role.
  
  
  
 https://github.com/t-matsuo/resource-agents/commit/82d4939486bcca429e2deb804d7faf756099bb59
  
  
    On second thought, I'm not sure why we need that parameter and directory
    at all. Why not create rep_mode.conf, PGSQL.lock and xlog.note in
    $OCF_RESKEY_pgdata? What problems can it create?
   
    One more advantage of doing it in $OCF_RESKEY_pgdata is the ability to
    handle more than one PostgreSQL instance on the same server without a
    need for additional temp directories.
  
   When backup is needed, customers may back up these files and restore them.
   That may cause problems.
   Especially PGSQL.lock, which causes an error on start.
  
   I think that they should be treated as a separate thing,
   because they are independent from PostgreSQL's data.
  
  
   Ok. Then maybe it should default to something like
   ${OCF_RESKEY_pgdata}/temp, and the RA should create it and set the right
   ownership and permissions if it doesn't exist? And again, maybe in this
   case we don't need that parameter?
  
 
  I agree that the RA should create it and set the right ownership and
  permissions. But considering backup using the tar command, I think it's
  better not to create it under $OCF_RESKEY_pgdata.
 
  According to Filesystem Hierarchy Standard
  it's better to use /var/lib/somewhere.
 
 
 
 http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLIBVARIABLESTATEINFORMATION
 ,
 
  Then I designed it using /var/lib/[RA Name] (= /var/lib/pgsql).
  What do you think?
 
 
  Ahh, I see now where the confusion comes from. /var/lib/pgsql is actually
  taken :-) It's used as the default home directory (in RedHat at least) for
  the postgres user in PostgreSQL 8.XX. When they start shipping version 9
  they'll probably use it as well. OCF_RESKEY_pgdata defaults to
  /var/lib/pgsql/data.
 
  Then, /var/lib is usually used for non-temporary data. For temporary files
  it's probably better to use /var/run or /var/tmp. If you still want to use
  /var/lib/pgsql you probably need to use /var/lib/pgsql/temp or so.

 Files under /var/run or /var/tmp are cleared by tmpwatch or at the
 beginning
 of the boot process, aren't they?

 Clearing the PGSQL.lock file causes a problem.


 
 
  
  
   Incidentally, I considered handling more than one PostgreSQL instance.
   Initially I added the port number to these filenames, but I deleted

Re: [Linux-ha-dev] [PATCH] named RA: support IPv6

2012-01-14 Thread Serge Dubrouski
On Sat, Jan 14, 2012 at 4:32 AM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:

 On Mon, Jan 09, 2012 at 05:50:14PM +0100, Dejan Muhamedagic wrote:
  Hi Serge,
 
  On Mon, Jan 09, 2012 at 09:11:43AM -0700, Serge Dubrouski wrote:
   I did a couple of weeks ago :-)
 
  Hmm, me completely missed it. Sorry about that. Will apply the
  patch. Many thanks to Junko for the contribution.

 Hm. I apparently missed this, too.

 -if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
 '$OCF_RESKEY_monitor_response
 +if [ $? -ne 0 ] || ! echo $output | egrep -q '.* has |IPv6 address
 '$OCF_RESKEY_monitor_response

 Not good.

 Should be
 +if [ $? -ne 0 ] || ! echo $output | grep -q '.* \(has\|IPv6\) address
 '$OCF_RESKEY_monitor_response

 Why?
 Because otherwise, as long as the response contains " has ", it
 would match, and $OCF_RESKEY_monitor_response would be ignored.

 And, using egrep (or grep -E) would also change how
 $OCF_RESKEY_monitor_response would be interpreted,
 so could in theory break existing configurations,
 if they use grep special chars.
 If you consider this as unlikely, do

 +if [ $? -ne 0 ] || ! echo $output | grep -q -E '.* (has|IPv6) address
 '$OCF_RESKEY_monitor_response
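
Lars's point is easy to verify in isolation: with the ungrouped pattern, any response containing " has " matches even when the expected address is wrong. A standalone demonstration (the addresses below are illustrative, not from the thread):

```shell
#!/bin/sh
# Demonstrate the broken vs. fixed alternation. The expected address is
# deliberately wrong, so a correct check must NOT match.
output="orange.kame.net has address 203.178.141.194"
expected="192.0.2.1"   # example address that does not appear in $output

# Broken: '.* has |IPv6 address '$expected parses as
# (.* has )|(IPv6 address 192.0.2.1) -- the left branch matches on " has "
# alone, so $expected is effectively ignored.
if echo "$output" | grep -q -E '.* has |IPv6 address '"$expected"; then
    broken=matched
else
    broken=no-match
fi

# Fixed: group the alternation so the address is always required.
if echo "$output" | grep -q -E '.* (has|IPv6) address '"$expected"; then
    fixed=matched
else
    fixed=no-match
fi
```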


Thanks, Lars. Of course you are right.

Dejan, could you please apply Lars's version?


 
  Thanks,
 
  Dejan
 
On Jan 9, 2012 8:00 AM, Dejan Muhamedagic de...@suse.de wrote:
  
Hi Junko-san,
   
On Tue, Dec 13, 2011 at 04:32:07PM +0900, Junko IKEDA wrote:
 Hi Serge,

 We are now investigating the support status of ocf RAs,
 and this is the issue for named.

 Here is the example output of host command;

 # host www.kame.net
 www.kame.net is an alias for orange.kame.net.
 orange.kame.net has address 203.178.141.194
 orange.kame.net has IPv6 address
 2001:200:dff:fff1:216:3eff:feb1:44d7

 named_monitor() searches its named server with
$OCF_RESKEY_monitor_response.
 I'm not familiar with named's behavior,
 is it possible to set IPv6 to $OCF_RESKEY_monitor_response?
 If $OCF_RESKEY_monitor_response has IPv6 address,
 the following syntax can not hit the result, right?
   
The patch looks OK to me. Serge, can you also ack please?
   
Cheers,
   
Dejan
   
 named_monitor()

 output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request
$OCF_RESKEY_monitor_ip`
 if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
 '$OCF_RESKEY_monitor_response

 Would you please give me some advice?

 Regards,
 Junko IKEDA

 NTT DATA INTELLILINK CORPORATION
   
   
   
 

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] [PATCH] named RA: support IPv6

2012-01-09 Thread Serge Dubrouski
I did a couple of weeks ago :-)
 On Jan 9, 2012 8:00 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Junko-san,

 On Tue, Dec 13, 2011 at 04:32:07PM +0900, Junko IKEDA wrote:
  Hi Serge,
 
  We are now investigating the support status of ocf RAs,
  and this is the issue for named.
 
  Here is the example output of host command;
 
  # host www.kame.net
  www.kame.net is an alias for orange.kame.net.
  orange.kame.net has address 203.178.141.194
  orange.kame.net has IPv6 address 2001:200:dff:fff1:216:3eff:feb1:44d7
 
  named_monitor() searches its named server with
 $OCF_RESKEY_monitor_response.
  I'm not familiar with named's behavior,
  is it possible to set IPv6 to $OCF_RESKEY_monitor_response?
  If $OCF_RESKEY_monitor_response has IPv6 address,
  the following syntax can not hit the result, right?

 The patch looks OK to me. Serge, can you also ack please?

 Cheers,

 Dejan

  named_monitor()
 
  output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request
 $OCF_RESKEY_monitor_ip`
  if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
  '$OCF_RESKEY_monitor_response
 
  Would you please give me some advice?
 
  Regards,
  Junko IKEDA
 
  NTT DATA INTELLILINK CORPORATION




Re: [Linux-ha-dev] pgsql and streaming replcation

2011-12-12 Thread Serge Dubrouski
On Mon, Dec 12, 2011 at 4:28 AM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hello Serge

 2011/12/12 Serge Dubrouski serge...@gmail.com:
 
 
  On Thu, Dec 8, 2011 at 10:04 PM, Takatoshi MATSUO matsuo@gmail.com
  wrote:
 
  Hello Serge
 
  2011/12/8 Serge Dubrouski serge...@gmail.com:
  
  
   On Mon, Dec 5, 2011 at 9:15 PM, Takatoshi MATSUO 
 matsuo@gmail.com
   wrote:
  
   Hello Serge
  
   Serge Dubrouski serge...@gmail.com:
Hello -
   
Takatoshi MATSUO did a tremendous job on implementing support for
streaming
    replication feature in pgsql RA. Also it looks like PostgreSQL 9.1
 has
all
necessary interfaces to successfully implement  Pacemaker's M/S
concept.
So
I think it's time to start discussion on how to merge Takatoshi's
work
into
pgsql RA baseline. Here is the link to Takatoshi's GitHUB if
 somebody
wants
to test his RA:
   
https://github.com/t-matsuo/
   
So far I tested it for backward compatibility in a standard
non-replication
mode  and also tested M/S model and found no real issues. Though it
definitely requires some more polishing and testing.
   
Takatoshi, here are some changes that I want to discuss with you:
   
1. Is it possible to add a check for PostgreSQL version and fail
 with
OCF_ERR_INSTALLED when one tries to start replication on version
 less
than
9.1? A simple cat on PG_VERSION with some analysis would probably
 do.
  
   I'll add a check.
 
  I added a check.
 
 
 https://github.com/t-matsuo/resource-agents/commit/3ab7cfdcce118043cd149b348740e50e7a946eb3
 
  
2. I think that following lines should be moved from pgsql_start to
pgsql_validate_all
   
 535 # Check whether tmpdir is readable by pgdba user
 536 if ! runasowner test -r $OCF_RESKEY_tmpdir; then
 537 ocf_log err Directory $OCF_RESKEY_tmpdir is not
readable
by
$OCF_RESKEY_pgdba
 538 return $OCF_ERR_PERM
 539 fi
  
   Thanks. I think so too.
   I'll fix it.
  
 
  I fixed it, and I deleted the check for tmpdir existence
  because the permission check fills that role.
 
 
 https://github.com/t-matsuo/resource-agents/commit/82d4939486bcca429e2deb804d7faf756099bb59
 
 
   On second thought, I'm not sure why we need that parameter and directory
   at all. Why not create rep_mode.conf, PGSQL.lock and xlog.note in
   $OCF_RESKEY_pgdata? What problems can it create?
  
   One more advantage to do
   it in $OCF_RESKEY_pgdata is an ability to handle more than PostgreSQL
   instance on the same server without a need for additional temp
   directories.
 
  When backup is needed, customers may backup these files and restore it.
  It may cause problems.
  Specially PGSQL.lock causes an error on start.
 
  I think that they should be treated as a separate thing.
  because they are independent from PostgreSQL's data.
 
 
  Ok. Then may be it should default to something like
  ${OCF_RESKEY_pgdata}/temp and RA should create it and set the right
  ownership and permissions if it doesn't exist? And again may be in this
 case
  we don't need that parameter?
 

 I agree that RA creates it and sets the right ownership and permissions.
 But considering backup using tar command, I think it's not better to
 create it
 under $OCF_RESKEY_pgdata.

 According to Filesystem Hierarchy Standard
 it's better to use /var/lib/somewhere.

 http://www.pathname.com/fhs/pub/fhs-2.3.html#VARLIBVARIABLESTATEINFORMATION
 ,

 So I designed it to use /var/lib/[RA Name] (=/var/lib/pgsql).
 What do you think?


Ahh, I see now where the confusion comes from. /var/lib/pgsql is actually taken
:-) It's used as the default home directory (in RedHat at least) for the
postgres user in PostgreSQL 8.x. When they start shipping version 9 they'll
probably use it as well. OCF_RESKEY_pgdata defaults to /var/lib/pgsql/data.

/var/lib is usually used for non-temporary data. For temporary files it's
probably better to use /var/run or /var/tmp. If you still want to use
/var/lib/pgsql you probably need to use /var/lib/pgsql/temp or so.


 
 
  Incidentally I considered to handle more than PostgreSQL instance.
  Initially I added port number to these filenames, but I deleted it
  to simplify filenames.
 
 
 https://github.com/t-matsuo/resource-agents/commit/b16faf2d797200048dc0fc07a45b6751cf5be190
 
 
   Also I think it would be good if RA was able to take care of adding
   include
   $WHATEVER_DIR/rep_mode.conf in postgresql.conf. It will make the RA
   self
   sustainable. In a current situation admin has add that directive
   manually.
   RA though can something like this in a start function for replication
   mode:
  
   if ! grep -i include $WHATEVER_DIR/rep_mode.conf $OCF_RESKEY_config
   then
echo include $WHATEVER_DIR/rep_mode.conf  $OCF_RESKEY_config
   fi
 
  Sounds good.
 
   Don't know if it makes sense to remove it on stop.
 
  I think it doesn't make sense to remove

Re: [Linux-ha-dev] pgsql and streaming replication

2011-12-11 Thread Serge Dubrouski
On Thu, Dec 8, 2011 at 10:04 PM, Takatoshi MATSUO matsuo@gmail.comwrote:

 Hello Serge

 2011/12/8 Serge Dubrouski serge...@gmail.com:
 
 
  On Mon, Dec 5, 2011 at 9:15 PM, Takatoshi MATSUO matsuo@gmail.com
  wrote:
 
  Hello Serge
 
  Serge Dubrouski serge...@gmail.com:
   Hello -
  
   Takatoshi MATSUO did a tremendous job on implementing support for
   streaming
   replication feature in pgsql RA. Also it looks like PostgeSQL 9.1 has
   all
   necessary interfaces to successfully implement  Pacemaker's M/S
 concept.
   So
   I think it's time to start discussion on how to merge Takatoshi's work
   into
   pgsql RA baseline. Here is the link to Takatoshi's GitHUB if somebody
   wants
   to test his RA:
  
   https://github.com/t-matsuo/
  
   So far I tested it for backward compatibility in a standard
   non-replication
   mode  and also tested M/S model and found no real issues. Though it
   definitely requires some more polishing and testing.
  
   Takatoshi, here are some changes that I want to discuss with you:
  
   1. Is it possible to add a check for PostgreSQL version and fail with
   OCF_ERR_INSTALLED when one tries to start replication on version less
   than
   9.1? A simple cat on PG_VERSION with some analysis would probably do.
 
  I'll add a check.

 I added a check.

 https://github.com/t-matsuo/resource-agents/commit/3ab7cfdcce118043cd149b348740e50e7a946eb3

 
   2. I think that following lines should be moved from pgsql_start to
   pgsql_validate_all
  
535 # Check whether tmpdir is readable by pgdba user
536 if ! runasowner test -r $OCF_RESKEY_tmpdir; then
537 ocf_log err Directory $OCF_RESKEY_tmpdir is not readable
   by
   $OCF_RESKEY_pgdba
538 return $OCF_ERR_PERM
539 fi
 
  Thanks. I think so too.
  I'll fix it.
 

 I fixed it, and I deleted the check for tmpdir existence
 because the permission check fills that role.

 https://github.com/t-matsuo/resource-agents/commit/82d4939486bcca429e2deb804d7faf756099bb59


  On a second thought I'm not sure why we need that parameter and
 directory at
  all. Why not to create rep_mode.conf, PGSQL.lock and xlog.note in
  $OCF_RESKEY_pgdata ? What problems it can create?
 
  One more advantage to do
  it in $OCF_RESKEY_pgdata is an ability to handle more than PostgreSQL
  instance on the same server without a need for additional temp
 directories.

 When backup is needed, customers may backup these files and restore it.
 It may cause problems.
 Specially PGSQL.lock causes an error on start.

 I think that they should be treated as a separate thing.
 because they are independent from PostgreSQL's data.


Ok. Then maybe it should default to something like
${OCF_RESKEY_pgdata}/temp, and the RA should create it and set the right
ownership and permissions if it doesn't exist? And again, maybe in this
case we don't need that parameter?




 Incidentally I considered to handle more than PostgreSQL instance.
 Initially I added port number to these filenames, but I deleted it
 to simplify filenames.

 https://github.com/t-matsuo/resource-agents/commit/b16faf2d797200048dc0fc07a45b6751cf5be190


  Also I think it would be good if RA was able to take care of adding
 include
  $WHATEVER_DIR/rep_mode.conf in postgresql.conf. It will make the RA self
  sustainable. In a current situation admin has add that directive
 manually.
  RA though can something like this in a start function for replication
 mode:
 
  if ! grep -i include $WHATEVER_DIR/rep_mode.conf $OCF_RESKEY_config
  then
   echo include $WHATEVER_DIR/rep_mode.conf  $OCF_RESKEY_config
  fi

 Sounds good.

  Don't know if it makes sense to remove it on stop.

 I think it doesn't make sense to remove it,
 because rep_mode.conf becomes empty on stop.


The only problem here is if the admin changes the temp_dir parameter but
doesn't delete the records from postgresql.conf. We could end up with several
include records then, or with records pointing to non-existent files.
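One way to sidestep that stale-include concern is to delete any previous rep_mode.conf include before appending the current one, so the start action stays idempotent when the temp directory changes. This is only a sketch: the `ensure_include` name and the quoted-path include syntax are illustrative assumptions, not the RA's actual code.

```shell
# Remove any previous rep_mode.conf include, then add the current one,
# so changing the temp directory can't leave stale or duplicate lines.
ensure_include() {
    conf=$1                      # postgresql.conf
    rep=$2                       # full path to rep_mode.conf
    # grep -v exits non-zero when nothing survives; ignore that case
    grep -v "rep_mode.conf" "$conf" > "$conf.tmp" || true
    mv "$conf.tmp" "$conf"
    echo "include '$rep'" >> "$conf"
}
```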



   3. I don't really like this part of pgsql_real_monitor:
  
775 if ! is_replication; then
776 OCF_RESKEY_monitor_sql=`escape_string 
   $OCF_RESKEY_monitor_sql`
777 runasowner -q $loglevel $OCF_RESKEY_psql $psql_options
 -c
'$OCF_RESKEY_monitor_sql'
778 rc=$?
779 else
780 output=`su $OCF_RESKEY_pgdba -c cd $OCF_RESKEY_pgdata; 
   $OCF_RESKEY_psql $psql_options -Atc \${CHECK_MS_SQL}\`
781 rc=$?
782 fi
  
   I think that functional monitor (the one that uses monitor_sql) should
   run
   always independently of DB mode since its primary role is to check
 data
   and
   fail if it's not correct or corrupted. In replication mode there
 should
   be
   additional monitoring. Other way it misleads customer on a usage of
   monitor_sql.
 
  All right.
 
  Does it need to execute select now(); if monitor_sql parameter is
 empty?
  I think it's unnecessary.
 
 
  For a case of an empty parameter you are right

Re: [Linux-ha-dev] pgsql and streaming replication

2011-12-07 Thread Serge Dubrouski
On Mon, Dec 5, 2011 at 9:15 PM, Takatoshi MATSUO matsuo@gmail.comwrote:

 Hello Serge

 Serge Dubrouski serge...@gmail.com:
  Hello -
 
  Takatoshi MATSUO did a tremendous job on implementing support for
 streaming
  replication feature in pgsql RA. Also it looks like PostgeSQL 9.1 has all
  necessary interfaces to successfully implement  Pacemaker's M/S concept.
 So
  I think it's time to start discussion on how to merge Takatoshi's work
 into
  pgsql RA baseline. Here is the link to Takatoshi's GitHUB if somebody
 wants
  to test his RA:
 
  https://github.com/t-matsuo/
 
  So far I tested it for backward compatibility in a standard
 non-replication
  mode  and also tested M/S model and found no real issues. Though it
  definitely requires some more polishing and testing.
 
  Takatoshi, here are some changes that I want to discuss with you:
 
  1. Is it possible to add a check for PostgreSQL version and fail with
  OCF_ERR_INSTALLED when one tries to start replication on version less
 than
  9.1? A simple cat on PG_VERSION with some analysis would probably do.

 I'll add a check.

  2. I think that following lines should be moved from pgsql_start to
  pgsql_validate_all
 
   535 # Check whether tmpdir is readable by pgdba user
   536 if ! runasowner test -r $OCF_RESKEY_tmpdir; then
   537 ocf_log err Directory $OCF_RESKEY_tmpdir is not readable by
  $OCF_RESKEY_pgdba
   538 return $OCF_ERR_PERM
   539 fi

 Thanks. I think so too.
 I'll fix it.


On second thought, I'm not sure why we need that parameter and directory at
all. Why not create rep_mode.conf, PGSQL.lock and xlog.note in
$OCF_RESKEY_pgdata? What problems could that create? One more advantage of
doing it in $OCF_RESKEY_pgdata is the ability to handle more than one
PostgreSQL instance on the same server without the need for additional temp
directories.

Also, I think it would be good if the RA were able to take care of adding
include $WHATEVER_DIR/rep_mode.conf to postgresql.conf. It would make the
RA self-sustaining. In the current situation the admin has to add that
directive manually. The RA, though, can do something like this in the start
function for replication mode:

if ! grep -i "include $WHATEVER_DIR/rep_mode.conf" $OCF_RESKEY_config
then
 echo "include $WHATEVER_DIR/rep_mode.conf" >> $OCF_RESKEY_config
fi

Don't know if it makes sense to remove it on stop.

 3. I don't really like this part of pgsql_real_monitor:
 
   775 if ! is_replication; then
   776 OCF_RESKEY_monitor_sql=`escape_string 
 $OCF_RESKEY_monitor_sql`
   777 runasowner -q $loglevel $OCF_RESKEY_psql $psql_options -c
  '$OCF_RESKEY_monitor_sql'
   778 rc=$?
   779 else
   780 output=`su $OCF_RESKEY_pgdba -c cd $OCF_RESKEY_pgdata; 
 $OCF_RESKEY_psql $psql_options -Atc \${CHECK_MS_SQL}\`
   781 rc=$?
   782 fi
 
  I think that functional monitor (the one that uses monitor_sql) should
 run
  always independently of DB mode since its primary role is to check data
 and
  fail if it's not correct or corrupted. In replication mode there should
 be
  additional monitoring. Other way it misleads customer on a usage of
  monitor_sql.

 All right.

 Does it need to execute select now(); if monitor_sql parameter is empty?
 I think it's unnecessary.


In the case of an empty parameter you are right, and running select now() is
probably unnecessary. A non-empty monitor_sql should be executed, in my opinion.
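Sketched out, the behavior being agreed on could look like the following. `run_sql` is a placeholder for the RA's runasowner/psql plumbing, and `pgsql_functional_monitor` is an illustrative name, not the actual pgsql RA function.

```shell
# Functional monitor: execute monitor_sql whenever it is set, in any
# mode; an empty monitor_sql skips the query instead of running
# "select now()" as a filler.
pgsql_functional_monitor() {
    monitor_sql=$1
    if [ -n "$monitor_sql" ]; then
        run_sql "$monitor_sql" || return 1   # OCF_ERR_GENERIC in the RA
    fi
    return 0
}
```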



  4. You already populate several attributes with crm_attribute. Does it
 make
  sense to populate a name of a Master node in promote function and use it
  later on instead of running crm_mon on each monitor command?
 

 Do you mean that you don't want to use crm_mon in monitor?


I prefer to have as few dependencies on external programs as possible.
Using crm_attribute to communicate the name of the master node would be
consistent with the rest of the script, since you already use it for
communicating the state of the nodes. If you prefer using crm_mon, then you
have to add a check that that binary exists on the server. In 99.99% of cases
it will, but the check is still necessary, I think.



  5. It also requires some changes in terms of a proper English but we can
 go
  through it later.

 I hope for your help.


Will do.



 
  And yet again thanks for a brilliant work.
 
 Thank you for the compliment.

  Florian, Dejan how would you like to merge a patch when we are ready? The
  patch will be rather big one and AFAIK you have some policy on the
 amount of
  changes for one patch.
 
 
  --
  Serge Dubrouski.

 --
 Regards,
 Takatoshi MATSUO
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] pgsql and streaming replication

2011-12-05 Thread Serge Dubrouski
Chris -

The best way you can help us is to test the new RA as much as possible and in
all possible modes, I mean both regular and streaming replication.

On Mon, Dec 5, 2011 at 5:40 AM, Chris Bowlby cbow...@tenthpowertech.comwrote:

 Hi Florian/Serge,

 I can offer any insights into PostgreSQL as well, as I've been actively
 using and support it since 2001/2002.

 On 12/05/2011 04:22 AM, Florian Haas wrote:
  On Sun, Dec 4, 2011 at 11:11 PM, Serge Dubrouskiserge...@gmail.com
  wrote:
  Florian, Dejan how would you like to merge a patch when we are ready?
 The
  patch will be rather big one and AFAIK you have some policy on the
 amount of
  changes for one patch.
  If it's a big addition of functionality, then a big patch is expected.
  However please make sure that you do one patch per functional change.
  Also, don't mix functional changes with cleanup work like fixing
  whitespace, correcting incorrectly advertised resource parameters,
  etc. It's acceptable to mix those in with the same pull request, but
  they should be in separate changesets so we can easily bisect any
  arising issues.
 
  Other than that, I guess Dejan will agree with me that your PostgreSQL
  expertise is way better than his and mine. So if you greenlight the
  feature addition functionally we're unlikely to second-guess you on
  that.
 
  Does this help?
  Cheers,
  Florian
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] pgsql and streaming replication

2011-12-04 Thread Serge Dubrouski
Hello -

Takatoshi MATSUO did a tremendous job implementing support for the streaming
replication feature in the pgsql RA. It also looks like PostgreSQL 9.1 has all
the necessary interfaces to successfully implement Pacemaker's M/S concept. So
I think it's time to start a discussion on how to merge Takatoshi's work into
the pgsql RA baseline. Here is the link to Takatoshi's GitHub if somebody wants
to test his RA:

https://github.com/t-matsuo/

So far I have tested it for backward compatibility in standard non-replication
mode and also tested the M/S model, and found no real issues. Though it
definitely requires some more polishing and testing.

Takatoshi, here are some changes that I want to discuss with you:

1. Is it possible to add a check for the PostgreSQL version and fail with
OCF_ERR_INSTALLED when one tries to start replication on a version less than
9.1? A simple cat of PG_VERSION with some analysis would probably do.
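A minimal sketch of such a PG_VERSION check, assuming the usual data-directory layout; the function name and the exact mapping to OCF return codes are illustrative, not the actual pgsql RA code:

```shell
# Hypothetical sketch: read PG_VERSION from the data directory and
# refuse replication mode on anything older than 9.1.
check_pgsql_version() {
    pgdata=$1                            # what the RA calls $OCF_RESKEY_pgdata
    version_file="$pgdata/PG_VERSION"
    # Missing/unreadable PG_VERSION: treat as not properly installed
    [ -r "$version_file" ] || return 1   # map to OCF_ERR_INSTALLED in the RA
    version=$(cat "$version_file")
    major=${version%%.*}
    minor=${version#*.}; minor=${minor%%.*}
    case $version in
        *.*) : ;;                        # e.g. "9.1", "8.4"
        *)   minor=0 ;;                  # e.g. "10" and later: no minor part
    esac
    # Streaming replication as discussed here needs 9.1 or later
    if [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 1 ]; }; then
        return 0
    fi
    return 1
}
```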

2. I think that following lines should be moved from pgsql_start to
pgsql_validate_all

 535 # Check whether tmpdir is readable by pgdba user
 536 if ! runasowner "test -r $OCF_RESKEY_tmpdir"; then
 537     ocf_log err "Directory $OCF_RESKEY_tmpdir is not readable by $OCF_RESKEY_pgdba"
 538     return $OCF_ERR_PERM
 539 fi

3. I don't really like this part of pgsql_real_monitor:

 775 if ! is_replication; then
 776     OCF_RESKEY_monitor_sql=`escape_string "$OCF_RESKEY_monitor_sql"`
 777     runasowner -q $loglevel "$OCF_RESKEY_psql $psql_options -c '$OCF_RESKEY_monitor_sql'"
 778     rc=$?
 779 else
 780     output=`su $OCF_RESKEY_pgdba -c "cd $OCF_RESKEY_pgdata; $OCF_RESKEY_psql $psql_options -Atc \"${CHECK_MS_SQL}\""`
 781     rc=$?
 782 fi


I think that the functional monitor (the one that uses monitor_sql) should
always run, independent of DB mode, since its primary role is to check the
data and fail if it's incorrect or corrupted. In replication mode there should
be additional monitoring. Otherwise it misleads the customer about the usage
of monitor_sql.

4. You already populate several attributes with crm_attribute. Does it make
sense to populate the name of the Master node in the promote function and use
it later on, instead of running crm_mon on each monitor command?
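A rough sketch of what point 4 could look like, using crm_attribute's standard --type/--name/--update/--query options. The attribute name pgsql-master-node and the helper names are made up for illustration; check crm_attribute(8) on your Pacemaker version before relying on exact flags.

```shell
# CRM_ATTR is overridable so the plumbing can be exercised without a
# live cluster; in the RA it would simply be crm_attribute.
CRM_ATTR=${CRM_ATTR:-crm_attribute}

# Called from promote: record which node just became master.
set_master_name() {
    $CRM_ATTR --type crm_config --name pgsql-master-node --update "$1"
}

# Called from monitor: read the recorded master instead of parsing crm_mon.
get_master_name() {
    $CRM_ATTR --type crm_config --name pgsql-master-node --query --quiet
}
```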

5. It also requires some changes in terms of proper English, but we can go
through that later.

And yet again thanks for a brilliant work.

Florian, Dejan, how would you like to merge a patch when we are ready? The
patch will be a rather big one, and AFAIK you have some policy on the amount
of changes in one patch.


-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] Patches for named and pgsql

2011-10-13 Thread Serge Dubrouski
Hello -

Attached are patches for the named and pgsql RAs that replace == with =
when comparing strings.

-- 
Serge Dubrouski.
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -644,7 +644,7 @@ esac
 pgsql_validate_all
 rc=$?
 
-[ "$1" == "validate-all" ] && exit $rc
+[ "$1" = "validate-all" ] && exit $rc
 
 if [ $rc -ne 0 ]
 then

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


[Linux-ha-dev] OCFT file for named RA

2011-10-08 Thread Serge Dubrouski
Hello -

Attached is an OCFT file for the named RA.

-- 
Serge Dubrouski.


named
Description: Binary data
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch for named RA

2011-10-06 Thread Serge Dubrouski
Thanks, Dejan.

I'll create an ocft file for this RA a bit later.

On Thu, Oct 6, 2011 at 3:55 AM, Dejan Muhamedagic de...@suse.de wrote:

 On Wed, Oct 05, 2011 at 07:47:13PM -0600, Serge Dubrouski wrote:
  On Wed, Oct 5, 2011 at 4:22 AM, Dejan Muhamedagic de...@suse.de wrote:
 
   Hi Serge,
  
   On Mon, Oct 03, 2011 at 07:53:25PM -0600, Serge Dubrouski wrote:
Hello -
   
Attached is a patch for named RA that fixes stop function and makes
   required
tools configurable OCF parameters. Please apply to git.
  
   There are two issues handled in this patch. It's always good to
   send separate patches for separate problems.
  
 
  I know. Could you split it this time, please? If not, I'll split it. I'm
  just not sure how to better do this, second patch should be applied
 against
  code with first one already applied, right?

 Since the agent is fairly new and hasn't been released yet, we
 can make an exception.

   I think that adding extra rndc and host parameters is an
   overkill. If an installation doesn't have them in the PATH,
   well, they need to fix that. We certainly won't make path for
   every binary configurable.
  
 
  I don't think it's an overkill. Pretty often sysadmin prefer to install
  latest version of BIND compiled for sources and they can install it
 whenever
  they want. Some prefer /opt, some /usr/local some something  else. So
  providing some flexibly around this issue is a good thing  I think and
 I'd
  really like to have it.

 OK.

   If you insist, we can still add them. But I think that the
   defaults should be without paths, i.e. just rndc and host.
  
 
  If one uses standard RPM or DEB or any other distro package they rndc
 will
  be in /usr/sbin and host will bin in /bin.  Having full path specified
 has
  some advantages, like working properly with sudo when one needs to test
 RA
  but doesn't have full root access to the machine and /sbin isn't in the
  $PATH. But yet again it's all arguable and I won't insist if you really
  don't like it.

 OK, we can add it as it is.

 Many thanks for the contribution.

 Cheers,

 Dejan

   Cheers,
  
   Dejan
  
--
Serge Dubrouski.
  
diff --git a/heartbeat/named b/heartbeat/named
index 8d15db6..f9efb92 100755
--- a/heartbeat/named
+++ b/heartbeat/named
@@ -15,12 +15,10 @@
 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
   
-# Used binaries
-RNDC=rndc
-HOST=host
-
 #Defaults
 OCF_RESKEY_named_default=/usr/sbin/named
+OCF_RESKEY_rndc_default=/usr/sbin/rndc
+OCF_RESKEY_host_default=/usr/bin/host
 OCF_RESKEY_named_user_default=named
 OCF_RESKEY_named_config_default=
 OCF_RESKEY_named_pidfile_default=/var/run/named/named.pid
@@ -32,6 +30,8 @@ OCF_RESKEY_monitor_response_default=127.0.0.1
 OCF_RESKEY_monitor_ip_default=127.0.0.1
   
 : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
+: ${OCF_RESKEY_rndc=${OCF_RESKEY_rndc_default}}
+: ${OCF_RESKEY_host=${OCF_RESKEY_host_default}}
 : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
 : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
 : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
@@ -80,6 +80,22 @@ Path to the named command.
 content type=string default=${OCF_RESKEY_named_default} /
 /parameter
   
+parameter name=rndc unique=0 required=0
+longdesc lang=en
+Path to the rndc command.
+/longdesc
+shortdesc lang=enrndc/shortdesc
+content type=string default=${OCF_RESKEY_rndc_default} /
+/parameter
+
+parameter name=host unique=0 required=0
+longdesc lang=en
+Path to the host command.
+/longdesc
+shortdesc lang=enhost/shortdesc
+content type=string default=${OCF_RESKEY_host_default} /
+/parameter
+
 parameter name=named_user unique=0 required=0
 longdesc lang=en
 User that should own named process.
@@ -187,8 +203,8 @@ EOF
 # Validate most critical parameters
 named_validate_all() {
 check_binary $OCF_RESKEY_named
-check_binary $RNDC
-check_binary $HOST
+check_binary $OCF_RESKEY_rndc
+check_binary $OCF_RESKEY_host
   
 if [ -n $OCF_RESKEY_named_config -a \
 ! -r
 ${OCF_RESKEY_named_rootdir}/${OCF_RESKEY_named_config} ];
   then
@@ -256,7 +272,7 @@ named_monitor() {
 return $OCF_NOT_RUNNING
 fi
   
-output=`$HOST $OCF_RESKEY_monitor_request
 $OCF_RESKEY_monitor_ip`
+output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request
   $OCF_RESKEY_monitor_ip`
   
 if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
   '$OCF_RESKEY_monitor_response
 then
@@ -274,7 +290,7 @@ named_monitor() {
 #
   
 named_reload() {
-$RNDC reload /dev/null || return $OCF_ERR_GENERIC
+$OCF_RESKEY_rndc reload /dev/null || return $OCF_ERR_GENERIC
   
 return $OCF_SUCCESS

[Linux-ha-dev] How to use reload action of RA agent?

2011-10-06 Thread Serge Dubrouski
Hello -

How is one supposed to use the reload action of an RA if it's supported by
the RA? When I try to set up an order like this:

order Reload_After_Start +inf: res1:start res2:reload

neither crm nor cibadmin allows me to define such an order.


-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Patch for named RA

2011-10-05 Thread Serge Dubrouski
On Wed, Oct 5, 2011 at 4:22 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Mon, Oct 03, 2011 at 07:53:25PM -0600, Serge Dubrouski wrote:
  Hello -
 
  Attached is a patch for named RA that fixes stop function and makes
 required
  tools configurable OCF parameters. Please apply to git.

 There are two issues handled in this patch. It's always good to
 send separate patches for separate problems.


I know. Could you split it this time, please? If not, I'll split it. I'm
just not sure how best to do this; the second patch should be applied against
the code with the first one already applied, right?



 I think that adding extra rndc and host parameters is an
 overkill. If an installation doesn't have them in the PATH,
 well, they need to fix that. We certainly won't make path for
 every binary configurable.


I don't think it's overkill. Pretty often sysadmins prefer to install the
latest version of BIND compiled from sources, and they can install it wherever
they want. Some prefer /opt, some /usr/local, some something else. So
providing some flexibility around this issue is a good thing, I think, and I'd
really like to have it.


 If you insist, we can still add them. But I think that the
 defaults should be without paths, i.e. just rndc and host.


If one uses a standard RPM, DEB, or any other distro package, then rndc will
be in /usr/sbin and host will be in /bin. Having the full path specified has
some advantages, like working properly with sudo when one needs to test an RA
but doesn't have full root access to the machine and /sbin isn't in the
$PATH. But yet again it's all arguable, and I won't insist if you really
don't like it.


 Cheers,

 Dejan

  --
  Serge Dubrouski.

  diff --git a/heartbeat/named b/heartbeat/named
  index 8d15db6..f9efb92 100755
  --- a/heartbeat/named
  +++ b/heartbeat/named
  @@ -15,12 +15,10 @@
   : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
   . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
 
  -# Used binaries
  -RNDC=rndc
  -HOST=host
  -
   #Defaults
   OCF_RESKEY_named_default=/usr/sbin/named
  +OCF_RESKEY_rndc_default=/usr/sbin/rndc
  +OCF_RESKEY_host_default=/usr/bin/host
   OCF_RESKEY_named_user_default=named
   OCF_RESKEY_named_config_default=
   OCF_RESKEY_named_pidfile_default=/var/run/named/named.pid
  @@ -32,6 +30,8 @@ OCF_RESKEY_monitor_response_default=127.0.0.1
   OCF_RESKEY_monitor_ip_default=127.0.0.1
 
   : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
  +: ${OCF_RESKEY_rndc=${OCF_RESKEY_rndc_default}}
  +: ${OCF_RESKEY_host=${OCF_RESKEY_host_default}}
   : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
   : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
   : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
  @@ -80,6 +80,22 @@ Path to the named command.
   content type=string default=${OCF_RESKEY_named_default} /
   /parameter
 
  +parameter name=rndc unique=0 required=0
  +longdesc lang=en
  +Path to the rndc command.
  +/longdesc
  +shortdesc lang=enrndc/shortdesc
  +content type=string default=${OCF_RESKEY_rndc_default} /
  +/parameter
  +
  +parameter name=host unique=0 required=0
  +longdesc lang=en
  +Path to the host command.
  +/longdesc
  +shortdesc lang=enhost/shortdesc
  +content type=string default=${OCF_RESKEY_host_default} /
  +/parameter
  +
   parameter name=named_user unique=0 required=0
   longdesc lang=en
   User that should own named process.
  @@ -187,8 +203,8 @@ EOF
   # Validate most critical parameters
   named_validate_all() {
   check_binary $OCF_RESKEY_named
  -check_binary $RNDC
  -check_binary $HOST
  +check_binary $OCF_RESKEY_rndc
  +check_binary $OCF_RESKEY_host
 
   if [ -n $OCF_RESKEY_named_config -a \
   ! -r ${OCF_RESKEY_named_rootdir}/${OCF_RESKEY_named_config} ];
 then
  @@ -256,7 +272,7 @@ named_monitor() {
   return $OCF_NOT_RUNNING
   fi
 
  -output=`$HOST $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`
  +output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request
 $OCF_RESKEY_monitor_ip`
 
   if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
 '$OCF_RESKEY_monitor_response
   then
  @@ -274,7 +290,7 @@ named_monitor() {
   #
 
   named_reload() {
  -$RNDC reload /dev/null || return $OCF_ERR_GENERIC
  +$OCF_RESKEY_rndc reload /dev/null || return $OCF_ERR_GENERIC
 
   return $OCF_SUCCESS
   }
  @@ -338,33 +354,38 @@ named_start() {
 
   named_stop () {
   local timeout
  +local timewait
 
   named_status || return $OCF_SUCCESS
 
  -if ! $RNDC stop /dev/null; then
  +$OCF_RESKEY_rndc stop /dev/null 
  +if [ $? -ne 0 ]; then
  + ocf_log info rndc stop failed. Killing named.
   kill `cat ${OCF_RESKEY_named_pidfile}`
   fi
 
   if [ -n $OCF_RESKEY_CRM_meta_timeout ]; then
 # Allow 2/3 of the action timeout for the orderly shutdown
 # (The origin unit is ms, hence the conversion)
  -  timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
  +  timewait

[Linux-ha-dev] Patch for named RA

2011-10-03 Thread Serge Dubrouski
Hello -

Attached is a patch for named RA that fixes stop function and makes required
tools configurable OCF parameters. Please apply to git.

-- 
Serge Dubrouski.
diff --git a/heartbeat/named b/heartbeat/named
index 8d15db6..f9efb92 100755
--- a/heartbeat/named
+++ b/heartbeat/named
@@ -15,12 +15,10 @@
 : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
 . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
 
-# Used binaries
-RNDC=rndc
-HOST=host
-
 #Defaults
 OCF_RESKEY_named_default=/usr/sbin/named
+OCF_RESKEY_rndc_default=/usr/sbin/rndc
+OCF_RESKEY_host_default=/usr/bin/host
 OCF_RESKEY_named_user_default=named
 OCF_RESKEY_named_config_default=
 OCF_RESKEY_named_pidfile_default=/var/run/named/named.pid
@@ -32,6 +30,8 @@ OCF_RESKEY_monitor_response_default=127.0.0.1
 OCF_RESKEY_monitor_ip_default=127.0.0.1
 
 : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
+: ${OCF_RESKEY_rndc=${OCF_RESKEY_rndc_default}}
+: ${OCF_RESKEY_host=${OCF_RESKEY_host_default}}
 : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
 : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
 : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
@@ -80,6 +80,22 @@ Path to the named command.
 <content type="string" default="${OCF_RESKEY_named_default}" />
 </parameter>
 
+<parameter name="rndc" unique="0" required="0">
+<longdesc lang="en">
+Path to the rndc command.
+</longdesc>
+<shortdesc lang="en">rndc</shortdesc>
+<content type="string" default="${OCF_RESKEY_rndc_default}" />
+</parameter>
+
+<parameter name="host" unique="0" required="0">
+<longdesc lang="en">
+Path to the host command.
+</longdesc>
+<shortdesc lang="en">host</shortdesc>
+<content type="string" default="${OCF_RESKEY_host_default}" />
+</parameter>
+
 <parameter name="named_user" unique="0" required="0">
 <longdesc lang="en">
 User that should own named process.
@@ -187,8 +203,8 @@ EOF
 # Validate most critical parameters
 named_validate_all() {
 check_binary $OCF_RESKEY_named
-check_binary $RNDC
-check_binary $HOST
+check_binary $OCF_RESKEY_rndc
+check_binary $OCF_RESKEY_host
 
 if [ -n "$OCF_RESKEY_named_config" -a \
 ! -r "${OCF_RESKEY_named_rootdir}/${OCF_RESKEY_named_config}" ]; then
@@ -256,7 +272,7 @@ named_monitor() {
 return $OCF_NOT_RUNNING
 fi

-output=`$HOST $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`
+output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`
 
 if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address '$OCF_RESKEY_monitor_response 
 then
@@ -274,7 +290,7 @@ named_monitor() {
 #
 
 named_reload() {
-$RNDC reload >/dev/null || return $OCF_ERR_GENERIC
+$OCF_RESKEY_rndc reload >/dev/null || return $OCF_ERR_GENERIC
 
 return $OCF_SUCCESS
 }
@@ -338,33 +354,38 @@ named_start() {
 
 named_stop () {
 local timeout
+local timewait
 
 named_status || return $OCF_SUCCESS
 
-if ! $RNDC stop >/dev/null; then
+$OCF_RESKEY_rndc stop >/dev/null
+if [ $? -ne 0 ]; then
+	ocf_log info "rndc stop failed. Killing named."
 kill `cat ${OCF_RESKEY_named_pidfile}`
 fi
  
 if [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
   # Allow 2/3 of the action timeout for the orderly shutdown
   # (The origin unit is ms, hence the conversion)
-  timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
+  timewait=$((OCF_RESKEY_CRM_meta_timeout/1500))
 else
-  timeout=20
+  timewait=20
 fi
- 
+
+sleep 1; timeout=0 # Sleep here for 1 sec to let rndc finish.
 while named_status ; do
-if [ $timeout -ge ${OCF_RESKEY_named_stop_timeout} ]; then
+if [ $timeout -ge $timewait ]; then
 break
 else
 sleep 1
-timeout=$((timeout++))
+timeout=`expr $timeout + 1`
+ocf_log debug "named appears to have hung, waiting ..."
 fi
 done
 
 #If still up
 if named_status 2>&1; then
-ocf_log err "named is still up! Killing";
+ocf_log err "named is still up! Killing"
 kill -9 `cat ${OCF_RESKEY_named_pidfile}`
 fi
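For reference, the repaired wait logic can be exercised outside the RA. The sketch below is illustrative only (named_is_up is a stub for named_status, the values are made up, and the sleep is omitted); it also shows the ms-to-seconds conversion, where dividing OCF_RESKEY_CRM_meta_timeout by 1500 yields two thirds of the action timeout in seconds.

```shell
#!/bin/sh
# Standalone sketch of the patched stop-wait loop; named_is_up is a
# stub standing in for named_status, and the sleep is left out so the
# trace is instant. Illustrative only, not part of the patch itself.
OCF_RESKEY_CRM_meta_timeout=30000                # 30 s action timeout, in ms
timewait=$((OCF_RESKEY_CRM_meta_timeout/1500))   # 2/3 of it, in seconds -> 20

checks=0
named_is_up() { [ "$checks" -lt 2 ]; }           # pretend named exits after 2 checks

timeout=0
while named_is_up; do
    if [ "$timeout" -ge "$timewait" ]; then
        break
    fi
    checks=`expr $checks + 1`
    timeout=`expr $timeout + 1`   # portable; timeout=$((timeout++)) never advanced
done
echo "timewait=$timewait waited=$timeout"
```

With the broken increment, the counter never advanced and the loop only ended when named actually exited, ignoring timewait.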
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] OCF RA for named

2011-09-19 Thread Serge Dubrouski
On Mon, Sep 19, 2011 at 9:33 AM, Dejan Muhamedagic de...@suse.de wrote:

 On Mon, Sep 19, 2011 at 04:10:50PM +0200, Dejan Muhamedagic wrote:
  On Tue, Sep 06, 2011 at 02:52:47PM -0600, Serge Dubrouski wrote:
   On Thu, Sep 1, 2011 at 6:28 AM, Dejan Muhamedagic de...@suse.de
 wrote:
  
  [...]
 <parameter name="monitor_ip" unique="0" required="0">
 <longdesc lang="en">
 IP Address where named listens.
 </longdesc>
 <shortdesc lang="en">monitor_ip</shortdesc>
 <content type="string" default="${OCF_RESKEY_monitor_ip_default}" />
 </parameter>
 </parameters>
   
Why not just use localhost? Could there be an instance which
doesn't listen on the lo interface?
   
  
   Disagree. I usually prefer to monitor clustered resources through the VIPs
   they are assigned to. Also, localhost wouldn't work in the case of several
   instances listening on different interfaces.
 
  OK.
 
  [...]
 if [ -z $OCF_RESKEY_monitor_request -o \
  -z $OCF_RESKEY_monitor_response -o \
  -z $OCF_RESKEY_monitor_ip ]; then
 ocf_log err Neither monitor_request, monitor_response or
monitor_ip can be empty
   
ocf_log err Neither monitor_request, monitor_response, nor
 monitor_ip can
be empty
   
(I guess, not a native speaker.)
   
  
   Even after 10 years of living in the US. Next time will check with my
   daughter ;-)
 
  Actually, I was wrong too. Neither can be used just for two
  things, not three or more. In this case, it should be:
 
  ocf_log err None of monitor_request, monitor_response, and monitor_ip
 can
  be empty
 
  I'll fix that before check-in.

 OT:

 Well, it turns out that I wasn't so wrong after all, just need
 to remove the comma:

 ocf_log err Neither monitor_request, monitor_response nor monitor_ip can
 be empty

 But the None variety still sounds better to me.

 Cheers,


Thanks.
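The VIP-based monitoring Serge defends above boils down to resolving a known name against the clustered address and checking the answer. Here is a minimal sketch of that control flow; host() is a stub standing in for /usr/bin/host so the logic can be traced without a live DNS server, and all values are illustrative, not the RA's actual defaults in any particular deployment.

```shell
#!/bin/sh
# host() stubs /usr/bin/host for illustration; the variable names
# mirror the RA's monitor_* parameters but the values are made up.
host() { echo "$1 has address 127.0.0.1"; }

monitor_ip="127.0.0.1"         # VIP (or instance address) to query
monitor_request="localhost"    # name the monitor resolves
monitor_response="127.0.0.1"   # address expected in the answer

output=`host "$monitor_request" "$monitor_ip"`
if [ $? -ne 0 ] || ! echo "$output" | grep -q "has address $monitor_response"; then
    status=NOT_RUNNING
else
    status=SUCCESS
fi
echo "monitor: $status"
```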



 Dejan

  Many thanks for the contribution!
 
  Cheers,
 
  Dejan
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] OCF RA for named

2011-09-06 Thread Serge Dubrouski
On Thu, Sep 1, 2011 at 6:28 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Tue, Jul 12, 2011 at 03:50:36PM -0600, Serge Dubrouski wrote:
  Hello -
 
  I've created an OCF RA for named (BIND) server. There is an existing one
 in
  redhat directory but I don't like how it does monitoring and I doubt that
 it
  can work with pacemaker. So please review the attached RA and see if it
 can
  be included into the project.

 Sorry for the delay. The RA looks quite good, some comments
 below.

 Cheers,

 Dejan

  #!/bin/sh
  #
  # Description:  Manages a named (Bind) server as an OCF High-Availability
  #   resource
  #
  # Authors:  Serge Dubrouski (serge...@gmail.com)
  #
  # Copyright:2011 Serge Dubrouski serge...@gmail.com
  #
  # License:  GNU General Public License (GPL)
  #
 
 ###
  # Initialization:
 
  : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
  . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
 
  # Used binaries
  RNDC=/usr/sbin/rndc
  HOST=/usr/bin/host
  PIDOF=/sbin/pidof

 How about relying on PATH? The RA should have a sane environment
 and these are all standard locations. Packagers may have
 different ideas which may lead to problems.


Done.



  #Defaults
  OCF_RESKEY_named_default=/usr/sbin/named
  OCF_RESKEY_named_user_default=named
  OCF_RESKEY_named_config_default=/etc/named.conf
  OCF_RESKEY_named_pidfile_default=/var/run/named/named.pid
  OCF_RESKEY_named_rootdir_default=
  OCF_RESKEY_named_options_default=
  OCF_RESKEY_named_keytab_file_default=
  OCF_RESKEY_named_stop_timeout_default=25
  OCF_RESKEY_monitor_request_default=localhost
  OCF_RESKEY_monitor_response_default=127.0.0.1
  OCF_RESKEY_monitor_ip_default=127.0.0.1
 
  : ${OCF_RESKEY_named=${OCF_RESKEY_named_default}}
  : ${OCF_RESKEY_named_user=${OCF_RESKEY_named_user_default}}
  : ${OCF_RESKEY_named_config=${OCF_RESKEY_named_config_default}}
  : ${OCF_RESKEY_named_pidfile=${OCF_RESKEY_named_pidfile_default}}
  : ${OCF_RESKEY_named_rootdir=${OCF_RESKEY_named_rootdir_default}}
  : ${OCF_RESKEY_named_options=${OCF_RESKEY_named_options_default}}
  : ${OCF_RESKEY_named_keytab_file=${OCF_RESKEY_named_keytab_file_default}}
  :
 ${OCF_RESKEY_named_stop_timeout=${OCF_RESKEY_named_stop_timeout_default}}
  : ${OCF_RESKEY_monitor_request=${OCF_RESKEY_monitor_request_default}}
  : ${OCF_RESKEY_monitor_response=${OCF_RESKEY_monitor_response_default}}
  : ${OCF_RESKEY_monitor_ip=${OCF_RESKEY_monitor_ip_default}}
 
  usage() {
  cat <<EOF
  usage: $0 start|stop|reload|status|monitor|meta-data|validate-all|methods
 
  $0 manages named (Bind) server as an HA resource.
 
  The 'start' operation starts named server.
  The 'stop' operation stops named server.
  The 'reload' operation reloads named configuration.
  The 'status' operation reports whether named is up.
  The 'monitor' operation reports whether named is running.
  The 'validate-all' operation reports whether parameters are valid.
  The 'methods' operation reports on the methods $0 supports.
  EOF
      return $OCF_ERR_ARGS
  }
 
  named_meta_data() {
  cat <<EOF
  <?xml version="1.0"?>
  <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
  <resource-agent name="named">
  <version>1.0</version>
 
  <longdesc lang="en">
  Resource script for named (Bind) server. It manages named as an HA resource.
  </longdesc>
  <shortdesc lang="en">Manages a named server</shortdesc>
 
  <parameters>
  <parameter name="named" unique="0" required="0">
  <longdesc lang="en">
  Path to the named command.
  </longdesc>
  <shortdesc lang="en">named</shortdesc>
  <content type="string" default="${OCF_RESKEY_named_default}" />
  </parameter>
 
  <parameter name="named_user" unique="0" required="0">
  <longdesc lang="en">
  User that should own named process.
  </longdesc>
  <shortdesc lang="en">named_user</shortdesc>
  <content type="string" default="${OCF_RESKEY_named_user_default}" />
  </parameter>
 
  <parameter name="named_config" unique="0" required="0">
  <longdesc lang="en">
  Configuration file for named.
  </longdesc>
  <shortdesc lang="en">named_config</shortdesc>
  <content type="string" default="${OCF_RESKEY_named_config_default}" />
  </parameter>

 This one should be unique.


Done.



  <parameter name="named_pidfile" unique="0" required="0">
  <longdesc lang="en">
  PID file for named.
  </longdesc>
  <shortdesc lang="en">named_pidfile</shortdesc>
  <content type="string" default="${OCF_RESKEY_named_pidfile_default}" />
  </parameter>

 This one too.


Done.



  <parameter name="named_rootdir" unique="0" required="0">
  <longdesc lang="en">
  Directory that named should use for chroot if any.
  </longdesc>
  <shortdesc lang="en">named_rootdir</shortdesc>
  <content type="string" default="${OCF_RESKEY_named_rootdir_default}" />
  </parameter>

 This one also? Or do different instances share chroot?


Made it unique too.


  <parameter name="named_options" unique="0" required="0">
  <longdesc lang="en">
  Options for named process if any.
  </longdesc>
  <shortdesc lang="en">named_options

Re: [Linux-ha-dev] OCF RA for named

2011-08-18 Thread Serge Dubrouski
On Wed, Aug 17, 2011 at 12:39 PM, Lars Ellenberg
lars.ellenb...@linbit.comwrote:

 On Tue, Aug 16, 2011 at 08:51:04AM -0600, Serge Dubrouski wrote:
  On Tue, Aug 16, 2011 at 8:44 AM, Dejan Muhamedagic de...@suse.de
 wrote:
 
   Hi Serge,
  
   On Fri, Aug 05, 2011 at 08:19:52AM -0600, Serge Dubrouski wrote:
No interest?
  
   Probably not true :) It's just that recently I've been away for
   a while and in between really swamped with my daily work. I'm
   trying to catch up now, but it may take a while.
  
   In the meantime, I'd like to ask you about the motivation. DNS
   already has a sort of redundancy built in through its
   primary/secondary servers.
  
 
  That redundancy doesn't work quite well. Yes, you can have primary and
  secondary servers configured in resolv.conf, but if the primary is down the
  resolver waits until the request to the primary server times out before it
  sends a request to the secondary one. The delay can be up to 30 seconds and
  impacts some applications pretty badly. This is standard behaviour for
  Linux; Solaris, for example, works differently and isn't impacted by this
  issue. Workarounds are having a caching DNS server running locally or
  making the primary DNS server highly available using Pacemaker :-)
 
  Here is what man page for resolv.conf says:
 
   nameserver Name server IP address
  Internet  address  (in  dot  notation) of a name server that the
  resolver  should  query.   Up  to  MAXNS   (currently   3, see
  resolv.h)  name  servers  may  be listed, one per keyword.  If
  there are multiple servers, the resolver library queries them in
  the  order  listed.   If  no nameserver entries are present, the
  default is to use the name server on the  local  machine.  *(The
  algorithm  used  is to try a name server, and if the query times
  out, try the next, until out of name servers, then repeat trying
  all  the  name  servers  until  a  maximum number of retries are
  made.)*

 options timeout:2 attempts:5 rotate


Right, one can do this. But even with this it would take an additional 10
seconds for requests sent to the server that's down before they time out. In
a production environment that's absolutely unacceptable.
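Spelled out as a full /etc/resolv.conf, the resolver tuning quoted above would look like the following sketch; the nameserver addresses are placeholders from the documentation range, not real servers.

```
# /etc/resolv.conf sketch: shorten the failover delay without a cluster
options timeout:2 attempts:5 rotate
nameserver 192.0.2.10
nameserver 192.0.2.11
```

Even so, each query that lands on a dead primary still burns the per-attempt timeout, which is why a clustered primary remains attractive.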


 but yes, it is still a valid use case to have a clustered primary name
 server,
 and possibly multiple backups.


And that's why I created this RA :-)



 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] OCF RA for named

2011-08-16 Thread Serge Dubrouski
On Tue, Aug 16, 2011 at 8:44 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Fri, Aug 05, 2011 at 08:19:52AM -0600, Serge Dubrouski wrote:
  No interest?

 Probably not true :) It's just that recently I've been away for
 a while and in between really swamped with my daily work. I'm
 trying to catch up now, but it may take a while.

 In the meantime, I'd like to ask you about the motivation. DNS
 already has a sort of redundancy built in through its
 primary/secondary servers.


That redundancy doesn't work quite well. Yes, you can have primary and
secondary servers configured in resolv.conf, but if the primary is down the
resolver waits until the request to the primary server times out before it
sends a request to the secondary one. The delay can be up to 30 seconds and
impacts some applications pretty badly. This is standard behaviour for Linux;
Solaris, for example, works differently and isn't impacted by this issue.
Workarounds are having a caching DNS server running locally or making the
primary DNS server highly available using Pacemaker :-)

Here is what the man page for resolv.conf says:

   nameserver Name server IP address
          Internet address (in dot notation) of a name server that the
          resolver should query. Up to MAXNS (currently 3, see resolv.h)
          name servers may be listed, one per keyword. If there are
          multiple servers, the resolver library queries them in the
          order listed. If no nameserver entries are present, the
          default is to use the name server on the local machine. *(The
          algorithm used is to try a name server, and if the query times
          out, try the next, until out of name servers, then repeat
          trying all the name servers until a maximum number of retries
          are made.)*


 Cheers,

 Dejan

  On Tue, Jul 12, 2011 at 3:50 PM, Serge Dubrouski serge...@gmail.com
 wrote:
 
   Hello -
  
   I've created an OCF RA for named (BIND) server. There is an existing
 one in
   redhat directory but I don't like how it does monitoring and I doubt
 that it
   can work with pacemaker. So please review the attached RA and see if it
 can
   be included into the project.
  
  
   --
   Serge Dubrouski.
  
 
 
 
  --
  Serge Dubrouski.

  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] OCF RA for named

2011-08-05 Thread Serge Dubrouski
No interest?

On Tue, Jul 12, 2011 at 3:50 PM, Serge Dubrouski serge...@gmail.com wrote:

 Hello -

 I've created an OCF RA for named (BIND) server. There is an existing one in
 redhat directory but I don't like how it does monitoring and I doubt that it
 can work with pacemaker. So please review the attached RA and see if it can
 be included into the project.


 --
 Serge Dubrouski.




-- 
Serge Dubrouski.


Re: [Linux-HA] OCF RA for named

2011-08-05 Thread Serge Dubrouski
No interest?

On Tue, Jul 12, 2011 at 3:50 PM, Serge Dubrouski serge...@gmail.com wrote:

 Hello -

 I've created an OCF RA for named (BIND) server. There is an existing one in
 redhat directory but I don't like how it does monitoring and I doubt that it
 can work with pacemaker. So please review the attached RA and see if it can
 be included into the project.


 --
 Serge Dubrouski.




-- 
Serge Dubrouski.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-ha-dev] [PATCH] fix config parameter type for pgsql

2011-07-22 Thread Serge Dubrouski
Thanks for catching this! Could somebody apply it?

On Thu, Jul 21, 2011 at 7:43 PM, Takatoshi MATSUO matsuo@gmail.com wrote:

 Hi

 I found a slight bug for pgsql.
 You know,  type of config parameter is not integer but string.


 Best Regard,
 Takatoshi MATSUO

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?

2011-07-14 Thread Serge Dubrouski
On Thu, Jul 14, 2011 at 5:28 AM, Florian Haas florian.h...@linbit.com wrote:

 On 2011-07-14 12:55, RNZ wrote:
 
 
  On Thu, Jul 14, 2011 at 2:02 PM, Florian Haas florian.h...@linbit.com wrote:
 
  On 2011-07-14 08:46, RNZ wrote:
    No, I want and I need a multi-master scheme (more than two nodes)...
 
  There is nothing in Pacemaker's master/slave scheme that restricts
 you
  to a single master. The ocf:linbit:drbd resource agent, for example,
 is
  configurable in dual-Master mode.
 
  Once the resource agent properly implements the functionality (the
 hard
  part), configuring a multi-master master/slave set is simply a
 question
  of setting the master-max meta parameter to a value greater than 1
 (the
  easy part).
 
   I don't think so... The CouchDB RESTful API very easily allows running
   replication with the following scheme:

 It's entirely possible that the couchdb native API may be more powerful
 in specific regards, but if you want to put it into a Pacemaker cluster
 you may have to occasionally accept some minor limitations. That's a
 tradeoff which is present for all Pacemaker managed applications.

  primitive cdb0
  hostA: hostB:dbB  localhost:dbB
  hostA: hostC:dbC  localhost:dbC
  hostA: hostD:dbD  localhost:dbD
  primitive cdb1
  hostB: hostA:dbB  localhost:dbB
  primitive cdb2
  hostC: hostA:dbC  localhost:dbC
 
  In this scheme hostA used as master for hostB and hostC (master-master)
  and as slave for hostD (slave-master). Both (master-master and
  slave-master for different servers/databases) scheme per one instance.

 So you mean there would be a cascading replication, like so:

 hostD
   |
 hostA
 /   \
 hostB   hostC

 Such a thing is not something Pacemaker caters for specifically, but I
 dare say it doesn't need to, either. You would simply create one
 master/slave set where D is master and A is slave, and another where A
 is master and B and C are slaves.


Wouldn't such a configuration mean running 2 instances of a resource on
nodeA? I doubt that would be the right solution.


  By the way, is there any specific reason you are contributing under a
  pseudonym? It's highly unusual in this community to do so.
 
 
   Sorry, habit... My real name is Alibek Amaev, alibek.am...@gmail.com
   or alibe...@gmail.com

 Pleased to meet you Alibek, welcome to the tribe. :)

 Cheers,
 Florian


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?

2011-07-14 Thread Serge Dubrouski
On Thu, Jul 14, 2011 at 7:11 AM, RNZ renoi...@gmail.com wrote:



 On Thu, Jul 14, 2011 at 4:46 PM, Serge Dubrouski serge...@gmail.com wrote:



 On Thu, Jul 14, 2011 at 5:28 AM, Florian Haas florian.h...@linbit.com wrote:

 On 2011-07-14 12:55, RNZ wrote:
 
 
   On Thu, Jul 14, 2011 at 2:02 PM, Florian Haas florian.h...@linbit.com wrote:
 
  On 2011-07-14 08:46, RNZ wrote:
    No, I want and I need a multi-master scheme (more than two nodes)...
 
  There is nothing in Pacemaker's master/slave scheme that restricts
 you
  to a single master. The ocf:linbit:drbd resource agent, for
 example, is
  configurable in dual-Master mode.
 
  Once the resource agent properly implements the functionality (the
 hard
  part), configuring a multi-master master/slave set is simply a
 question
  of setting the master-max meta parameter to a value greater than 1
 (the
  easy part).
 
   I don't think so... The CouchDB RESTful API very easily allows running
   replication with the following scheme:

 It's entirely possible that the couchdb native API may be more powerful
 in specific regards, but if you want to put it into a Pacemaker cluster
 you may have to occasionally accept some minor limitations. That's a
 tradeoff which is present for all Pacemaker managed applications.

  primitive cdb0
  hostA: hostB:dbB  localhost:dbB
  hostA: hostC:dbC  localhost:dbC
  hostA: hostD:dbD  localhost:dbD
  primitive cdb1
  hostB: hostA:dbB  localhost:dbB
  primitive cdb2
  hostC: hostA:dbC  localhost:dbC
 
  In this scheme hostA used as master for hostB and hostC (master-master)
  and as slave for hostD (slave-master). Both (master-master and
  slave-master for different servers/databases) scheme per one instance.

 So you mean there would be a cascading replication, like so:

 hostD
   |
 hostA
 /   \
 hostB   hostC

 Such a thing is not something Pacemaker caters for specifically, but I
 dare say it doesn't need to, either. You would simply create one
 master/slave set where D is master and A is slave, and another where A
 is master and B and C are slaves.


  Wouldn't such a configuration mean running 2 instances of a resource on
  nodeA? I doubt that would be the right solution.


  No. An example is present at the end of the RA file:
  https://github.com/rnz/resource-agents/blob/master/heartbeat/couchdb


Sorry, it was a question to Florian about his vision of how to implement
cascading replication in the Pacemaker master/slave model. Cascading
replication can exist in OpenLDAP, for example, where one node can be a slave
to another and yet be a master for a third one. I currently do not see how
that can be described in a Pacemaker configuration even if an OpenLDAP
master/slave RA existed.


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?

2011-07-14 Thread Serge Dubrouski
On Thu, Jul 14, 2011 at 5:50 AM, RNZ renoi...@gmail.com wrote:



 On Thu, Jul 14, 2011 at 3:28 PM, Florian Haas florian.h...@linbit.com wrote:

 On 2011-07-14 12:55, RNZ wrote:
 
 
   On Thu, Jul 14, 2011 at 2:02 PM, Florian Haas florian.h...@linbit.com wrote:
 
  On 2011-07-14 08:46, RNZ wrote:
    No, I want and I need a multi-master scheme (more than two nodes)...
 
  There is nothing in Pacemaker's master/slave scheme that restricts
 you
  to a single master. The ocf:linbit:drbd resource agent, for example,
 is
  configurable in dual-Master mode.
 
  Once the resource agent properly implements the functionality (the
 hard
  part), configuring a multi-master master/slave set is simply a
 question
  of setting the master-max meta parameter to a value greater than 1
 (the
  easy part).
 
   I don't think so... The CouchDB RESTful API very easily allows running
   replication with the following scheme:

 It's entirely possible that the couchdb native API may be more powerful
 in specific regards, but if you want to put it into a Pacemaker cluster
 you may have to occasionally accept some minor limitations. That's a
 tradeoff which is present for all Pacemaker managed applications.

  I understand that. But I don't understand why Pacemaker doesn't allow
  changing a resource's location based on another resource's state. It seems
  like self-evident functionality... Doesn't it?


It does: you can use colocation or groups. If you can somehow distinguish one
instance of your CouchDB from the others (Andrew suggested a master role, for
example), you can tie the vIP to that instance with a colocation constraint.
Alternatively, if you uniquely identify all of your instances, you can again
colocate your vIP with one of them.
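In crm shell syntax, the colocation described above might look like the following sketch. The resource names, the ms-couchdb master/slave set, and the IP address are all hypothetical, chosen only for illustration:

```
# Tie a VIP to wherever the promoted (Master) CouchDB instance runs
primitive vIP ocf:heartbeat:IPaddr2 \
        params ip="192.0.2.100"
ms ms-couchdb couchdb-1 \
        meta master-max="1" clone-max="3"
colocation vip-with-master inf: vIP ms-couchdb:Master
```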




  primitive cdb0
  hostA: hostB:dbB  localhost:dbB
  hostA: hostC:dbC  localhost:dbC
  hostA: hostD:dbD  localhost:dbD
  primitive cdb1
  hostB: hostA:dbB  localhost:dbB
  primitive cdb2
  hostC: hostA:dbC  localhost:dbC
 
  In this scheme hostA used as master for hostB and hostC (master-master)
  and as slave for hostD (slave-master). Both (master-master and
  slave-master for different servers/databases) scheme per one instance.

 So you mean there would be a cascading replication, like so:

 hostD
   |
 hostA
 /   \
 hostB   hostC

 Such a thing is not something Pacemaker caters for specifically, but I
 dare say it doesn't need to, either. You would simply create one
 master/slave set where D is master and A is slave, and another where A
 is master and B and C are slaves.


  Florian, no, this example scheme is not a cascade: hostA is used as a slave
  for hostD and as a master/slave for hostB and hostC
 hostB --- hostA --- hostC
 ^
 |
 hostC---

  By the way, is there any specific reason you are contributing under a
  pseudonym? It's highly unusual in this community to do so.
 
 
 Pleased to meet you Alibek, welcome to the tribe. :)


 Thank you Florian, I'm pleased to meet you too! 8)
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] pacemaker - migrate RA, based on the state of other RA, w/o clone?

2011-07-14 Thread Serge Dubrouski
On Thu, Jul 14, 2011 at 9:02 AM, RNZ renoi...@gmail.com wrote:



 On Thu, Jul 14, 2011 at 6:42 PM, Serge Dubrouski serge...@gmail.com wrote:



 On Thu, Jul 14, 2011 at 5:50 AM, RNZ renoi...@gmail.com wrote:



  On Thu, Jul 14, 2011 at 3:28 PM, Florian Haas florian.h...@linbit.com wrote:

 On 2011-07-14 12:55, RNZ wrote:
 
 
  On Thu, Jul 14, 2011 at 2:02 PM, Florian Haas 
 florian.h...@linbit.com
  mailto:florian.h...@linbit.com wrote:
 
  On 2011-07-14 08:46, RNZ wrote:
    No, I want and I need a multi-master scheme (more than two nodes)...
 
  There is nothing in Pacemaker's master/slave scheme that restricts
 you
  to a single master. The ocf:linbit:drbd resource agent, for
 example, is
  configurable in dual-Master mode.
 
  Once the resource agent properly implements the functionality (the
 hard
  part), configuring a multi-master master/slave set is simply a
 question
  of setting the master-max meta parameter to a value greater than 1
 (the
  easy part).
 
   I don't think so... The CouchDB RESTful API very easily allows running
   replication with the following scheme:

 It's entirely possible that the couchdb native API may be more powerful
 in specific regards, but if you want to put it into a Pacemaker cluster
 you may have to occasionally accept some minor limitations. That's a
 tradeoff which is present for all Pacemaker managed applications.

  I understand that. But I don't understand why Pacemaker doesn't allow
  changing a resource's location based on another resource's state. It seems
  like self-evident functionality... Doesn't it?


 It does: you can use colocation or groups. If you can somehow distinguish
 one instance of your CouchDB from the others (Andrew suggested a master role,
 for example), you can tie the vIP to that instance with a colocation
 constraint. Alternatively, if you uniquely identify all of your instances,
 you can again colocate your vIP with one of them.

 Colocation needs a clone, and there is potential for split-brain (couchdb-1
 on more than one node)...


Ahh! I didn't notice that you never want your CouchDB resources to fail over
to the other nodes. Quite a non-standard configuration. Your approach should
probably work though.

Group - now it use in situation if couchdb-1 - fail/stop?

 I think it would be more natural to use location rules like the following:

 node vub001
 node vub002
 primitive couchdb-1 ocf:heartbeat:couchdb \
 ...
 location vIP-L1 \
  rule 100: #uname vub001 \
  rule inf: #ra-state couchdb-1 eq 0 \
 location vIP-L2 \
  rule 10: #uname vub002 \
  rule inf: #ra-state couchdb-2 eq 0 \



 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-HA] HA for postgresql

2011-07-13 Thread Serge Dubrouski
On Wed, Jul 13, 2011 at 5:08 AM, Sanjay Rao s...@noida.interrasystems.com wrote:

 Hi All,

 I am using postgres 9.0 with streaming replication(master/standby). I
 want to use pacemaker/corosync to create a trigger file(dummy file) to
 indicate the standby to take over.

 how can I execute a script on standby server whenever a failure happens ?


Currently Pacemaker (or at least the pgsql RA) doesn't support PostgreSQL
replication. So basically you need to write your own script that monitors the
primary instance and creates a trigger file on the standby node when the
primary fails.
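Such an external script could be as small as the following sketch. The check, file names and paths here are illustrative assumptions only, not part of any shipped RA; a real probe would use something like `pg_isready` or a `psql -c 'SELECT 1'` against the primary, and the trigger path must match the trigger_file setting in the standby's recovery.conf.

```shell
#!/bin/sh
# Illustrative failover watchdog: if the primary looks dead, create the
# trigger file that tells the 9.0 streaming-replication standby to promote.
TRIGGER_FILE=./pgsql.trigger      # must match trigger_file in recovery.conf

primary_alive() {
    # Stub check for illustration; substitute a real probe such as
    # pg_isready -h primary-host (the marker file is made up)
    [ -e ./primary_up ]
}

if primary_alive; then
    echo "primary alive"
else
    touch "$TRIGGER_FILE"
    echo "trigger created"
fi
```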


 Regards,
 Sanjay Rao


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] HA for postgresql

2011-06-27 Thread Serge Dubrouski
http://www.clusterlabs.org/wiki/Documentation#Howtos

There are 2 How-To guides.

On Sun, Jun 26, 2011 at 11:55 PM, Sanjay Rao s...@noida.interrasystems.com wrote:

 Hi,

 I am new to linux and Linux-HA. Here is my problem :

 I want to implement HA between my two postgresql servers working as
  master and standby servers. My main motivation behind this implementation is
  failover, not load balancing. Does anybody have any idea how to
 configure HA in this case. Any hint/web link would be good for start.
 Thanks in advance.
 Regards,
 Sanjay Rao



 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-21 Thread Serge Dubrouski
Hello Dejan -

Yes, and here is one more patch to replace that exit with return and fix a
typo.

On Tue, Jun 21, 2011 at 6:29 AM, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Hi Serge,

 On Mon, Jun 20, 2011 at 02:37:24PM -0600, Serge Dubrouski wrote:
  Sorry.

 Is this the proper fix now?

 Cheers,

 Dejan

  On Mon, Jun 20, 2011 at 2:22 PM, Vadym Chepkov vchep...@gmail.com
 wrote:
 
  
   On Jun 20, 2011, at 3:55 PM, Serge Dubrouski wrote:
  
Patch is attached.
  
  
   Your patch is damaged, it has lines cut short.
  
   Vadym
   ___
   Linux-HA mailing list
   Linux-HA@lists.linux-ha.org
   http://lists.linux-ha.org/mailman/listinfo/linux-ha
   See also: http://linux-ha.org/ReportingProblems
  
 
 
 
  --
  Serge Dubrouski.

  --- a/heartbeat/pgsql
  +++ b/heartbeat/pgsql
  @@ -540,9 +540,13 @@ pgsql_validate_all() {
   return $OCF_ERR_INSTALLED;
   fi
 
  -if ! runasowner test -w $OCF_RESKEY_pgdata; then
  -ocf_log err Directory $OCF_RESKEY_pgdata is not writable by
 $OCF_RESKEY_pgdba
  -exit $OCF_ERR_PERM;
  +if ocf_is_probe; then
  +ocf_log info Don't check $OCF_RESKEY_pgdata during probe
  +else
  +if ! runasowner test -w $OCF_RESKEY_pgdata; then
  +ocf_log err Directory $OCF_RESKEY_pgdata is not writable by
 $OCF_RESKEY_pgdba
  +exit $OCF_ERR_PERM;
  +fi
   fi
 
   if [ -n $OCF_RESKEY_monitor_user -a ! -n
 $OCF_RESKEY_monitor_password ]

  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.
diff --git a/heartbeat/pgsql b/heartbeat/pgsql
index f9a882b..eb51c61 100755
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -536,7 +536,7 @@ pgsql_validate_all() {
 
 getent passwd $OCF_RESKEY_pgdba >/dev/null 2>&1
 if [ ! $? -eq 0 ]; then
-ocf_log err User $OCF_RESKEY_pgdba doesn't exit;
+ocf_log err User $OCF_RESKEY_pgdba doesn't exist;
 return $OCF_ERR_INSTALLED;
 fi
 
@@ -545,7 +545,7 @@ pgsql_validate_all() {
 else
 if ! runasowner test -w $OCF_RESKEY_pgdata; then
 ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKEY_pgdba
-exit $OCF_ERR_PERM;
+return $OCF_ERR_PERM;
 fi
 fi
 

Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-20 Thread Serge Dubrouski
On Mon, Jun 20, 2011 at 10:09 AM, Florian Haas florian.h...@linbit.com wrote:

 Ouch, that hurts. Vadym, thanks for reporting this. Serge, can you fix
 pgsql_monitor so it behaves correctly during probes on nodes where the
 resource is expected to be inactive? I'd be happy to merge.

 Cheers,
 Florian

 On 2011-06-20 18:01, Vadym Chepkov wrote:
  Hi,
 
  new psql RA in resources-agents-3.9.1 doesn't handle shared storage
 configuration properly:
 
  Jun 20 15:37:00 m52 pgsql[20851]: INFO: Configuration file
 /master/sql/data/postgresql.conf not readable during probe.
  Jun 20 15:37:00 m52 pgsql[20851]: ERROR: bash: line 0: cd:
 /master/sql/data: No such file or directory
  Jun 20 15:37:00 m52 pgsql[20851]: ERROR: Directory /master/sql/data is
 not writable by postgres


You don't have the mount point available on the slave? You don't need real
data, but the mount point should exist.


  Jun 20 15:37:00 m52 lrmd: [2407]: WARN: Managed sql_master:monitor
 process 20851 exited with return code 4.
  Jun 20 15:37:00 m52 lrmd: [2407]: info: operation monitor[2688] on
 sql_master for client 2410: pid 20851 exited with return code 4
 
  Since this is on a slave/inactive node, config and pgdata are missing and
 it is normal situation, so monitor operation should not fail.

 
  # crm_mon -1f|tail -2
  Failed actions:
  sql_master_monitor_0 (node=m52, call=2688, rc=4, status=complete):
 insufficient privileges
 
  Cheers,
  Vadym




-- 
Serge Dubrouski.


Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-20 Thread Serge Dubrouski
On Mon, Jun 20, 2011 at 10:14 AM, Florian Haas florian.h...@linbit.comwrote:

 On 2011-06-20 18:11, Serge Dubrouski wrote:
 
 
  On Mon, Jun 20, 2011 at 10:09 AM, Florian Haas florian.h...@linbit.com
  mailto:florian.h...@linbit.com wrote:
 
  Ouch, that hurts. Vadym, thanks for reporting this. Serge, can you
 fix
  pgsql_monitor so it behaves correctly during probes nodes where the
  resource is expected to be inactive? I'd be happy to merge.
 
  Cheers,
  Florian
 
  On 2011-06-20 18:01, Vadym Chepkov wrote:
   Hi,
  
   new psql RA in resources-agents-3.9.1 doesn't handle shared
  storage configuration properly:
  
   Jun 20 15:37:00 m52 pgsql[20851]: INFO: Configuration file
  /master/sql/data/postgresql.conf not readable during probe.
   Jun 20 15:37:00 m52 pgsql[20851]: ERROR: bash: line 0: cd:
  /master/sql/data: No such file or directory
   Jun 20 15:37:00 m52 pgsql[20851]: ERROR: Directory
  /master/sql/data is not writable by postgres
 
 
  You don't have the mount point available on the slave? You don't need real
  data, but the mount point should exist.

 Wow, that was quick. Nice. :)

 Well suppose all of /master is on DRBD (or shared storage), and it's a
 subdirectory of that mount point that PostgreSQL uses. So on the
 inactive node /master would be there, but /master/sql/data would not.
 That's a legitimate use case, in my opinion, and the RA should behave
 nicely in that case. Should it not?


Hmm, didn't think about this case. Ok, I'll fix it.

BTW, this behaviour wasn't introduced with the latest changes; it was always
there.



 Cheers,
 Florian






-- 
Serge Dubrouski.


Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-20 Thread Serge Dubrouski
On Mon, Jun 20, 2011 at 11:06 AM, Vadym Chepkov vchep...@gmail.com wrote:


 On Jun 20, 2011, at 12:33 PM, Florian Haas wrote:

  On 2011-06-20 18:17, Serge Dubrouski wrote:
  Well suppose all of /master is on DRBD (or shared storage), and it's a
  subdirectory of that mount point that PostgreSQL uses. So on the
  inactive node /master would be there, but /master/sql/data would not.
  That's a legitimate use case, in my opinion, and the RA should behave
  nicely in that case. Should it not?
 
 
  Hmm, didn't think about this case. Ok, I'll fix it.
 
  BTW, this behaviour wasn't introduced with the latest changes, it was
 always
  there,
 
  Then double kudos to Vadym for spotting it now. :)
 

 It definitely didn't happen with the previous version I had - 1.0.4 and I
 didn't change cluster configuration.

 I think it comes from this addition to pgsql_validate_all


 +if ! runasowner test -w $OCF_RESKEY_pgdata; then
 +ocf_log err Directory $OCF_RESKEY_pgdata is not writable by
 $OCF_RESKEY_pgdba
 +exit $OCF_ERR_PERM;
 fi


Previous RPM for RedHat could be very old. But anyway I'll fix this bug
later today.



 which is called for each single operation, not just for validate-all.


 Vadym








-- 
Serge Dubrouski.


Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-20 Thread Serge Dubrouski
Patch is attached.

I wasn't able to create a pull request since there is already one existing
and I don't know how to delete it.

On Mon, Jun 20, 2011 at 11:24 AM, Serge Dubrouski serge...@gmail.comwrote:



 On Mon, Jun 20, 2011 at 11:06 AM, Vadym Chepkov vchep...@gmail.comwrote:


 On Jun 20, 2011, at 12:33 PM, Florian Haas wrote:

  On 2011-06-20 18:17, Serge Dubrouski wrote:
  Well suppose all of /master is on DRBD (or shared storage), and it's a
  subdirectory of that mount point that PostgreSQL uses. So on the
  inactive node /master would be there, but /master/sql/data would not.
  That's a legitimate use case, in my opinion, and the RA should behave
  nicely in that case. Should it not?
 
 
  Hmm, didn't think about this case. Ok, I'll fix it.
 
  BTW, this behaviour wasn't introduced with the latest changes, it was
 always
  there,
 
  Then double kudos to Vadym for spotting it now. :)
 

 It definitely didn't happen with the previous version I had - 1.0.4 and I
 didn't change cluster configuration.

 I think it comes from this addition to pgsql_validate_all


 +if ! runasowner test -w $OCF_RESKEY_pgdata; then
 +ocf_log err Directory $OCF_RESKEY_pgdata is not writable by
 $OCF_RESKEY_pgdba
 +exit $OCF_ERR_PERM;
 fi


 Previous RPM for RedHat could be very old. But anyway I'll fix this bug
 later today.



 which is called for each single operation, not just for validate-all.


 Vadym








 --
 Serge Dubrouski.




-- 
Serge Dubrouski.
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -540,9 +540,13 @@ pgsql_validate_all() {
 return $OCF_ERR_INSTALLED;
 fi
 
-if ! runasowner test -w $OCF_RESKEY_pgdata; then
-ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKE
-exit $OCF_ERR_PERM;
+if ocf_is_probe; then
+ocf_log info Don't check $OCF_RESKEY_pgdata during probe
+else
+if ! runasowner test -w $OCF_RESKEY_pgdata; then
+ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_R
+exit $OCF_ERR_PERM;
+fi
 fi
 
 if [ -n $OCF_RESKEY_monitor_user -a ! -n $OCF_RESKEY_monitor_password ]

Re: [Linux-HA] pgsql RA error in 3.9.1

2011-06-20 Thread Serge Dubrouski
Sorry.

On Mon, Jun 20, 2011 at 2:22 PM, Vadym Chepkov vchep...@gmail.com wrote:


 On Jun 20, 2011, at 3:55 PM, Serge Dubrouski wrote:

  Patch is attached.


 Your patch is damaged, it has lines cut short.

 Vadym




-- 
Serge Dubrouski.
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -540,9 +540,13 @@ pgsql_validate_all() {
 return $OCF_ERR_INSTALLED;
 fi
 
-if ! runasowner test -w $OCF_RESKEY_pgdata; then
-ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKEY_pgdba
-exit $OCF_ERR_PERM;
+if ocf_is_probe; then
+ocf_log info Don't check $OCF_RESKEY_pgdata during probe
+else
+if ! runasowner test -w $OCF_RESKEY_pgdata; then
+ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKEY_pgdba
+exit $OCF_ERR_PERM;
+fi
 fi
 
 if [ -n $OCF_RESKEY_monitor_user -a ! -n $OCF_RESKEY_monitor_password ]

Re: [Linux-ha-dev] Patch for pgsql

2011-06-15 Thread Serge Dubrouski
I screwed up with git so here is the patch attached.

On Tue, Jun 14, 2011 at 12:44 PM, Serge Dubrouski serge...@gmail.comwrote:

 Hello -

 I've just sent a pull request from my fork of resource-agent git
 repository. Proposed patch adds better handling of probes, fixes one English
 misspelling and declares rc variable as local in some function where it's
 used.

 I hope I did everything right with git

 --
 Serge Dubrouski.




-- 
Serge Dubrouski.
diff --git a/heartbeat/pgsql b/heartbeat/pgsql
index ca08f9b..812aa34 100755
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -22,11 +22,6 @@
 get_pgsql_param() {
 local config_file
 local param_name
-local loglevel=err
-
-if ocf_is_probe; then
-loglevel=warn
-fi
 
 param_name=$1
  
@@ -37,10 +32,8 @@ get_pgsql_param() {
 config=$OCF_RESKEY_pgdata/postgresql.conf
 fi
 
-if [ ! -f $config ]; then
-ocf_log $loglevel Cannot find configuration file $config
-return
-fi
+check_config $config 
+[ $? -eq 0 ] || return
 
 perl_code=if (/^\s*$param_name[\s=]+\s*(.*)$/) {
\$dir=\$1;
@@ -310,6 +303,7 @@ EOF
 pgsql_start() {
 local pgctl_options
 local postgres_options
+local rc
 
 if pgsql_status; then
 ocf_log info PostgreSQL is already running. PID=`cat $PIDFILE`
@@ -386,6 +380,8 @@ pgsql_start() {
 
 #pgsql_stop: Stop PostgreSQL
 pgsql_stop() {
+local rc
+
 if ! pgsql_status
 then
 #Already stopped
@@ -456,6 +452,7 @@ pgsql_status() {
 pgsql_monitor() {
 local loglevel
 local psql_options
+local rc
 
 # Set the log level of the error message
 loglevel=${1:-err}
@@ -509,6 +506,22 @@ check_binary2() {
 return 0
 }
 
+check_config() {
+local rc=0
+
+if [ ! -f $1 ]; then
+if ocf_is_probe; then
+   ocf_log info Configuration file $1 not readable during probe.
+   rc=1
+else
+   ocf_log err Configuration file $1 doesn't exist
+   rc=2
+fi
+fi
+
+return $rc
+}
+
 # Validate most critical parameters
 pgsql_validate_all() {
 if ! check_binary2 $OCF_RESKEY_pgctl || 
@@ -517,8 +530,8 @@ pgsql_validate_all() {
 fi
 
 if [ -n $OCF_RESKEY_config -a ! -f $OCF_RESKEY_config ]; then
-ocf_log err the configuration file $OCF_RESKEY_config doesn't exist
-return $OCF_ERR_INSTALLED
+   check_config $OCF_RESKEY_config 
+   [ $? -eq 2 ]  return $OCF_ERR_INSTALLED
 fi
 
 getent passwd $OCF_RESKEY_pgdba /dev/null 21
@@ -528,7 +541,7 @@ pgsql_validate_all() {
 fi
 
 if ! runasowner test -w $OCF_RESKEY_pgdata; then
-ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKEY_pgdba
+ocf_log err Directory $OCF_RESKEY_pgdata is not writeable by $OCF_RESKEY_pgdba
 exit $OCF_ERR_PERM;
 fi
 
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
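The `check_config` helper in the patch above is deliberately tri-state: 0 when the file exists, 1 when it is missing during a probe (informational), and 2 when it is missing otherwise (a hard error the caller maps to OCF_ERR_INSTALLED). A minimal standalone sketch of the same pattern — `IS_PROBE` is an illustrative stand-in for `ocf_is_probe`, not the real helper:

```shell
#!/bin/sh
# Tri-state config check mirroring the patch above:
#   0 = file exists, 1 = missing but probing, 2 = missing (error).
# IS_PROBE stands in for ocf_is_probe, for illustration only.
check_config() {
    if [ ! -f "$1" ]; then
        if [ "${IS_PROBE:-0}" = "1" ]; then
            return 1    # probe: log at info level, don't fail hard
        fi
        return 2        # normal operation: installation error
    fi
    return 0
}
```

The caller can then return OCF_ERR_INSTALLED only on rc 2 while letting rc 1 pass, which is exactly how the patched `pgsql_validate_all` uses it.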


Re: [Linux-ha-dev] Patch for pgsql

2011-06-15 Thread Serge Dubrouski
Yes, it's right.

Do you know how I can resync my fork with upstream?

On Wed, Jun 15, 2011 at 6:28 AM, Florian Haas florian.h...@linbit.comwrote:

 On 2011-06-15 14:26, Serge Dubrouski wrote:
  I screwed up with git so here is the patch attached.

 Nice, thanks. Is the pgsql ocft test case OK as it is in the repo?

 Cheers,
 Florian






-- 
Serge Dubrouski.


[Linux-ha-dev] Patch for pgsql

2011-06-14 Thread Serge Dubrouski
Hello -

I've just sent a pull request from my fork of the resource-agents git repository.
The proposed patch adds better handling of probes, fixes one English misspelling,
and declares the rc variable as local in the functions where it's used.

I hope I did everything right with git

-- 
Serge Dubrouski.


Re: [Linux-ha-dev] changelog for resource agents 3.9.x

2011-05-31 Thread Serge Dubrouski
pgsql: low, fix some exit codes.

On Tue, May 31, 2011 at 8:08 AM, Dejan Muhamedagic deja...@fastmail.fmwrote:

 Hello all,

 This is the ChangeLog I tried to put together for the coming
 release. Please take a look. In particular the mysql lines.

 And please use the prefix Low/Middle/High for the log lines in
 future, that helps immensely to compile a useful changelog.

 Cheers,

 Dejan






-- 
Serge Dubrouski.


Re: [Linux-ha-dev] state of heartbeat resource agents

2011-05-24 Thread Serge Dubrouski
On Tue, May 24, 2011 at 4:10 AM, Dejan Muhamedagic deja...@fastmail.fmwrote:

 Hi all,

 Fabio suggested to make one last release before introducing the
 new common OCF provider.

 I'm scrutinizing now the LF bugzilla.
 Is there anything else that needs to be done for the Heartbeat RA
 set? IIRC, there were recently updates proposed for pgsql,
 postfix, mysql(?).


For pgsql there were small patches for exit codes and ocft file.


 Cheers,

 Dejan




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] state of heartbeat resource agents

2011-05-24 Thread Serge Dubrouski
On Tue, May 24, 2011 at 9:02 AM, Dejan Muhamedagic de...@suse.de wrote:

 Hi Serge,

 On Tue, May 24, 2011 at 08:14:21AM -0600, Serge Dubrouski wrote:
  On Tue, May 24, 2011 at 4:10 AM, Dejan Muhamedagic deja...@fastmail.fm
 wrote:
 
   Hi all,
  
   Fabio suggested to make one last release before introducing the
   new common OCF provider.
  
   I'm scrutinizing now the LF bugzilla.
   Is there anything else that needs to be done for the Heartbeat RA
   set? IIRC, there were recently updates proposed for pgsql,
   postfix, mysql(?).
  
 
  For pgsql there were small patches for exit codes and ocft file.

 Can you do the github dance? Or were these patches already
 discussed on the ML? If so, can you please pass a URL? I
 didn't follow the ML traffic very closely lately and I'm afraid I
 may have missed some.


As far as I know you already pushed them into github :-)

https://github.com/ClusterLabs/resource-agents/commit/331d550c1f5bbed88fbced4f11b36cab2a2138a8
https://github.com/ClusterLabs/resource-agents/commit/6d13868a70ddf90d5bc37e745c3df77caafc755a


 Cheers,

 Dejan

   Cheers,
  
   Dejan
  
 
 
 
  --
  Serge Dubrouski.





-- 
Serge Dubrouski.


Re: [Linux-ha-dev] Bug in crm shell or pengine

2011-04-26 Thread Serge Dubrouski
On Tue, Apr 26, 2011 at 1:03 PM, Dejan Muhamedagic deja...@fastmail.fm wrote:
 Hi,

 On Tue, Apr 19, 2011 at 09:22:41AM -0600, Serge Dubrouski wrote:
 On Tue, Apr 19, 2011 at 1:12 AM, Andrew Beekhof and...@beekhof.net wrote:
  On Mon, Apr 18, 2011 at 11:38 PM, Serge Dubrouski serge...@gmail.com 
  wrote:
  Ok, I've read the documentation. It's not a bug, it's a feature :-)
 
  Might be nice if the shell could somehow prevent such configs, but it
  would be non-trivial to implement.

 Or maybe as trivial as checking for such duplicates and, in the case of
 different roles, adjusting the interval by plus or minus 1.

 A good idea. Could you please file a bugzilla lest we forget
 about it.

Bug 2586.


-- 
Serge Dubrouski.


[Linux-ha-dev] OCFT file and Patch for pgsql

2011-04-18 Thread Serge Dubrouski
Hello -

Attached are an ocft file and a small patch for the pgsql RA. The patch adds
some more validation of the OCF_RESKEY_pgdba user settings and
fixes several exit codes.

I'm kind of new to git so let me know if the patch isn't properly formatted.

-- 
Serge Dubrouski.


pgsql_ocft
Description: Binary data
diff --git a/heartbeat/pgsql b/heartbeat/pgsql
index d2af0be..ca08f9b 100755
--- a/heartbeat/pgsql
+++ b/heartbeat/pgsql
@@ -329,7 +329,7 @@ pgsql_start() {
 if ! check_log_file $OCF_RESKEY_logfile
 then
 ocf_log err PostgreSQL can't write to the log file: $OCF_RESKEY_logfile
-	return $OCF_ERR_GENERIC
+	return $OCF_ERR_PERM
 fi
 
 # Check socket directory
@@ -521,6 +521,17 @@ pgsql_validate_all() {
 return $OCF_ERR_INSTALLED
 fi
 
+getent passwd $OCF_RESKEY_pgdba /dev/null 21
+if [ ! $? -eq 0 ]; then
+ocf_log err User $OCF_RESKEY_pgdba doesn't exit;
+return $OCF_ERR_INSTALLED;
+fi
+
+if ! runasowner test -w $OCF_RESKEY_pgdata; then
+ocf_log err Directory $OCF_RESKEY_pgdata is not writable by $OCF_RESKEY_pgdba
+exit $OCF_ERR_PERM;
+fi
+
 if [ -n $OCF_RESKEY_monitor_user -a ! -n $OCF_RESKEY_monitor_password ]
 then
 ocf_log err monitor password can't be empty
@@ -564,24 +575,24 @@ check_socket_dir() {
 if [ ! -d $OCF_RESKEY_socketdir ]; then
 if ! mkdir $OCF_RESKEY_socketdir; then
 ocf_log err Cannot create directory $OCF_RESKEY_socketdir
-exit $OCF_ERR_GENERIC
+exit $OCF_ERR_PERM
 fi
 
 if ! chown $OCF_RESKEY_pgdba:`getent passwd \
  $OCF_RESKEY_pgdba | cut -d : -f 4` $OCF_RESKEY_socketdir 
 then
 ocf_log err Cannot change ownership for $OCF_RESKEY_socketdir
-exit $OCF_ERR_GENERIC
+exit $OCF_ERR_PERM
 fi
 
 if ! chmod 2775 $OCF_RESKEY_socketdir; then
 ocf_log err Cannot change permissions for $OCF_RESKEY_socketdir
-exit $OCF_ERR_GENERIC
+exit $OCF_ERR_PERM
 fi
 else
 if ! runasowner touch $OCF_RESKEY_socketdir/test.$$; then
 ocf_log err $OCF_RESKEY_pgdba cannot create files in $OCF_RESKEY_socketdir
-exit $OCF_ERR_GENERIC
+exit $OCF_ERR_PERM
 fi
 rm $OCF_RESKEY_socketdir/test.$$
 fi


[Linux-ha-dev] Bug in crm shell or pengine

2011-04-18 Thread Serge Dubrouski
Hello -

Looks like there is a bug in crm shell Pacemaker version 1.1.5 or in pengine.


primitive pg_drbd ocf:linbit:drbd \
params drbd_resource=drbd0 \
op monitor interval=60s role=Master timeout=10s \
op monitor interval=60s role=Slave timeout=10s

Log file:

Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s
Apr 17 04:05:29 cs51 crmd: [5535]: info: do_state_transition: Starting
PEngine Recheck Timer
Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Do not use the
same (name, interval) combination more than once per resource
Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s
Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Do not use the
same (name, interval) combination more than once per resource
Apr 17 04:05:29 cs51 pengine: [5534]: ERROR: is_op_dup: Operation
pg_drbd-monitor-60s-0 is a duplicate of pg_drbd-monitor-60s

Plus strange behavior of the cluster, like an inability to move resources
from one node to another.
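Given the duplicate (name, interval) restriction the pengine log complains about, the workaround discussed in this thread — adjusting one interval by plus or minus 1 so the (name, interval) pairs differ per role — would look something like this (the exact values are illustrative):

```
primitive pg_drbd ocf:linbit:drbd \
    params drbd_resource=drbd0 \
    op monitor interval=61s role=Master timeout=10s \
    op monitor interval=59s role=Slave timeout=10s
```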

-- 
Serge Dubrouski.


[Linux-HA] When is the next release for resource agents?

2011-04-06 Thread Serge Dubrouski
Hello -

When is the next release for resource agents? The agents that come with
resource-agents-1.0.3-2.6.el5 from the clusterlabs repository are very
outdated. pgsql is at least one year old or so.

-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-02 Thread Serge Dubrouski
On Sat, Apr 2, 2011 at 10:18 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 On 4/2/2011 12:40 AM, Vadym Chepkov wrote:

 Ok, lets see how this might work.
 You would need a separate monitor for the cluster and since this
 monitor also can potentially crash, you would need another monitor to
 observer the first one, then we would want the first one to monitor
 second one, so we would need a cluster of monitors.

 That is precisely why I'm happy with heartbeat 2.1.4 in R1 setup:
 simple, stupid, and I know exactly what failures it will handle and what
 problems it monitors for (because I wrote the mon scripts).

Old shoes always fit better.


 Wait, don't we have already cluster in place? It seems logical to have
 monitor to be part of the cluster. I was expecting monitor operation
 to handle that, but it seems for DRBD this is not the case.

 This is also not the case with e.g. apache once you think about it: the
 agent checks if wget of /server-status on localhost returns success.
 There's 3 things wrong with that; the one relevant here is that the kernel
 should be smart enough to route the packets over lo even if you're
 wget'ting from cluster ip. As a result you cannot check if a daemon is
 answering on cluster ip if you run the check on active node.

 So you have to have an external monitor. Don't you have one to monitor
 your switches and UPSes and non-clustered kit anyway?

True for a basic setup. Not true if you carefully study the Apache RA.


 Maybe  we
 should  have another primitive running? drbd_status or something?
 When drbd subsystem is in degraded state, have drbd_status in stopped 
 state?

 Drbd has its own logic for figuring out its state. Controlled via
 drbd.conf -- adjust drbd.conf so the secondary does not start in
 degraded state. And shuts down when split brain is detected.

 Dima


 Dima




-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-01 Thread Serge Dubrouski
On Fri, Apr 1, 2011 at 8:38 AM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:
 On Fri, Apr 01, 2011 at 11:35:19AM +0200, Christoph Bartoschek wrote:
 Am 01.04.2011 11:27, schrieb Florian Haas:
  On 2011-04-01 10:49, Christoph Bartoschek wrote:
  Am 01.04.2011 10:27, schrieb Andrew Beekhof:
  On Sat, Mar 26, 2011 at 12:10 AM, Lars Ellenberg
  lars.ellenb...@linbit.com   wrote:
  On Fri, Mar 25, 2011 at 06:18:07PM +0100, Christoph Bartoschek wrote:
  I am missing the state: running degraded or suboptimal.
 
  Yep, degraded is not a state available for pacemaker.
  Pacemaker cannot do much about suboptimal.
 
  I wonder what it would take to change that.  I suspect either a
  crystal ball or way too much knowledge of drbd internals.
 
  The RA would be responsible to check this. For drbd any diskstate
  different from UpToDate/UpToDate is suboptimal.
 
  Have you actually looked at the resource agent? It does already evaluate
  the disk state and adjusts the master preference accordingly. What else
  is there to do?

 Maybe I misunderstood Andrew's comment. I read it this way:  If we
 introduce a new state suboptimal, would it be hard to detect it?

 I just wanted to express that detecting suboptimality seems not to be
 that hard.

 But that state is useless for pacemaker,
 since it cannot do anything about it.

Looks like a lot of people, including myself, are still confused by
this statement. Basically, this state of a DRBD resource is unstable and
the resource is unusable; why do you think it is normal for
Pacemaker to report such a state as an OK state?


 I thought I made that clear.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com




-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-01 Thread Serge Dubrouski
On Fri, Apr 1, 2011 at 8:43 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Fri, Apr 1, 2011 at 4:38 PM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
 On Fri, Apr 01, 2011 at 11:35:19AM +0200, Christoph Bartoschek wrote:
 Am 01.04.2011 11:27, schrieb Florian Haas:
  On 2011-04-01 10:49, Christoph Bartoschek wrote:
  Am 01.04.2011 10:27, schrieb Andrew Beekhof:
  On Sat, Mar 26, 2011 at 12:10 AM, Lars Ellenberg
  lars.ellenb...@linbit.com   wrote:
  On Fri, Mar 25, 2011 at 06:18:07PM +0100, Christoph Bartoschek wrote:
  I am missing the state: running degraded or suboptimal.
 
  Yep, degraded is not a state available for pacemaker.
  Pacemaker cannot do much about suboptimal.
 
  I wonder what it would take to change that.  I suspect either a
  crystal ball or way too much knowledge of drbd internals.
 
  The RA would be responsible to check this. For drbd any diskstate
  different from UpToDate/UpToDate is suboptimal.
 
  Have you actually looked at the resource agent? It does already evaluate
  the disk state and adjusts the master preference accordingly. What else
  is there to do?

 Maybe I misunderstood Andrew's comment. I read it this way:  If we
 introduce a new state suboptimal, would it be hard to detect it?

 No, detecting is the easy part.

 I just wanted to express that detecting suboptimality seems not to be
 that hard.

 But that state is useless for pacemaker,
 since it cannot do anything about it.

 This was the part I was wondering about - if pacemaker _could_ do
 something intelligent.

Isn't simply reporting the resource as broken rather than healthy
intelligent enough?





-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-01 Thread Serge Dubrouski
On Fri, Apr 1, 2011 at 8:50 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Fri, Apr 1, 2011 at 4:46 PM, Serge Dubrouski serge...@gmail.com wrote:
 On Fri, Apr 1, 2011 at 8:43 AM, Andrew Beekhof and...@beekhof.net wrote:
 On Fri, Apr 1, 2011 at 4:38 PM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
 On Fri, Apr 01, 2011 at 11:35:19AM +0200, Christoph Bartoschek wrote:
 Am 01.04.2011 11:27, schrieb Florian Haas:
  On 2011-04-01 10:49, Christoph Bartoschek wrote:
  Am 01.04.2011 10:27, schrieb Andrew Beekhof:
  On Sat, Mar 26, 2011 at 12:10 AM, Lars Ellenberg
  lars.ellenb...@linbit.com   wrote:
  On Fri, Mar 25, 2011 at 06:18:07PM +0100, Christoph Bartoschek wrote:
  I am missing the state: running degraded or suboptimal.
 
  Yep, degraded is not a state available for pacemaker.
  Pacemaker cannot do much about suboptimal.
 
  I wonder what it would take to change that.  I suspect either a
  crystal ball or way too much knowledge of drbd internals.
 
  The RA would be responsible to check this. For drbd any diskstate
  different from UpToDate/UpToDate is suboptimal.
 
  Have you actually looked at the resource agent? It does already evaluate
  the disk state and adjusts the master preference accordingly. What else
  is there to do?

 Maybe I misunderstood Andrew's comment. I read it this way:  If we
 introduce a new state suboptimal, would it be hard to detect it?

 No, detecting is the easy part.

 I just wanted to express that detecting suboptimality seems not to be
 that hard.

 But that state is useless for pacemaker,
 since it cannot do anything about it.

 This was the part I was wondering about - if pacemaker _could_ do
 something intelligent.

 Isn't a simple reporting that resource is broken rather than healthy
 intelligent enough?

 But is it completely broken?  Is it broken in a way that pacemaker can
 do something to repair it?

The primary point so far is that the current situation is unacceptable for
most customers. Even if it's half broken, or broken to 1/3, it
shouldn't be reported as a healthy resource. And as far as I understand,
Pacemaker cannot do anything more than that. Fixing this situation, if
possible, would be the responsibility of the DRBD RA.







-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD and pacemaker interaction

2011-04-01 Thread Serge Dubrouski
On Fri, Apr 1, 2011 at 8:56 AM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:
 On Fri, Apr 01, 2011 at 08:42:04AM -0600, Serge Dubrouski wrote:
 On Fri, Apr 1, 2011 at 8:38 AM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Fri, Apr 01, 2011 at 11:35:19AM +0200, Christoph Bartoschek wrote:
  Am 01.04.2011 11:27, schrieb Florian Haas:
   On 2011-04-01 10:49, Christoph Bartoschek wrote:
   Am 01.04.2011 10:27, schrieb Andrew Beekhof:
   On Sat, Mar 26, 2011 at 12:10 AM, Lars Ellenberg
   lars.ellenb...@linbit.com   wrote:
   On Fri, Mar 25, 2011 at 06:18:07PM +0100, Christoph Bartoschek wrote:
   I am missing the state: running degraded or suboptimal.
  
   Yep, degraded is not a state available for pacemaker.
   Pacemaker cannot do much about suboptimal.
  
   I wonder what it would take to change that.  I suspect either a
   crystal ball or way too much knowledge of drbd internals.
  
   The RA would be responsible to check this. For drbd any diskstate
   different from UpToDate/UpToDate is suboptimal.
  
   Have you actually looked at the resource agent? It does already evaluate
   the disk state and adjusts the master preference accordingly. What else
   is there to do?
 
  Maybe I misunderstood Andrew's comment. I read it this way:  If we
  introduce a new state suboptimal, would it be hard to detect it?
 
  I just wanted to express that detecting suboptimality seems not to be
  that hard.
 
  But that state is useless for pacemaker,
  since it cannot do anything about it.

 Looks like a lot of people, including myself, are still confused by
 this statement. Basically this state of the DRBD resource is unstable
 and the resource is unusable, so why do you think it is normal for
 Pacemaker to report such a state as OK?

 It is usable. It is being used.
 It is at least as usable as a degraded RAID1.
 Pacemaker cannot do anything about that missing disk, either.

I see your point. Would it be possible to provide some options through
the RA? I mean, add a parameter that controls behavior in the degraded
state. Something like OCF_RESKEY_on_degraded with possible values FAIL
or CONTINUE.
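A minimal sketch of what such a parameter could look like inside a monitor operation. OCF_RESKEY_on_degraded and the hard-coded disk state are assumptions for illustration, not the real drbd resource agent:

```shell
#!/bin/sh
# Hypothetical monitor logic honoring an "on_degraded" parameter.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

disk_state="UpToDate/Outdated"   # a real RA would parse /proc/drbd

monitor_degraded() {
    case "$disk_state" in
        UpToDate/UpToDate)
            return $OCF_SUCCESS ;;
        *)  # degraded: behavior depends on the proposed parameter
            [ "${OCF_RESKEY_on_degraded:-CONTINUE}" = "FAIL" ] \
                && return $OCF_ERR_GENERIC
            return $OCF_SUCCESS ;;
    esac
}

OCF_RESKEY_on_degraded=FAIL
monitor_degraded || echo "monitor reports failure on degraded state"
```

With on_degraded=FAIL the monitor would flag a degraded disk as an error; with CONTINUE it would keep reporting success, as the agent does today.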


 Of course you can patch pacemaker to detect that the RAID1 is degraded,
 and trigger faxing a PO to your supplier for a replacement drive.

 But possibly you should rather have some monitoring (nagios, ...) notice this,
 page/email/alert with your favorite method the relevant people, and have
 them take appropriate actions?

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] DRBD problems are not reported

2011-04-01 Thread Serge Dubrouski
 direct link , started cluster
 back and there is no indication whatsoever in the cluster status that
 drbd is in trouble, except location constraint added by
 crm-fence-peer.sh
 Even scores attributes for the master resource are not negative on the 
 disconnected secondary.

 After I applied my fix, all is kosher - I get Slave as stopped, I get
 fail-count.

 And you won't ever be able to promote an unconnected Secondary,
 or recover from replication link hickups.

Do you think that an attempt to promote an outdated resource:

0: cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown   r-

is a better solution? Will it succeed?

 How are you going to do failovers?
 How are you going to do a reboot of a degraded cluster?

 But of course you are free to deploy any hacks you want.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Serge Dubrouski
Why not use LDAP's syncrepl feature instead of DRBD?

On the other hand, what exactly are you trying to achieve? I don't
think you can build an active/active LDAP cluster using DRBD, since
LDAP caches its database and provides no mechanism to synchronize those
caches. Your only choice would be an active/passive cluster; in that
case you just update the active node and make sure that your drbd and
filesystem resources are collocated with LDAP.

On Wed, Feb 23, 2011 at 2:24 PM, Christopher Metter
christopher.met...@informatik.uni-wuerzburg.de wrote:
 Hi there,


 I'm running a setup with a Heartbeat/DRDB cluster with 2 nodes and open
 ldap database stored inside the DRDB-device.
 No problem with the setup itself, it runs perfectly.


 But I'm having following problem: How to update LDAP in a cluster?
 The plan was to first run the update on the current slave device, then
 manually failover and run update on the other device.
 This was a very bad idea as LDAP needs to have access to its database
 and as the slave has no mounted drdb-device, it fails and update stops.

 So is there any recommended update startegy or guide on how to update
 LDAP inside a heartbeat/drdb cluster?
 As the LDAP server needs to be online any downtime would be very bad.


 I'm having some ideas on how to solve this problem but I'm not sure if
 this is the real solution:
 Situation: node1 is running heartbeat/drdb with ldap server up | node2
 is in slave mode
 - Stop heartbead and drdb on node2
 - Start ldap server with an empty database on node2
 - Update ldap and everything else on node2
 - After successful update stop ldap on node2
 - Start Heartbeat/drdb on node2
 - Manual Failover - node2 now master, node1 slave
 - goto node1 and redo according steps


 I hope you understand my concern and you can help me to fix this
 update-mess.

 Is anyone out there with a solution or do you guys just handle it like
 'hitchhikers guide to galaxy' advises?
 (Asking Hitchhikers Guide: How to Update LDAP in Heartbeat Cluster? - Don't!
 http://www.imdb.com/title/tt0371724/quotes?qt0351093 )



 Greetings from Germany,
 Christopher Metter
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Serge Dubrouski
Sorry, I thought about database updates, not software updates.

On Wed, Feb 23, 2011 at 2:33 PM, Serge Dubrouski serge...@gmail.com wrote:
 Why not use LDAP's syncrepl feature instead of DRBD?

 On the other hand, what exactly are you trying to achieve? I don't
 think you can build an active/active LDAP cluster using DRBD, since
 LDAP caches its database and provides no mechanism to synchronize
 those caches. Your only choice would be an active/passive cluster; in
 that case you just update the active node and make sure that your drbd
 and filesystem resources are collocated with LDAP.

 On Wed, Feb 23, 2011 at 2:24 PM, Christopher Metter
 christopher.met...@informatik.uni-wuerzburg.de wrote:
 Hi there,


 I'm running a setup with a Heartbeat/DRDB cluster with 2 nodes and open
 ldap database stored inside the DRDB-device.
 No problem with the setup itself, it runs perfectly.


 But I'm having following problem: How to update LDAP in a cluster?
 The plan was to first run the update on the current slave device, then
 manually failover and run update on the other device.
 This was a very bad idea as LDAP needs to have access to its database
 and as the slave has no mounted drdb-device, it fails and update stops.

 So is there any recommended update startegy or guide on how to update
 LDAP inside a heartbeat/drdb cluster?
 As the LDAP server needs to be online any downtime would be very bad.


 I'm having some ideas on how to solve this problem but I'm not sure if
 this is the real solution:
 Situation: node1 is running heartbeat/drdb with ldap server up | node2
 is in slave mode
 - Stop heartbead and drdb on node2
 - Start ldap server with an empty database on node2
 - Update ldap and everything else on node2
 - After successful update stop ldap on node2
 - Start Heartbeat/drdb on node2
 - Manual Failover - node2 now master, node1 slave
 - goto node1 and redo according steps


 I hope you understand my concern and you can help me to fix this
 update-mess.

 Is anyone out there with a solution or do you guys just handle it like
 'hitchhikers guide to galaxy' advises?
 (Asking Hitchhikers Guide: How to Update LDAP in Heartbeat Cluster? - Don't!
 http://www.imdb.com/title/tt0371724/quotes?qt0351093 )



 Greetings from Germany,
 Christopher Metter
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.




-- 
Serge Dubrouski.


Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Serge Dubrouski
On Wed, Feb 23, 2011 at 2:56 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:
 Why not to use ldap syncrepl feature instead of DRBD?

 The problem with syncrepl is not the replication, it's the timeouts in
 the failover. As in you type ls -l, your computer freezes for 5 minutes.

With syncrepl you don't need shared storage, so you can run LDAP as a clone.


 Dimitri
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Updating LDAP in Heartbeat/DRDB Cluster

2011-02-23 Thread Serge Dubrouski
On Wed, Feb 23, 2011 at 3:36 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:
 On Wed, Feb 23, 2011 at 2:56 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu 
 wrote:
 Serge Dubrouski wrote:
 Why not to use ldap syncrepl feature instead of DRBD?
 The problem with syncrepl is not the replication, it's the timeouts in
 the failover. As in you type ls -l, your computer freezes for 5 minutes.

 With syncrepl you don't need a shared storage, so you can run LDAP as a 
 clone.

 What I mean is, with syncrepl you're looking at 2 active ldap servers.

 If you have 2 active ldap servers, the usual default is 2 minutes until
 your system gives up on one and tries to talk to the other. This takes
 place in response to any user action that involves uid, gid, or whatever
 else you my be storing in ldap. (E.g. ls -l needs to map uids to
 names.) The action is blocking, the shell freezes.

But you can still have a single IP associated with whichever node has
LDAP up. Or you can put that IP behind a load balancer with a health
monitor. These are all design decisions.
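A rough sketch of the clone-plus-floating-IP idea in crm shell syntax. The resource names, the IP address, and the availability of a slapd agent are assumptions for illustration:

```
primitive p_slapd ocf:heartbeat:slapd \
    op monitor interval="30s"
clone cl_slapd p_slapd
primitive p_ldap_ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" \
    op monitor interval="10s"
colocation ldap_ip_with_slapd inf: p_ldap_ip cl_slapd
```

The colocation constraint keeps the client-facing IP on a node where a healthy slapd clone instance is running, so failover is just the IP moving.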


 As opposed to a cluster that monitors status of ldap service and fails
 over if there's a problem: the only way hb_takeover takes 2 minutes is
 if your drbd's developed split brain. And if you're lucky, it failed
 over before you typed ls -l.

 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] [PATCH] manage PostgreSQL 9.0 streaming replication using Master/Slave

2011-02-14 Thread Serge Dubrouski
On Mon, Feb 14, 2011 at 1:28 AM, Takatoshi MATSUO matsuo@gmail.com wrote:
 Ideally demote operation should stop a master node and then restart it
 in hot-standby mode. It's up to administrator to make sure that no
 node with outdated data gets promoted to the master role. One should
 follow standard procedures: cluster software shouldn't be configured
 for autostart at the boot time, administrator has to make sure that
 data was refreshed if the node was down for some prolonged time.

 Hmm..
 Do you mean that RA puts recovery.conf automatically at demote op to
 start hot standby?
 Please give me some time to think it over.


 Sorry, I got the wrong idea about restoring data.
 To start as hot-standby needs restoring anytime,
 because Time-line ID of PostgreSQL is incremented.
 In addition, shutting down the PostgreSQL with immediate option causes
 inconsistent WAL  between primary and hot-standby.

 So I think it's difficult to start slave automatically at demote.
 Still, do you think it's better to implement restoring ?

I'm afraid it's not just better, it's a must. We have to play by
Pacemaker's rules, and that means properly implementing the demote
operation: switching from Master to Slave, not just stopping the
Master. I do appreciate your efforts, but the implementation has to
conform to Pacemaker standards, i.e. the Master has to start where
Pacemaker places it, not just where a recovery.conf file exists. The
administrator has to be able to easily switch node roles, and so on.

I still need some more time to learn PostgreSQL data replication and
do some tests. Let's think about whether it's possible to implement a
real Master/Slave pair in the Pacemaker sense.
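A hypothetical sketch of the demote step under discussion: drop a recovery.conf into the data directory so a restart brings PostgreSQL (9.0-era) up in hot-standby mode. The PGDATA path and the primary_conninfo value are illustrative assumptions, not the real pgsql RA:

```shell
#!/bin/sh
# Sketch: demote by preparing a hot-standby recovery.conf.
PGDATA=${PGDATA:-/tmp/demo-pgdata}
mkdir -p "$PGDATA"
cat > "$PGDATA/recovery.conf" <<'EOF'
standby_mode = 'on'
primary_conninfo = 'host=primary-node port=5432 user=replicator'
EOF
# A real RA would now restart the server, e.g.:
#   pg_ctl -D "$PGDATA" -m fast restart
grep standby_mode "$PGDATA/recovery.conf"
```

As the thread notes, this alone is not enough: the standby's data must first be refreshed from the new primary, since the timeline ID advances on promotion.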

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH] manage PostgreSQL 9.0 streaming replication using Master/Slave

2011-02-07 Thread Serge Dubrouski
Hello -

First of all, thanks for doing this. It's definitely a neat option to
have. Unfortunately I haven't yet worked with PostgreSQL 9.0. Could
you please elaborate a little more on how it works? What happens if
the master dies? Who is responsible for creating/deleting the
recovery.conf file?

Also please be more careful with quoting variables in your if statements.
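To illustrate the quoting point: with an unquoted empty variable, a test like `[ $rep_mode = async ]` collapses to `[ = async ]` and errors out. A small sketch (rep_mode stands in for any resource parameter):

```shell
#!/bin/sh
# Quoting variables in if statements keeps empty values safe.
check_mode() {
    if [ "$1" = "async" ]; then   # quoted: valid even when $1 is empty
        echo "streaming"
    else
        echo "none"
    fi
}

check_mode ""        # prints "none" instead of failing the test
check_mode "async"   # prints "streaming"
```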



On Mon, Feb 7, 2011 at 5:19 AM, Takatoshi MATSUO matsuo@gmail.com wrote:
 Hello

 I wrote a patch for pgsql RA to manage PostgreSQL 9.0 streaming
 replication(PGSR)
 which is new feature in Postgres 9.0.

 I add new parameters, rep_mode, trigger_file and tmpdir.

 rep_mode means replication mode
 Default is none which do the same thing as before.
 You have to set async to use PGSR.

 trigger_file must be set same parameter of recovery.conf.
 This file is created to promote hot standby to primary.
 Default is /var/lib/pgsql/PGSQL.5432.trigger.

 tmpdir is temporary directory used for trigger_file and flag files.
 Default is /var/lib/pgsql.



 * Details

 PGSR's transition differs from Pacemaker's one as follows.

 *** transition of Pacemaker ***

           start             promote
      ------------>     ------------->
 Stop              Slave             Master
      <------------     <-------------
           stop              demote

 ***

 *** transition of PGSR ***

          pg_ctl start
      ------------------------------------------->
          pg_ctl start         create trigger_file
      -------------->      ------------------->
 Stop              Hot standby              Primary
      <--------------
          pg_ctl stop
      <-------------------------------------------
          pg_ctl stop

  Note
     PostgreSQL developer says that it's not normal operation
     to start primary through hot standby initially.
 ***

 Therefore this patch has 4 states as follows.

 STATE1   | STATE2   | STATE3      | STATE4
  Stop    | Slave    | Slave       | Master         - Pacemaker's state
  Stop    | Stop     | HotStandby  | Primary        - PGSR's state



 The STATE2 is steppingstone to transit to the STATE4
 so primary's initial start transits as follows.

   STATE1 ----> STATE2 ----> STATE4
          start        promote

 In STATE2, PostgreSQL is stopped, so the resource agent creates a flag
 file ($REPRESS_MONITOR) to cheat the monitor operation, and waits for the promote op.


 * Method of starting

 You have to start only primary server's pacemaker with primary configuration
 and waits for promote op (STATE4).
 After that, you backup data from primary server and restore it
 to hot standby server and starts hot standby server's pacemaker with
 hot standby configuration.


 * About repressing start flag file

 Hot standby server must be started with new restored data.
 But if the restored data is old, the hot standby server fails to
 replicate, and the resource agent can't notice it.
 At this time primary server is broken, hot standby server is promoted with 
 very
 old data, so I bring in repressing start flag file($REPRESS_START)
 to repress restarting PostgreSQL automatically with old data.
 Therefore you have to backup and restore new data and remove this flag
 before starting PostgreSQL if it exists.


 How do you think these implementation ?
 Any comment ?

 Regards,
 Takatoshi MATSUO
 Linux-HA Japan

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/





-- 
Serge Dubrouski.


Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
Which OS?

Which version of Heartbeat?

heartbeat_pid - the PID of which of the Heartbeat processes? It has several.
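For reference, the limit in question can be read per PID from /proc on Linux; here the current shell stands in for one of the heartbeat processes:

```shell
#!/bin/sh
# Inspect the CPU rlimit the kernel has applied to a given process.
pid=$$
grep "Max cpu time" "/proc/$pid/limits"
# Heartbeat runs several processes (master control process, HBREAD/
# HBWRITE children, ...), so each one can carry its own limits.
```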


On Tue, Jan 4, 2011 at 6:32 AM, Igor Chudov ichu...@gmail.com wrote:
 A few weeks I reported that heartbeat died on one of the cluster machines,
 due to SIGXCPU.

 Well, it happened again. Heartbeat died, now both machines had the shared IP
 address up, what a god awful mess!!!

 Nopw they have split brain and the whole nine yards!

 I  looked at /proc/heartbeat_pid/limits and found:

 Limit                     Soft Limit           Hard Limit           Units

 Max cpu time              43                   unlimited            seconds


 So, this process somehow has a limit set for it.

 Does anyone have ANY clue who would set a limit for this process??? WTF?
 Does it do it for itself or what?
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
   Limit                     Soft Limit           Hard Limit           Units
  
   Max cpu time              43                   unlimited            seconds
  
  
  So, this process somehow has a limit set for it.
 
  Does anyone have ANY clue who would set a limit for this process??? WTF?
  Does it do it for itself or what?
 

 I cannot answer your question, but I suspect it might be useful if you
 mentioned which version of heartbeat and what resource manager you are
 using. Perhaps provide a copy of your heartbeat configuration.

 Is heartbeat using too much CPU? It should be pretty much idle
 relative to the rest of the system - If not, it is worth finding out
 why not.

 Regards,
 Steve
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems





-- 
Serge Dubrouski.


Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
On Tue, Jan 4, 2011 at 9:14 AM, Igor Chudov ichu...@gmail.com wrote:
 Serge, I am not sure of anything, but the self-communication is supposed to
 be taking place on a single crossover cable between second network cards of
 the servers. (eth1).

Agreed, yet something strange and pretty unique is going on with your
setup. Could you post your ha.cf and the output of ifconfig eth1
and netstat -in?


 Igor

 On Tue, Jan 4, 2011 at 10:06 AM, Serge Dubrouski serge...@gmail.com wrote:

 Are you sure that everything is all right with your network? It looks
 like processes that are responsible for UDP communications are taking
 too much of CPU time.

 On Tue, Jan 4, 2011 at 8:47 AM, Igor Chudov ichu...@gmail.com wrote:
  Steve, here's some data.
 
  The OS is Ubuntu 10.04.
 
  ~# apt-cache policy heartbeat
  heartbeat:
   Installed: 1:3.0.3-1ubuntu1
   Candidate: 1:3.0.3-1ubuntu1
   Version table:
   *** 1:3.0.3-1ubuntu1 0
         500 http://us.archive.ubuntu.com/ubuntu/ lucid/universe Packages
         100 /var/lib/dpkg/status
 
  I agree that it should not use too much CPU, and I think that it does
 not.
  But after a while it gets a SIGXCPU anyway.
 
  It also seems to die from something else.
 
  ec 29 02:29:16 pfs-srv3 heartbeat: [1196]: WARN: Managed HBREAD process
 1228
  killed by signal 24 [SIGXCPU - CPU limit exceeded].
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: ERROR: Managed HBREAD process
  1228 dumped core
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: ERROR: HBREAD process died.
   Beginning communications restart process for comm channel 0.
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat closed on port 12694 interface eth1 - Status: 1
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: WARN: Managed HBWRITE process
  1227 killed by signal 9 [SIGKILL - Kill, unblockable].
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: ERROR: Both comm processes
 for
  channel 0 have died.  Restarting.
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat started on port 12694 (12694) interface eth1
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat closed on port 12694 interface eth1 - Status: 1
  Dec 29 02:29:16 pfs-srv3 heartbeat: [1196]: info: Communications restart
  succeeded.
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: WARN: Managed HBREAD process
  6729 killed by signal 24 [SIGXCPU - CPU limit exceeded].
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: ERROR: Managed HBREAD process
  6729 dumped core
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: ERROR: HBREAD process died.
   Beginning communications restart process for comm channel 0.
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat closed on port 12694 interface eth1 - Status: 1
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: WARN: Managed HBWRITE process
  6728 killed by signal 9 [SIGKILL - Kill, unblockable].
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: ERROR: Both comm processes
 for
  channel 0 have died.  Restarting.
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat started on port 12694 (12694) interface eth1
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: info: glib: UDP Broadcast
  heartbeat closed on port 12694 interface eth1 - Status: 1
  Dec 30 21:03:49 pfs-srv3 heartbeat: [1196]: info: Communications restart
  succeeded.
  Dec 31 13:58:22 pfs-srv3 heartbeat: [1226]: CRIT: Emergency Shutdown:
 Master
  Control process died.
  Dec 31 13:58:22 pfs-srv3 heartbeat: [1226]: CRIT: Killing pid 1196 with
  SIGTERM
  Dec 31 13:58:22 pfs-srv3 heartbeat: [1226]: CRIT: Killing pid 9866 with
  SIGTERM
  Dec 31 13:58:22 pfs-srv3 heartbeat: [1226]: CRIT: Killing pid 9867 with
  SIGTERM
  Dec 31 13:58:22 pfs-srv3 heartbeat: [1226]: CRIT: Emergency Shutdown(MCP
  dead): Killing ourselves.
 
  i
 
  On Tue, Jan 4, 2011 at 9:33 AM, Steve Davies davies...@gmail.com
 wrote:
 
  On 4 January 2011 13:47, Igor Chudov ichu...@gmail.com wrote:
   Further reading indicates that heartbeat itself sets a limit for
 itself
   every so often.
  
   Then it exceeds the limit (probably due to a bug). I am sure that's
   why
   whoever wrote heartbeat set a cpu limit, instead of fixing their bugs.
  
   Then it dies with SIGXCPU, leaving everything in an extremely messy
  state,
   leading to split brain, destruction of shared resources (DRBD data).
  
   I was trying to be a little patient. A little forgiving. I must say
 that
  my
   patience is rapidly running out.
  
   I absolutely cannot use this solution as a basis of a high
 reliability
   cluster, because it is the opposite of reliability.
  
   We had an old cluster that works very well with heartbeat V1. But it
 is
   getting old, the disks are wearing out, the fans are not getting
 newer,
  etc.
   I set up a new cluster in summer, but never fully trusted it, and it
  looks
   like I will not be able to trust it. We never completed a switchover

Re: [Linux-HA] Heartbeat dies AGAIN with SIGXCPU, cluster screwed up again

2011-01-04 Thread Serge Dubrouski
On Tue, Jan 4, 2011 at 1:29 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Igor Chudov wrote:

 At this point I feel rather desperate. Perhaps I should give pacemaker
 another go. I really have no idea and I am running out of options.

 If all you need is a 2-node active-passive cluster, most (all?)
 pacemaker features are useless for you. (Besides, one look at their
 apache resource agent tells me I'm better off with my mon scripts.)

That's not true and please don't start it all over again.


 I haven't figured out how to get 3.0.3 to log anything in R1 mode, but
 other than that it worked for a couple of weeks I had it up. But that
 was on centos 5 w/ pacemaker repo rpms. Who knows how ubuntu built their
 debs.

 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] [PATCH] Medium: .ocf-shellfuncs: add ocf_test_pid convenience function

2010-12-15 Thread Serge Dubrouski
 And to really check that the process runs and _works_ as expected, you
 need to do a real check anyways (select now(), wget | grep ...).

 So what I try to say is, for the single resource agent,
 it does not make any difference at all whether it says

        if ocf_is_pid_running $pid ; then do_real_monitor; fi
  or
        if kill -0 $pid ; then do_real_monitor; fi

 and I like the latter better.

Me too.

If it does get implemented though, the ability to run it as a
particular user will be a must, at least for the pgsql RA.
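A small sketch of the liveness probe being debated: kill -0 delivers no signal but verifies the PID exists and may be signaled. Running it as the service owner (e.g. a postgres account) is an assumption about how a pgsql RA might wrap it, since plain kill -0 needs a matching uid or root:

```shell
#!/bin/sh
# kill -0 based liveness check.
is_running() {
    kill -0 "$1" 2>/dev/null
}

is_running $$ && echo "running"
# As a specific user (needs root), roughly:
#   su -s /bin/sh postgres -c "kill -0 $pid"
```

Either way, as the quoted text notes, a passing PID check still says nothing about whether the service *works*; a real monitor would follow up with something like `select now()`.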



 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-13 Thread Serge Dubrouski
BTW, here is one more open-source alternative to Pacemaker for those
who're building HA storage systems: http://www.openfiler.com/

But I really doubt that it's easy to learn and set up.

On Mon, Dec 13, 2010 at 6:06 AM, Pieter Baele pieter.ba...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 20:39, Igor Chudov ichu...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 1:32 PM, Serge Dubrouski serge...@gmail.com wrote:
 Taking "simple" into account, the answer is no. You can try Red Hat
 Cluster Suite on CentOS, but that's not simple.

 What's wrong with DRBD/Pacemaker/Corosync ?

 DRBD/Pacemaker was complicated, documentation did not exist or did not
 match the behavior, the GUI was broken and never really worked. Config
 files were completely opaque. I spent weeks on this without having
 something that could work and was documentable.

 Yeah... Try Veritas Cluster + Storage Foundation then.
 First, you have to go through more than 1000 pages of documentation 

 In a few words: very complete (+), bloated, expensive
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Serge Dubrouski
On Fri, Dec 10, 2010 at 8:42 AM, Les Mikesell lesmikes...@gmail.com wrote:
 On 12/10/2010 9:27 AM, Andrew Beekhof wrote:
 On Fri, Dec 10, 2010 at 2:53 PM, Les Mikesell lesmikes...@gmail.com wrote:
 On 12/10/10 2:20 AM, Andrew Beekhof wrote:

 See LRM operation WebSite_start_0 unknown error from November, that's
 where your pdf led me. By the time I hit unknown error starting drbd
 resource -- set up exactly as you describe, I've spent close to a week
 trying to replicate the setup that takes    an hour. So I replaced it it
 with haresources and everything started to work.

 This does not at all back up your claim that there is no documentation.
 All this shows is that EPEL5 (what you tried it on) is different from
 Fedora-13 (what the guide was written for).

 Who would use fedora for anything that needed a highly available server?

 I believe you're missing the point.

 No, just looking from the point of view of a reader, not the writer of
 said document.

 Fedora was chosen for the documentation because it is freely available
 and contains current versions of the necessary dependancies.
 SLES/RHEL are not free, EPEL5 does not meet for the second criteria
 and EPEL6 does not yet exist.

 Does ubuntu LTS fit into this scheme anywhere?

 No-one is suggesting all clusters should run on Fedora. I was clearly
 trying to say that instructions for A are unlikely to work unmodified
 for B.

 So perhaps the appropriate question would be where to find the
 instructions for B - or a platform suitable for general use.  Which is
 what I thought came up on this thread long ago but with some sort of
 disagreement about its existence.

What exactly do you want to build? The only thing you probably can't
do on CentOS 5 using DRBD/Corosync or Heartbeat/Pacemaker is an
active/active dual-master cluster with a shared cluster file system
like OCFS2, and that is because the kernel there is too old. Everything
else is possible.

If you do need a dual master (which is a pretty rare case), then use
another distro.


 --
   Les Mikesell
    lesmikes...@gmail.com

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Serge Dubrouski
On Fri, Dec 10, 2010 at 12:54 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Les Mikesell wrote:
 ...
 What I
 wanted was advice on the best platform that had a packaged, re-usable
 setup available that was likely to be maintained in updates for a long
 time.

 There's a bit of problem with your requirement: you forgot supported.
 As in try getting any support here for version of heartbeat that ships
 with RHEL 5 (or Suse 10, as I understand).

What's wrong with RHEL5? You can use packages from
http://www.clusterlabs.org/rpm
Yes, they don't support a dual-master filesystem with OCFS2, but do you
really need it?

BTW, packaging for RHEL5 really sucks. Lots of things are badly
outdated, and if you want to use the latest features you have to either
build them manually or use packages from third-party repositories. One
of the best examples is OpenLDAP: the 2.3.42 that gets shipped with
RHEL5 is way old and doesn't support critical features such as
syncrepl.


 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Serge Dubrouski
On Fri, Dec 10, 2010 at 1:21 PM, Les Mikesell lesmikes...@gmail.com wrote:
 On 12/10/2010 1:54 PM, Dimitri Maziuk wrote:
 Les Mikesell wrote:
 ...
 What I
 wanted was advice on the best platform that had a packaged, re-usable
 setup available that was likely to be maintained in updates for a long
 time.

 There's a bit of problem with your requirement: you forgot supported.
 As in try getting any support here for version of heartbeat that ships
 with RHEL 5 (or Suse 10, as I understand).

 I'm not looking for support for my local environment.  I'm looking for a
 version that is reusable and works without local hacks.  Or whatever
 might be expected to work for a long time into the future if set up now.
 But I suppose not changing interfaces wildly would be part of that
 requirement so a packager can maintain it.

You still didn't tell what you are building. I have a cluster of 2
nodes running 2 instances of Apache for 3 years already. OS CentOS
5.5, Pacemaker + Heartbeat. Upgraded it a couple of times without any
issues. Is it long enough? I also support a couple of other clusters
on CentOS 5.5.


 --
   Les Mikesell
     lesmikes...@gmail.com
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-10 Thread Serge Dubrouski
On Fri, Dec 10, 2010 at 1:55 PM, Les Mikesell lesmikes...@gmail.com wrote:
 On 12/10/2010 2:29 PM, Serge Dubrouski wrote:

 What I
 wanted was advice on the best platform that had a packaged, re-usable
 setup available that was likely to be maintained in updates for a long
 time.

 There's a bit of problem with your requirement: you forgot supported.
 As in try getting any support here for version of heartbeat that ships
 with RHEL 5 (or Suse 10, as I understand).

 I'm not looking for support for my local environment.  I'm looking for a
 version that is reusable and works without local hacks.  Or whatever
 might be expected to work for a long time into the future if set up now.
 But I suppose not changing interfaces wildly would be part of that
 requirement so a packager can maintain it.

 You still didn't tell what you are building. I have a cluster of 2
 nodes running 2 instances of Apache for 3 years already. OS CentOS
 5.5, Pacemaker + Heartbeat. Upgraded it couple of times without any
 issues. Is it long enough? I also support a couple of other clusters
 on CentOS 5.5.

 Right now, all I have are a few instances of heartbeat floating a
 fail-over IP address between pairs of boxes that don't really need any
 state maintained (basically client-facing proxies) because that's all
 I've trusted the old versions to handle.  Everything else is behind load
 balancers with data replication handled some other way.  I'm sure I'd
 find a lot more places to use paired fail-over systems if they were
 simple to set up and included data replication.  So at this point it is
 more about building a test platform than making any specific application
 work.  But in the back of my head, I'm kind of wishing our developers
 would adopt riak or a similar redundant/scalable data store and do away
 with most of the need for specifically paired systems.

That's exactly what I've been running on my Apache cluster: 2
instances of Apache (one per server with fail over ability) plus
virtual IP for each instance. Each Apache has several Proxies with
mod_load_balancer. You can use CentOS 5.5 for that and any recent
package of Pacemaker/Corosync. Since you don't really have any shared
data between cluster nodes you even don't have to bother about
STONITH devices. Apache config files once created usually don't get
changed but even if they do you can sync them manually. A whole
project can be built in 10 - 15 minutes (doesn't include Apache
configuration). After that you can update Pacemaker once a year or so,
or can even let it run forever.
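A rough sketch of such a setup in the Pacemaker crm shell (resource
names, addresses, and config paths here are invented for illustration;
a real configuration would differ in its details):

```
# Two Apache instances, each grouped with its own virtual IP and
# preferring a different node, so either node can take over both.
primitive ip_a ocf:heartbeat:IPaddr2 params ip=192.168.1.10
primitive web_a ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd-a.conf
group svc_a ip_a web_a
location svc_a_pref svc_a 100: node1

primitive ip_b ocf:heartbeat:IPaddr2 params ip=192.168.1.11
primitive web_b ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd-b.conf
group svc_b ip_b web_b
location svc_b_pref svc_b 100: node2
```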


 --
   Les Mikesell
    lesmikes...@gmail.com
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Serge Dubrouski
The question is too broad. Are you interested in Open Source or
proprietary product? What do you want to achieve?

One of the answers could be AoE (ATA over Ethernet) + OCFS2.

On Wed, Dec 8, 2010 at 11:26 AM, Igor Chudov ichu...@gmail.com wrote:
 I would like to know if there are relatively straightforward Linux
 based alternatives to DRBD and heartbeat.

 Any pointers and suggestions will be gratefully accepted.

 Thanks

 i
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Serge Dubrouski
Taking "simple" into account, the answer is no. You can try the RedHat
Cluster Suite on CentOS, but that's not simple.

What's wrong with DRBD/Pacemaker/Corosync ?

On Wed, Dec 8, 2010 at 12:28 PM, Igor Chudov ichu...@gmail.com wrote:
 I am interested in finding out Open Source alternative that would
 replace both DRBD as well as Heartbeat. I am definitely NOT looking
 for pacemaker either.

 I am looking for something simple, just file serving ability and
 switching NFS and samba would be all I need.

 igor

 On Wed, Dec 8, 2010 at 1:22 PM, Serge Dubrouski serge...@gmail.com wrote:
 The question is too broad. Are you interested in Open Source or
 proprietary product? What do you want to achieve?

 One of the answers could be AoE (ATA over Ethernet) + OCFS2.

 On Wed, Dec 8, 2010 at 11:26 AM, Igor Chudov ichu...@gmail.com wrote:
 I would like to know if there are relatively straightforward Linux
 based alternatives to DRBD and heartbeat.

 Any pointers and suggestions will be gratefully accepted.

 Thanks

 i
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Serge Dubrouski
On Wed, Dec 8, 2010 at 12:39 PM, Igor Chudov ichu...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 1:32 PM, Serge Dubrouski serge...@gmail.com wrote:
 Taking into account simple the answer is no. You can try RedHat
 Cluster Suite on CentOS, but that's not simple.

 What's wrong with DRBD/Pacemaker/Corosync ?

 DRBD/Pacemaker was complicated, documentation did not exist or did not
 match the behavior, the GUI was broken and never really worked. Config
 files were completely opaque. I spent weeks on this without having
 something that could work and was documentable.

 Then I got the DRBD/heartbeat to work without Pacemaker, however, its
 reliability leaves much to be desired. For example, if both systems
 come up at the same time, they cannot decide who gets the resource. I
 did a hack to fix that (restarting heartbeat after boot of one of the
 systems) but that did not leave me with a warm and comfortable
 feeling.

 Then once in a while DRBD would stop syncing.

 I need to replace our old, aging corporate DRBD/Heartbeat network
 storage implementation, which always worked, but the servers are now
 getting old. I am feeling very uncomfortable about using what I have
 for replacement.

If DRBD/Heartbeat always worked for you and you feel comfortable with
it then why not just replace the aged hardware? Though your answer is a
bit confusing: on one hand you say that Heartbeat/DRBD always worked
for you, on the other hand you say that sometimes DRBD stops syncing.

Maybe your company should just spend more money and buy something
like NetApp appliance? In this case you'd get a reliable device that's
relatively easy to manage plus commercial support for it. They have
clustered solutions as well.


 i


 On Wed, Dec 8, 2010 at 12:28 PM, Igor Chudov ichu...@gmail.com wrote:
 I am interested in finding out Open Source alternative that would
 replace both DRBD as well as Heartbeat. I am definitely NOT looking
 for pacemaker either.

 I am looking for something simple, just file serving ability and
 switching NFS and samba would be all I need.

 igor

 On Wed, Dec 8, 2010 at 1:22 PM, Serge Dubrouski serge...@gmail.com wrote:
 The question is too broad. Are you interested in Open Source or
 proprietary product? What do you want to achieve?

 One of the answers could be AoE (ATA over Ethernet) + OCFS2.

 On Wed, Dec 8, 2010 at 11:26 AM, Igor Chudov ichu...@gmail.com wrote:
 I would like to know if there are relatively straightforward Linux
 based alternatives to DRBD and heartbeat.

 Any pointers and suggestions will be gratefully accepted.

 Thanks

 i
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Are there any Linux alternatives to drbd and heartbeat?

2010-12-08 Thread Serge Dubrouski
On Wed, Dec 8, 2010 at 12:51 PM, Igor Chudov ichu...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 1:49 PM, Serge Dubrouski serge...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 12:39 PM, Igor Chudov ichu...@gmail.com wrote:
 On Wed, Dec 8, 2010 at 1:32 PM, Serge Dubrouski serge...@gmail.com wrote:
 Taking into account simple the answer is no. You can try RedHat
 Cluster Suite on CentOS, but that's not simple.

 What's wrong with DRBD/Pacemaker/Corosync ?

 DRBD/Pacemaker was complicated, documentation did not exist or did not
 match the behavior, the GUI was broken and never really worked. Config
 files were completely opaque. I spent weeks on this without having
 something that could work and was documentable.

 Then I got the DRBD/heartbeat to work without Pacemaker, however, its
 reliability leaves much to be desired. For example, if both systems
 come up at the same time, they cannot decide who gets the resource. I
 did a hack to fix that (restarting heartbeat after boot of one of the
 systems) but that did not leave me with a warm and comfortable
 feeling.

 Then once in a while DRBD would stop syncing.

 I need to replace our old, aging corporate DRBD/Heartbeat network
 storage implementation, which always worked, but the servers are now
 getting old. I am feeling very uncomfortable about using what I have
 for replacement.

 If DRBD/Heartbeat always worked for you and you feel comfortable with
 it then why not just replace aged hardware? Though your answer is a
 bit confusing: on one hand you say that Heartbeat/DRBD always worked
 for you on other hand you say that sometimes DRBD stop syncing.

 What worked was the old Heartbeat on an old Debian install.

 What is not working so well is the new Heartbeat on Ubuntu Lucid.

 May be your company should just spend more money and buy something
 like NetApp appliance? In this case you'd get a reliable device that's
 relatively easy to manage plus commercial support for it. They have
 clustered solutions as well.

 I am open to this personally. Does this NetAppliance have
 clustering/failover capabilities?

Yes it does. See
http://www.netapp.com/us/solutions/infrastructure/data-protection/

There is also HP with the new StorageWorks P4000 servers. Usually
those products can act as NFS and CIFS servers, i.e. as a NAS.




 Igor



 i


 On Wed, Dec 8, 2010 at 12:28 PM, Igor Chudov ichu...@gmail.com wrote:
 I am interested in finding out Open Source alternative that would
 replace both DRBD as well as Heartbeat. I am definitely NOT looking
 for pacemaker either.

 I am looking for something simple, just file serving ability and
 switching NFS and samba would be all I need.

 igor

 On Wed, Dec 8, 2010 at 1:22 PM, Serge Dubrouski serge...@gmail.com 
 wrote:
 The question is too broad. Are you interested in Open Source or
 proprietary product? What do you want to achieve?

 One of the answers could be AoE (ATA over Ethernet) + OCFS2.

 On Wed, Dec 8, 2010 at 11:26 AM, Igor Chudov ichu...@gmail.com wrote:
 I would like to know if there are relatively straightforward Linux
 based alternatives to DRBD and heartbeat.

 Any pointers and suggestions will be gratefully accepted.

 Thanks

 i
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Round Robin DNS OCF?

2010-11-30 Thread Serge Dubrouski
You can try this tool http://sourceforge.net/projects/freyr/ that I
wrote some time ago. It's written in Perl.

On Tue, Nov 30, 2010 at 4:21 AM, Michael Kromer
michael.kro...@millenux.com wrote:
 Hi,

 wanted to know if anyone has an idea to manipulate DNS-entries in case
 of failover, by example having 3 nodes communicating with ips from
 different locations (which kills IPAddr(2) for takeover) and having them
 delivering ressources by same DNS-entry. Idea would be to remove an IP
 as soon as ressources on that node are not available anymore, and adding
 itself back as soon as everything is running.

 Any ideas?

 - mike
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] [PATCH]A revision to the delay of the fuser command of pgsql.

2010-11-04 Thread Serge Dubrouski
Honestly I've never liked that fuser. No doubt that it's too expensive
to run it in every status/monitor operation. But this proposed
solution makes the pgsql RA incompatible with other operating systems,
Solaris in particular. So instead I'd propose the following patch:

@@ -441,7 +441,7 @@
  if [ -f $PIDFILE ]
  then
  PID=`head -n 1 $PIDFILE`
- kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata 2>&1 | grep $PID >/dev/null 2>&1
+ runasowner "kill -s 0 $PID >/dev/null 2>&1"
  return $?
  fi

It would guarantee that the process with that PID is up and owned by the
pg_dba user. I believe that here we can assume that that process is the
PostgreSQL database. A further check with the running SQL monitor will
make sure of it.

The complete patch is attached.
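The idea behind the patched check can be sketched as follows (a
simplified illustration only: the pidfile path handling, the runasowner
wrapper, and the OCF return codes of the real RA are omitted):

```shell
# Sketch of the status check: "kill -s 0" sends no signal, it only
# verifies that the process exists and that the caller is permitted to
# signal it (i.e. owns it or is root). Names simplified from the RA.
pgsql_status() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1            # no pidfile: not running
    pid=$(head -n 1 "$pidfile")
    kill -s 0 "$pid" >/dev/null 2>&1         # alive and signalable?
}

# Demo against our own shell process, which is certainly alive.
echo $$ > /tmp/pgsql_demo.pid
pgsql_status /tmp/pgsql_demo.pid && echo "running"
```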

2010/11/4  renayama19661...@ybb.ne.jp:
 Hi All,

 We discovered a phenomenon where monitor processing fails because of the
 delay of the fuser command in pgsql.

 When output to the disk is frequent, the fuser command can fall behind.
  * When we performed the output to the mountpoint of NFS in large quantities 
 in our environment, it
 occurred.

 The fuser command searches all entries in a proc directory.
 On this account a delay occurs when we output large quantities.

 We made the patch which referred to a proc directory directly without using 
 the fuser command.

 This patch is definitely lighter than the fuser command and keeps
 working even under heavy disk output.

 Please confirm a patch.
 And please apply this patch to developer-version.

 Best Regards,
 Hideo Yamauchi.

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/





-- 
Serge Dubrouski.


pgsql_no_fuser
Description: Binary data
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] [PATCH]A revision to the delay of the fuser command of pgsql.

2010-11-04 Thread Serge Dubrouski
On Thu, Nov 4, 2010 at 2:22 PM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:
 On Thu, Nov 04, 2010 at 04:52:49PM +0100, Dejan Muhamedagic wrote:
 Hi,

 On Thu, Nov 04, 2010 at 08:54:02AM -0600, Serge Dubrouski wrote:
  Honestly I've never liked that fuser. No doubt that it's too expensive
  to run it in every status/monitor operation. But this proposed
  solution make pgsql RA incompatible with other operation systems,
  Solaris in particular. So instead I'd propose following patch:
 
  @@ -441,7 +441,7 @@
        if [ -f $PIDFILE ]
        then
            PID=`head -n 1 $PIDFILE`
   -         kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata
   2>&1 | grep $PID >/dev/null 2>&1
   +         runasowner "kill -s 0 $PID >/dev/null 2>&1"
            return $?
        fi
 
  It would guarantee that process with that PID is up and owned by
  pg_dba user. I believe that here we can assume that that process is
  PosgressSQL database. Further check with running sql monitor will make
  it sure.
 
  The complete patch is attached.

 This looks good enough to me. If nobody other objects, I'd apply
 this patch.

 Dejan, you asked it yourself:
 why do we need to kill $PID at all, anyways?
 Why not directly do the sql monitoring?

It's possible to configure the pgsql RA to monitor the database over a
Virtual IP, not over the local UNIX socket. In this case, if there were
no first status check for the process PID (actually that pgsql_status
function), the monitor function would succeed on all cluster nodes that
were able to connect to the Virtual IP. That would break the whole
cluster. So doing the status check before SQL monitoring guarantees that
PostgreSQL is running locally on the node where the RA runs.
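A toy demonstration of that ordering (the function names here are
invented stand-ins; the real RA uses pgsql_status and a psql probe):

```shell
# Why the local PID check must come first when monitoring over a
# virtual IP: the SQL probe succeeds from ANY node that can reach the
# VIP, while the local check succeeds only where postgres really runs.
sql_probe() { true; }              # stands in for: psql -h VIP -c 'select 1'
local_status() { [ -f "$1" ]; }    # stands in for the pidfile/PID check

monitor() {
    local_status "$1" || return 7  # OCF_NOT_RUNNING
    sql_probe || return 1          # OCF_ERR_GENERIC
    return 0                       # OCF_SUCCESS
}

rm -f /tmp/node_b.pid              # no postmaster on "node B"
touch /tmp/node_a.pid              # the postmaster "runs" on node A only
monitor /tmp/node_a.pid && echo "node A: running"
monitor /tmp/node_b.pid || echo "node B: not running"
```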


 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-ha-dev] [PATCH]A revision to the delay of the fuser command of pgsql.

2010-11-04 Thread Serge Dubrouski
On Thu, Nov 4, 2010 at 6:55 PM,  renayama19661...@ybb.ne.jp wrote:
 Hi All,

 Thank you for a lot of comment.

           if [ -f $PIDFILE ]
           then
               PID=`head -n 1 $PIDFILE`
    -          kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata
    2>&1 | grep $PID >/dev/null 2>&1
    +          runasowner "kill -s 0 $PID >/dev/null 2>&1"
               return $?
           fi

 The change to very beginning fuser was performed with the next patch.

  * http://www.gossamer-threads.com/lists/linuxha/dev/67871

I remember perfectly well when and why I put it there :-) I just
didn't have time to rethink it.

 Because runasowner uses ocf_run in the latest pgsql, I think that the RA
 can support several instances of postgres.

To support several instances of PostgreSQL on the same host you have
to somehow split them in the configuration. You have several options
there:

1. Use a different pg_dba user. Probably not the best option.
2. Use a different unix_socket_directory variable in the postgresql.conf
file for different instances.
3. Use a different pgdb for different instances.
4. Use a different pg_host for different instances.
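For example, option 2 could look roughly like this in a crm
configuration (the resource names, paths, and the socketdir parameter
name are illustrative; check the RA metadata for the exact parameters):

```
primitive pg_a ocf:heartbeat:pgsql \
    params pgdata="/var/lib/pgsql/a" pgport="5432" \
           socketdir="/var/run/postgresql-a"
primitive pg_b ocf:heartbeat:pgsql \
    params pgdata="/var/lib/pgsql/b" pgport="5433" \
           socketdir="/var/run/postgresql-b"
```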


 And the patch of Dejan is better than the patch which I suggested.

 I think that a patch of Dejan works definitely.

It's mine :-)


 Best Regards,
 Hideo Yamauchi.

 --- Serge Dubrouski serge...@gmail.com wrote:

 On Thu, Nov 4, 2010 at 2:22 PM, Lars Ellenberg
 lars.ellenb...@linbit.com wrote:
  On Thu, Nov 04, 2010 at 04:52:49PM +0100, Dejan Muhamedagic wrote:
  Hi,
 
  On Thu, Nov 04, 2010 at 08:54:02AM -0600, Serge Dubrouski wrote:
   Honestly I've never liked that fuser. No doubt that it's too expensive
   to run it in every status/monitor operation. But this proposed
   solution make pgsql RA incompatible with other operation systems,
   Solaris in particular. So instead I'd propose following patch:
  
   @@ -441,7 +441,7 @@
           if [ -f $PIDFILE ]
           then
               PID=`head -n 1 $PIDFILE`
    -          kill -s 0 $PID >/dev/null 2>&1 && fuser $OCF_RESKEY_pgdata
    2>&1 | grep $PID >/dev/null 2>&1
    +          runasowner "kill -s 0 $PID >/dev/null 2>&1"
               return $?
           fi
  
   It would guarantee that process with that PID is up and owned by
   pg_dba user. I believe that here we can assume that that process is
   PosgressSQL database. Further check with running sql monitor will make
   it sure.
  
   The complete patch is attached.
 
  This looks good enough to me. If nobody other objects, I'd apply
  this patch.
 
  Dejan, you asked it yourself:
  why do we need to kill $PID at all, anyways?
  Why not directly do the sql monitoring?

 It's possible to configure pgsql RA to monitor database over Virtual
 IP, not over local UNIX socket. In this case if there were no first
 status check for process PID (actually that pgsql_status function)
 monitor function would succeed on all cluster nodes that were able to
 connect to Virtual IP. That would break a whole cluster. So doing
 status check before SQL monitoring guarantees that PostgreSQL is
 running locally on a node where ra runs.

 
  --
  : Lars Ellenberg
  : LINBIT | Your Way to High Availability
  : DRBD/HA support and consulting http://www.linbit.com
 
  DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
  ___
  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
  Home Page: http://linux-ha.org/
 



 --
 Serge Dubrouski.
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/


 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/




-- 
Serge Dubrouski.


Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-01 Thread Serge Dubrouski
service httpd start would give you running apache even quicker.

On Nov 1, 2010 2:59 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:

Dimitri Maziuk wrote:

 LRM operation WebSite_start_0 (call=9, rc=1, cib-update=34,
 confirmed=tr...
OTOH, changing to crm no and
 cat nodename ip/mask httpd  haresources
gives me running apache.

Tell me about advantages of heartbeat v2 again.


Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


Re: [Linux-HA] LRM operation WebSite_start_0 unknown error

2010-11-01 Thread Serge Dubrouski
On Mon, Nov 1, 2010 at 3:37 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:
 service httpd start would give you running apache even quicker.

 (Yes, but I'd have to shut down the production website for kernel
 upgrades, disk upgrades, and occasional hardware failures.)

 More to the point, if service httpd start works, so should
 crm configure primitive website lsb:httpd. And there is no such thing
 as unknown error -- there's messed up config file, socket already in
 use, apache binary is wrong elfclass or not executable at all, and
 that's about it.

You keep comparing apples and oranges. I don't know what's wrong with
the Apache lsb script; I'd never use it for a production configuration,
nor Heartbeat v1. It was already said several times: yes, Heartbeat v1
checks that service httpd started, and no, Heartbeat v1 does not
guarantee that it keeps running. You should not compare Heartbeat v1
to Pacemaker (Heartbeat v2 is the wrong name for it). You should
compare Pacemaker to RedHat Cluster Suite, Veritas VCS, Oracle
Clusterware (but not Oracle RAC) or HP Service Guard. And you'd be
pretty surprised how difficult those products are to learn and understand.


 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] search heartbeat+drbd postgresql tutorial

2010-10-28 Thread Serge Dubrouski
I've created a very basic HowTo for this task:
http://www.clusterlabs.org/wiki/DRBD_PgSQL_HowTo

On Thu, Oct 28, 2010 at 9:48 AM, Serge Dubrouski serge...@gmail.com wrote:
 There is no formal document for that. Just follow Clusters from
 Scratch to build a master/slave DRBD cluster and then add the pgsql
 service on top of that. Make sure that you use Pacemaker, not Heartbeat
 v1. Use the OCF Resource Agent for PostgreSQL.

 On Thu, Oct 28, 2010 at 3:19 AM, ramarovelo art...@gulfsat.mg wrote:
 Hi list!
 can someone tell me where i can find how-tos for heartbeat + drbd with
 postgresql?

 Thanks
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




 --
 Serge Dubrouski.




-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-27 Thread Serge Dubrouski
On Wed, Oct 27, 2010 at 4:41 AM, Lars Ellenberg
lars.ellenb...@linbit.com wrote:
 On Tue, Oct 19, 2010 at 11:04:10AM -0600, Serge Dubrouski wrote:
 I see you could do it and now you are going to use Pacemaker all the
 time in the future. Than I see no reason why other can't do it as
 well taking into account that Heartbeat v1 almost not supported and
 definitely has no future unless somebody will decide to fork it out of
 the current project and start a new one independent of Pacemaker.

 Why would you want to fork anything?

It was the idea of David Lee, the guy who used to support Solaris
version of Heartbeat. The primary reason was to keep Heartbeat v1
small and portable. I think that he gave up on trying to keep up with
porting the latest releases of Heartbeat/Pacemaker to Solaris a long
time ago.


 If you don't want Pacemaker, don't use it.

 Just get Heartbeat 3, and use it in haresources mode.

 It's still there. It won't go away anytime soon.
 It's supposed to behave just like it always behaved.

 Of course there are cases where haresources mode
 (plus mon, nagios, whatever) is sufficient.

 But no, just the single line in haresources is not the end of cluster
 configuration in v1 mode. You then have to get monitoring in place,
 and trigger scripts on monitoring events and so on.

 So in the end, maybe you'd be better off using Pacemaker right away, anyways.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Serge Dubrouski.


Re: [Linux-HA] Redundant Rings Still Not There?

2010-10-23 Thread Serge Dubrouski
On Fri, Oct 22, 2010 at 5:27 PM, Robinson, Eric eric.robin...@psmnv.com wrote:
 I am building a 3-node cluster and adding it to a network that
 already
 had a separate 2-node cluster.

 The new 3-node cluster would have 2 nodes actively serving up
 resources with 1 node acting as a failover for both of the active
 nodes.

 In this case all 3 nodes have to be connected to all rings
 if you want to use those rings for Pacemaker communications.
 And those rings have to be separate from existing cluster rings.

 So in my case, each of the back-to-back connections (crossover cables)
 would be regarded as a separate ring?

Looks like you are mixing up physical connections and Corosync rings.
Ring is an abstraction in Corosync. Crossover cable is a physical
connection. DRBD doesn't need a Corosync ring for data replication,
you can use your crossovers for that. Pacemaker/Corosync membership
protocol though requires that all nodes were connected to the same
network and form one or more abstract rings to communicate to each
other. That means that using crossovers you can't build a cluster with
more than 2 nodes.

With that said if you have 3 servers connected to one network and also
forming 2 cross-connected pairs you can build a 3 node cluster with 1
Corosync ring for Pacemaker communication and also use your crossovers
for DRBD data replication. But those crossovers won't be considered as
a Corosync rings. If you want to have real redundancy rings you have
to replace your crossovers with a switch and connect all 3 nodes to
it. To have more rings you have to add more switches.
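
To make the distinction concrete: a ring exists only because corosync.conf
declares it, not because a cable is plugged in. A minimal single-ring
sketch (the subnet and multicast values below are made up):

```
totem {
    version: 2
    # One logical ring; every cluster node must have an interface
    # on this subnet for membership to form.
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
```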


 --
 Eric Robinson



 Disclaimer - October 22, 2010
 This email and any files transmitted with it are confidential and intended 
 solely for General Linux-HA mailing list. If you are not the named addressee 
 you should not disseminate, distribute, copy or alter this email. Any views 
 or opinions presented in this email are solely those of the author and might 
 not represent those of Physicians' Managed Care or Physician Select 
 Management. Warning: Although Physicians' Managed Care or Physician Select 
 Management has taken reasonable precautions to ensure no viruses are present 
 in this email, the company cannot accept responsibility for any loss or 
 damage arising from the use of this email or attachments.
 This disclaimer was added by Policy Patrol: http://www.policypatrol.com/




-- 
Serge Dubrouski.


Re: [Linux-HA] Redundant Rings Still Not There?

2010-10-23 Thread Serge Dubrouski
On Sat, Oct 23, 2010 at 12:23 PM, Robinson, Eric
eric.robin...@psmnv.com wrote:
 Looks like you are mixing up physical connections
 and Corosync rings.

 I should not have mentioned DRBD at all as it confuses the question.

 Let me try it this way:

 How do I build a three-node Corosync cluster with redundant heartbeat
 paths? I don't trust the switched network or the Ethernet bonding
 drivers to be 100% reliable, and it is just good practice to have
 multiple heartbeat paths. On my old 2-node clusters, I have three
 heartbeat paths: the switched network, back-to-back links, and serial
 cables.

You have to connect all cluster nodes to all redundant paths. That
means that your only choice is several switched networks/VLANs.


 It sounds like you are saying that to have multiple heartbeat paths on a
 3-node Corosync cluster, each heartbeat path must be through a separate
 switched network or VLAN. I can see why this would be the case.

Exactly.
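
In corosync.conf terms, each additional heartbeat path is another
interface block on its own network, plus a redundant-ring-protocol mode.
A hedged sketch with invented subnets:

```
totem {
    version: 2
    rrp_mode: passive            # fail over to ring 1 if ring 0 dies
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.1.0    # first switched network/VLAN
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 10.0.2.0    # second switched network/VLAN
        mcastaddr: 226.94.2.1
        mcastport: 5407
    }
}
```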


 I was hoping that a crossover cable could be used to form a logical
 ring between two nodes, and that I could configure two logical rings
 between 3 servers.

Nope. All nodes should be able to communicate with each other through
all configured rings/paths.


 So really, maybe I'm not trying to build a 3-node cluster. What I'm
 really trying to build are two 2-node clusters where one of the physical
 servers participates in BOTH 2-node clusters. CLUSTER1 would consist of
 physical servers A and C. CLUSTER2 would consist of physical servers B
 and C.

That's probably possible, but I'm not sure. In this case you can use
crossovers between A and C and between B and C. BTW, your server C will
have to have more resources (memory and CPU) than A and B, because
theoretically you can end up in a situation where node C has to run
applications from node A and node B at the same time.


 So maybe what I want to know is, is it possible to run multiple
 instances of Corosync on server C, such that it participates in two
 separate clusters?

I personally have never done so. It should be possible, but really
complicated to configure; I really don't know how to run two instances
of Pacemaker on one node.


 Thanks for your patience. I had no idea this would end up being so
 complicated. 3-node cluster is much easier to say than to configure,
 apparently. :-)

Not at all. You just need to abandon crossovers and SLIPs and start
using switched networks.


 --
 Eric Robinson







-- 
Serge Dubrouski.


Re: [Linux-HA] Redundant Rings Still Not There?

2010-10-22 Thread Serge Dubrouski
On Thu, Oct 21, 2010 at 9:22 AM, Robinson, Eric eric.robin...@psmnv.com wrote:
 The way resources move around has nothing to do with how you
 setup corosync rings. For each ring, all nodes must be accessible
 over the interface specified in the interface section.
 How else can one form a ring? ;-)

 I think the confusion is entering the picture because (1) I'm using
 back-to-back Ethernet connections for DRBD replication. Those are
 point-to-point links from NODE1 to NODE3 and NODE2 to NODE3, but there
 is no link necessary between NODE1 and NODE2 because there is no DRBD
 replication between them. But (2) I am also using those links for
 corosync communication because they are more reliable than using the
 bonded interfaces through the switched network (although I am using
 those too).

 So I guess what I'm trying to accomplish is to have three separate
 corosync rings:

 -- Ring 1 through the switched network that includes all three nodes,
 where pacemaker is configured with resource constraints to keep R1 and
 R2 on their assigned node pairs.

 -- Ring 2 that includes NODE1 and NODE3 (logically a two-node ring,
 though technically just back-to-back)

 -- Ring 3 that includes NODE2 and NODE3 (logically a two-node ring,
 though technically just back-to-back)

 Does that make sense?

I'm probably missing something, but as far as I know DRBD doesn't use
Corosync/OpenAIS. So Ring 2 and Ring 3 in your terminology are just
network links for DRBD data replication, right?

 --
 Eric Robinson






-- 
Serge Dubrouski.


Re: [Linux-HA] Redundant Rings Still Not There?

2010-10-22 Thread Serge Dubrouski
On Fri, Oct 22, 2010 at 1:02 PM, Robinson, Eric eric.robin...@psmnv.com wrote:
 I'm probably missing something, but as far as I know DRBD
 doesn't use Corosync/OpenAIS. So Ring 2 and Ring 3 in your
 terminology are just network links for DRBD data replication,
 right?

 I would be using those links for both DRBD replication and (unrelatedly)
 Corosync communication.

Then I'm missing the structure of your cluster.
Are you building one cluster that includes 3 nodes? If yes, then you
need those nodes to be connected to all 3 rings.
Are you building 3 clusters: one with 3 nodes and 2 more with 2 nodes
each? If yes, then why does it have to be that complicated?


 --
 Eric Robinson







-- 
Serge Dubrouski.


Re: [Linux-HA] Redundant Rings Still Not There?

2010-10-22 Thread Serge Dubrouski
On Fri, Oct 22, 2010 at 4:27 PM, Robinson, Eric eric.robin...@psmnv.com wrote:
 Then I'm missing the structure of your cluster Are you
 building one cluster that Includes 3 nodes? If yes then
 you need those nodes to be connected to all 3 rings.
 Are you building 3 clusters: one with 3 nodes and 2 more
 with 2 nodes each? If yes than why it has to be that
 complicated?

 I am building a 3-node cluster and adding it to a network that already
 had a separate 2-node cluster.

 The new 3-node cluster would have 2 nodes actively serving up resources
 with 1 node acting as a failover for both of the active nodes.

In this case all 3 nodes have to be connected to all rings if you want
to use those rings for Pacemaker communication. And those rings have
to be separate from the existing cluster's rings.
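
Keeping the new cluster separate from the existing one on a shared
network is typically done by giving each cluster its own multicast
address/port pair in corosync.conf (the values below are invented):

```
# new 3-node cluster
interface {
    ringnumber: 0
    bindnetaddr: 10.0.1.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
}
# an existing cluster on the same subnet would use a different pair,
# e.g. mcastaddr 226.94.1.2 / mcastport 5407
```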


 --
 Eric Robinson






-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-20 Thread Serge Dubrouski
On Wed, Oct 20, 2010 at 11:09 AM, Greg Woods wo...@ucar.edu wrote:
 On Wed, 2010-10-20 at 08:13 +0200, Andrew Beekhof wrote:

  Um, maybe because heartbeat v1 has a much much much much less steep
  learning curve?

 I dispute that:

    
 http://theclusterguy.clusterlabs.org/post/178680309/configuring-heartbeat-v1-was-so-simple


 This addresses the fact that Pacemaker has many features that heartbeat
 v1 lacks. That is not in dispute, but it completely sidesteps the point
 that heartbeat v1 is sufficient for many uses and much easier to get
 working. I have not said that heartbeat v1 is better than pacemaker,
 only that it is easier to get working. The question was asked why would
 anyone want to use heartbeat v1. Here is one valid answer to that
 question. This point has been made on this list before by myself and
 others, and yet the question why would anyone want to use heartbeat v1
 continues to be asked. I understand that nobody has any interest in
 developing heartbeat v1 any more. I accept this, I have moved on to v3
 and Pacemaker. But that does not invalidate the answer to the original
 question.

First of all, I didn't want to start this flame war, so I never asked
"why would anyone want to use heartbeat v1?" I'm not interested in an
answer to that question. I asked why a particular person decided to use
v1 in a particular case. I do not consider a cluster that has to
provide a shared IP address, a shared DRBD device, a shared file
system and a highly available instance of PostgreSQL a "simple"
case. Even more, I think that a cluster like that should never be
built on v1, because v1 lacks the resource-monitoring feature and so
can't provide a highly available database instance at all.
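
The resource monitoring that v1 lacks is a single operation line in a
Pacemaker configuration. A hedged crm-shell sketch (the resource name,
path and intervals are illustrative, not taken from the thread):

```
primitive pgsql ocf:heartbeat:pgsql \
        params pgdata="/var/lib/pgsql/data" \
        op monitor interval="30s" timeout="30s"
```

With this, Pacemaker periodically calls the agent's monitor action and
can restart or fail over the resource when the database stops answering.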

So please let's stop this general discussion, because it won't get us
anywhere, and let's concentrate on the original request, if the
originator is still interested in that.


 --Greg






-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Serge Dubrouski
Any particular reason for using Heartbeat v1 instead of CRM/Pacemaker?

On Mon, Oct 18, 2010 at 10:11 PM, Linux Cook linuxc...@gmail.com wrote:
 hi!

 I used the tarball package of postgresql and recompiled it. Postgres now
 resides at /usr/local/pgsql and mounting /usr/local/pgsql/data into
 /dev/drbd0.

 However, heartbeat recognizes my Filesystem and IPaddr2 resources but not my
 postgresql service. My haresources config below:

 dmcstest1 drbddisk::postgres Filesystem::/dev/drbd0::/usr/local/pgsql/data::ext3 IPaddr2::10.110.10.250/255.255.255.0/eth2
 dmcstest1 postgresql
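
For comparison, a haresources group is conventionally written as one
(backslash-continued) line per node, so heartbeat starts the resources
in order. A hedged sketch using the poster's names, assuming the stock
postgresql init script exists and is disabled at boot:

```
dmcstest1 drbddisk::postgres \
    Filesystem::/dev/drbd0::/usr/local/pgsql/data::ext3 \
    IPaddr2::10.110.10.250/255.255.255.0/eth2 \
    postgresql
```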

 I've seen an error like:

 "resource postgresql is active and should not be" in the logs, but postgresql
 is disabled on bootup. What does this mean?

 Please help!

 Thank you.

 Oliver

 On Sat, Oct 16, 2010 at 2:41 AM, Linux Cook linuxc...@gmail.com wrote:

 thanks michael and vadym,

 Will try your suggestions and will let you know.

 On Fri, Oct 15, 2010 at 4:46 AM, Vadym Chepkov vchep...@gmail.com wrote:


 On Oct 15, 2010, at 3:28 AM, Linux Cook wrote:

  hi!
 
  I've just setup heartbeat + drbd with postgresql. I'm mirroring
 /dev/drbd0
  to /var/lib/postgresql. The problem is, postgresql service can't start
  because everytime heartbeat mounts the /dev/drbd0 to
 /var/lib/postgresql, it
  changes it user and group ownership instead of just postgres user and
 group.
 

 I wouldn't recommend using /var/lib/postgres as a mount point.
 on rpm based systems you would probably break package integrity and
 I don't think it's safe to have a non-root user to be owner of the mount
 point
 Just create a new directory, '/pgsql', for example, and use it, instead.

 primitive pgsql ocf:heartbeat:pgsql \
        params pgdba="postgres" pgdata="/pgsql/data" logfile="/pgsql/log/pgsql.log" \
        op monitor start-delay="60s" interval="5min"

 Vadym







-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Serge Dubrouski
On Tue, Oct 19, 2010 at 10:44 AM, Greg Woods wo...@ucar.edu wrote:
 On Tue, 2010-10-19 at 10:01 -0600, Serge Dubrouski wrote:
 Any particular reason for using Heartbeat v1 instead of CRM/Pacemaker?

 Um, maybe because heartbeat v1 has a much much much much less steep
 learning curve? If you have a simple two-node cluster where one node is
 just a hot spare, it is way way way way easier to get it working with
 heartbeat v1.

 The first time I ever set up a high availability cluster, going in
 knowing nothing at all about it, I had a heartbeat v1 cluster working in
 a couple of days. Already having had considerable heartbeat v1
 experience, it took me a couple of months to get a cluster working under
 heartbeat v3/Pacemaker. The pace of development is also high enough that
 the documentation often lags behind reality. That is not a criticism, I
 know how hard it is to keep the documentation up to date (I am already
 in that mode now with these new clusters; nobody else knows how they
 work so I can't even take a vacation now that I have some production
 services running on them, until I finish writing up some administration
 procedures).

 Yes, no doubt a Pacemaker cluster is far more flexible, but when one
 doesn't need all that flexibility and just wants a simple two-node HA
 cluster, the simplicity of heartbeat v1 is very attractive.

 This shouldn't be a big a mystery as it seems to be. Face up to it:
 learning and properly configuring Pacemaker is HARD, even for
 experienced sysadmins. And unless you need the additional flexibility
 that Pacemaker offers, it seems like a lot of extra effort.

 Will I use Pacemaker all the time in the future? Yes, because I have
 already put in the effort to learn and configure it. Setting up a new
 cluster, where I had an existing one to use as a template, took less
 than a week. But that first time, it was difficult, time consuming, and
 often frustrating.

 --Greg

I see you could do it, and now you are going to use Pacemaker all the
time in the future. Then I see no reason why others can't do it as
well, taking into account that Heartbeat v1 is almost unsupported and
definitely has no future, unless somebody decides to fork it out of
the current project and start a new one independent of Pacemaker.








-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Serge Dubrouski
On Tue, Oct 19, 2010 at 12:41 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:

 I see you could do it and now you are going to use Pacemaker all the
 time in the future. Than I see no reason why other can't do it as
 well taking into account that Heartbeat v1 almost not supported and
 definitely has no future unless somebody will decide to fork it out of
 the current project and start a new one independent of Pacemaker.

 I have to say, if it takes less than a week to set up a cluster when
 you already know how to do it... I suspect it'll take me less than a
 week to have a working clone of heartbeat v1 -- and that's without
 doing any programming. I mean, how hard can it be to set up mon to
 ping the other node and fire up a few scripts when it stops responding.

 Let's face it: a 2-node active/passive cluster on xover (or serial)
 cable can only really guard against hardware failure. In which case 99%
 of the time you don't need to care about split brain and everything that
 comes with that.

Ok. Please let's stop this useless holy war and try to help solve
the original problem: why PostgreSQL doesn't want to start under
Heartbeat v1. I personally have no idea, since I've never used
Heartbeat v1 and am not going to.


 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Serge Dubrouski
On Tue, Oct 19, 2010 at 1:49 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:

 Ok. Please let's stop this useless holy war and try to help solve
 the original problem: why PostgreSQL doesn't want to start under
 Heartbeat v1. I personally have no idea, since I've never used
 Heartbeat v1 and am not going to.

 OP's problem has nothing to with heartbeat versions, you're the one who
 brought that up.

Yes, because I wanted to know why he uses it.


 The easiest fix was to create /drbdfs/pgsql with proper ownership and
 symlink /var/lib/pgsql to it. Now that he's recompiled everything, who
 knows.

Or manually mount /var/lib/pgsql/data and fix the ownership; it'll
stay that way afterwards. BTW, it looks like he now has a different
problem that is somehow related to the Heartbeat version.


 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




-- 
Serge Dubrouski.


Re: [Linux-HA] heartbeat with postgresql

2010-10-19 Thread Serge Dubrouski
On Tue, Oct 19, 2010 at 2:00 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 Serge Dubrouski wrote:
 On Tue, Oct 19, 2010 at 1:49 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu 
 wrote:

 The easiest fix was to create /drbdfs/pgsql with proper ownership and
 symlink /var/lib/pgsql to it. Now that he's recompiled everything, who
 knows.

 Or manually mount /var/lib/pgsql/data and fix ownership after that,
 it'll stay then. BTW, looks like now he has a different problem that
 is related to version of Heartbeat somehow.


 Or him recompiling postgres: whatever the ocf monitoring script is
 looking for may now be in a different place. Like I said, who knows...

He's not using OCF. And that was the reason for my first question.


 Dima
 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




-- 
Serge Dubrouski.


  1   2   3   4   >