Re: [Linux-ha-dev] Dovecot OCF Resource Agent

2011-08-03 Thread Dejan Muhamedagic
Hi Jeroen,

On Fri, Jul 22, 2011 at 10:51:56AM +0200, jer...@intuxicated.org wrote:
 
 On Fri, 15 Apr 2011 14:45:59 +0200, Raoul Bhatia [IPAX]
 r.bha...@ipax.at
 wrote:
  On 04/15/2011 01:19 PM, Andrew Beekhof wrote:
  On Fri, Apr 15, 2011 at 12:53 PM, Raoul Bhatia [IPAX] r.bha...@ipax.at
  wrote:
  On 04/15/2011 11:10 AM, jer...@intuxicated.org wrote:
 
  Yes, it does the same thing but contains some additional features,
 like
  logging into a mailbox.
 
  First of all, I do not know how the others think about an OCF RA
  implemented in C. I'd suggest waiting for comments from Dejan or
  fghass.
  
  The IPv6addr agent was written in C too.
  The OCF standard does not dictate the language to be used - it's really
  a matter of whether C is the best tool for this job.
  
  thank you andrew!
  
  jeroen, can you please create a github fork off
  https://github.com/ClusterLabs/ (it's really easy!)
  
  and add your resource agent in the same fashion as IPv6addr.c [1] ?
  
  thanks,
  raoul
  
  [1]
 
 https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPv6addr.c
 
 Hi,
 
 I finally found some time to get the code on GitHub.
 
 https://github.com/perrit/dovecot-ocf-resource-agent
 
 As you can see, it's kind of hard to merge the code in the same way as
 IPv6addr.c, as it currently spans multiple files. Would you like me to just
 put it in a directory? Maybe it's a good idea to split the dovecot part from
 the mailbox login part, so that there is a separate mailbox login resource
 agent, a bit like the ping resource agent?

I really hate to say it, since you obviously invested quite a
bit of time to put together this agent, but C is arguably not
the best suited programming language for resource agents. I
guess that's why all init scripts are, well, shell scripts, and
so are all but one of our OCF resource agents. The code is around
4 kLOC, which is as big as some of our subsystems. That's a lot
of code to read and maintain.
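
Just to illustrate the size argument, here is what the usual shape of a
shell based agent looks like. This is only a rough, illustrative skeleton
(not the Dovecot agent), assuming the .ocf-shellfuncs helper shipped with
the resource-agents package:

#!/bin/sh
# Minimal OCF-style agent skeleton (illustration only)
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/resource.d/heartbeat}
. ${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs

foo_start()   { return $OCF_SUCCESS; }      # start the service here
foo_stop()    { return $OCF_SUCCESS; }      # stop it here
foo_monitor() { return $OCF_NOT_RUNNING; }  # probe the service here

case "$1" in
  start)     foo_start;;
  stop)      foo_stop;;
  monitor)   foo_monitor;;
  meta-data) echo "<resource-agent/>"; exit $OCF_SUCCESS;;  # real agents print full XML here
  *)         exit $OCF_ERR_UNIMPLEMENTED;;
esac
exit $?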

Was there a good reason to choose C for the implementation?

Cheers,

Dejan

 Regards,
 Jeroen
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] location and orders : Question about a behavior ...

2011-08-03 Thread Dan Frincu
Hi,

On Tue, Aug 2, 2011 at 6:06 PM,  alain.mou...@bull.net wrote:
 Hi

 I have this simple configuration of locations and orders between resources
 group-1, group-2 and clone-1
 (on a two-node HA cluster with Pacemaker-1.1.2-7 / corosync-1.2.3-21):

 location loc1-group-1   group-1 +100: node2
 location loc1-group-2   group-2 +100: node3

 order order-group-1   inf: group-1   clone-1
 order order-group-2   inf: group-2   clone-1

 property $id=cib-bootstrap-options \
        dc-version=1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe \
        cluster-infrastructure=openais \
        expected-quorum-votes=2 \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        default-resource-stickiness=5000 \

I use it as:
rsc_defaults $id=rsc-options \
resource-stickiness=1000
Instead of:
property $id=cib-bootstrap-options \
default-resource-stickiness=5000
And the behavior is the expected one, no failback.
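
On a live cluster this can be set through the crm shell, roughly like the
following (a sketch only; 1000 is just the value I picked, and the old
default-resource-stickiness property would still need to be removed,
e.g. via crm configure edit):

crm configure rsc_defaults resource-stickiness=1000
# check that the new default is in place
crm configure show | grep resource-stickiness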

HTH,
Dan


 (and no current cli- preferences)

 When I stop node2, group-1 is correctly migrated to node3.
 But when node2 is up again and I start Pacemaker on node2 again,
 group-1 automatically comes back to node2, and I wonder why?

 I have another similar configuration with the same location constraints and
 the same default-resource-stickiness value, but without an order constraint
 with a clone resource, and the group does not come back automatically. But I
 don't understand why this order constraint would change the behavior ...

 Thanks for your help
 Alain Moullé

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Dan Frincu
CCNA, RHCE
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] location and orders : Question about a behavior ...

2011-08-03 Thread alain . moulle
Hi, thanks

I don't think the 1000 or 5000 value makes any difference,
so would using rsc_defaults make it work?
But do you also have the order constraint with a clone?
Because in another of my configurations I also have
property $id=cib-bootstrap-options \
default-resource-stickiness=5000
and the resource does not fail back automatically ... so ...
Could somebody explain?
Thanks
Alain



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] location and orders : Question about a behavior ...

2011-08-03 Thread Dan Frincu
Hi,

On Wed, Aug 3, 2011 at 2:22 PM,  alain.mou...@bull.net wrote:
 Hi, thanks

 I don't think the 1000 or 5000 value makes any difference,

The values make little difference, it's about having a higher score atm.

 so would using rsc_defaults make it work?

Yes, I believe so.

 But do you also have the order constraint with a clone?

No.

 Because in another of my configurations I also have
 property $id=cib-bootstrap-options \
        default-resource-stickiness=5000
 and the resource does not fail back automatically ... so ...
 Could somebody explain?

Try the following:
crm_verify -L 2>&1 | grep stick

And see what scores (weights) are given to resources. Based on these
weights it might make more sense.

HTH,
Dan

 Thanks
 Alain







-- 
Dan Frincu
CCNA, RHCE
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] location and orders : Question about a behavior ...

2011-08-03 Thread alain . moulle
Hi,

Knowing that res1-[1-3] are in group-1 and res2-[1-3] are in group-2,
crm_verify -L 2>&1 | grep stick displays:

debug: unpack_config: Default stickiness: 5000
debug: common_apply_stickiness: Resource clone-1:0: preferring current 
location (node=node2, weight=1)
debug: common_apply_stickiness: Resource res1-1: preferring current 
location (node=node2, weight=5000)
debug: common_apply_stickiness: Resource res1-2: preferring current 
location (node=node2, weight=5000)
debug: common_apply_stickiness: Resource res1-3: preferring current 
location (node=node2, weight=5000)

debug: common_apply_stickiness: Resource clone:1: preferring current 
location (node=node3, weight=1)
debug: common_apply_stickiness: Resource res2-1: preferring current 
location (node=node3, weight=5000)
debug: common_apply_stickiness: Resource res2-2: preferring current 
location (node=node3, weight=5000)
debug: common_apply_stickiness: Resource res2-3: preferring current 
location (node=node3, weight=5000)

but I don't know what conclusions to draw from this information ...

Alain 



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] The active trap of the SNMP is delayed.

2011-08-03 Thread Gao,Yan
Hi Hideo,

On 08/02/11 09:14, renayama19661...@ybb.ne.jp wrote:
 Hi Yan,
 
 I confirmed that the trap is definitely transmitted with the patch applied.
OK, thanks!

 
 We ask that the patch be applied to pacemaker-mgmt for both pacemaker-1.0
 and pacemaker-1.1.
Pushed. Since we don't have a separate branch, you might need to
back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
pacemaker-1.0.x

  
  * After this correction, we hope that a new version of pacemaker-mgmt
 for Pacemaker 1.0 will be released. (pacemaker-mgmt-2.1.0?)

We'll probably tag a new version in the near future.

Regards,
  Gao,Yan
-- 
Gao,Yan y...@suse.com
Software Engineer
China Server Team, SUSE.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] location and orders : Question about a behavior ...

2011-08-03 Thread Dan Frincu
Hi,

On Wed, Aug 3, 2011 at 3:00 PM,  alain.mou...@bull.net wrote:
 Hi,

 Knowing that res1-[1-3] are in group-1 and res2-[1-3] are in group-2,
 crm_verify -L 2>&1 | grep stick displays:

 debug: unpack_config: Default stickiness: 5000
 debug: common_apply_stickiness: Resource clone-1:0: preferring current
 location (node=node2, weight=1)
 debug: common_apply_stickiness: Resource res1-1: preferring current
 location (node=node2, weight=5000)
 debug: common_apply_stickiness: Resource res1-2: preferring current
 location (node=node2, weight=5000)
 debug: common_apply_stickiness: Resource res1-3: preferring current
 location (node=node2, weight=5000)

 debug: common_apply_stickiness: Resource clone:1: preferring current
 location (node=node3, weight=1)
 debug: common_apply_stickiness: Resource res2-1: preferring current
 location (node=node3, weight=5000)
 debug: common_apply_stickiness: Resource res2-2: preferring current
 location (node=node3, weight=5000)
 debug: common_apply_stickiness: Resource res2-3: preferring current
 location (node=node3, weight=5000)

 but I don't know what conclusions to draw from this information ...

Well, this isn't the only way to obtain information on score
allocation; there is also ptest -saL, and adding -V's to either
command increases the verbosity of the output. Anyway, you may
have a case where the score for a group on a node is higher than the
default stickiness value, and therefore the failback occurs.

Use this script to get a better idea of what scores are assigned to
resources and then see what's causing this behavior.

http://hg.clusterlabs.org/pacemaker/1.1/raw-file/01e86afaaa6d/extra/showscores.sh
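
If fetching the script is not an option, something along these lines gives
a rough view as well (assuming the ptest tool from the pacemaker package;
the grep pattern is only an example for your resource names):

# dump the allocation scores from the live CIB and keep the interesting resources
ptest -L -s 2>&1 | grep -E 'group-1|group-2|clone-1'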

Regards,
Dan


 Alain




[Linux-HA] About cli-preference

2011-08-03 Thread alain . moulle
Hi

When we do a crm resource migrate resource-name, crm adds a cli-preference
constraint to the configuration for this resource.
I wonder if there is a way to tell Pacemaker that, once the resource is
running on the new target node, it could automatically remove the
cli-preference, or remove it, for example, after a given lifetime?
(Knowing that we have a configuration with resource-stickiness etc. which
avoids the automatic failback of the resource when the cli-preference is
removed.)

Thanks
Alain Moullé
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] About cli-preference

2011-08-03 Thread Dan Frincu
Hi,

On Wed, Aug 3, 2011 at 5:23 PM,  alain.mou...@bull.net wrote:
 Hi

 When we do a crm resource migrate resource-name, crm add a cli-preference
 in the configuration for
 this resource.
 I wonder if there is a way to tell Pacemaker that, once the resource is
 running on the new target node, it could automatically remove the
 cli-preference ?
 or for example after a given life-time for this cli-preference ?

# crm resource help migrate

Migrate a resource to a different node. If node is left out, the
resource is migrated by creating a constraint which prevents it from
running on the current node. Additionally, you may specify a
lifetime for the constraint---once it expires, the location
constraint will no longer be active.

Usage:
...
migrate rsc [node] [lifetime]
...

You can specify a lifetime.
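
For example (hypothetical resource and node names; the lifetime is given
as an ISO 8601 duration, PT10M being ten minutes):

# move the resource and let the cli- constraint expire after ten minutes
crm resource migrate my-resource node2 PT10M

# or clear the constraint yourself once the maintenance is done
crm resource unmigrate my-resource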

HTH,
Dan

 (knowing that we have a configuration with resource-stickiness etc. which
 avoids
 the automatic failback of the resource when the cli-preference is removed)

 Thanks
 Alain Moullé
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




-- 
Dan Frincu
CCNA, RHCE
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-03 Thread Rahul Kanna
Hi,

Our system setup:

Heartbeat 3.0.3
DRBD (to manage the file system; it is one of the resources managed by the CRM)
Redhat Linux
Pacemaker

We have built an application on top of Linux-HA that lets users configure
the cluster by giving the IP addresses of the nodes and perform operations
like Restart System, Change host names, Resolve split-brain scenario, etc.
In our application, we ran into a problem when we do a heartbeat restart for
some operation and the user then does Restart System, which internally
runs the command shutdown -r now. I believe this is due to the heartbeat LSB
script, and I have explained the scenario below.

Problem:

In the heartbeat LSB script, restart neither removes nor touches the
heartbeat lock file.

On heartbeat start, the LSB script starts heartbeat and touches the
/var/lock/subsys/heartbeat lock file.

On heartbeat stop, the LSB script stops heartbeat and removes the lock
file at /var/lock/subsys/heartbeat.

On heartbeat restart, the LSB script stops heartbeat and starts
heartbeat, but DOES NOT remove or touch the lock file.

We call heartbeat restart instead of heartbeat start from our script
because we are not sure whether heartbeat is already running or not. So when
heartbeat restart is called while heartbeat is NOT running, the heartbeat LSB
script tries to stop it, but since it is not running it just starts heartbeat,
and after starting, the heartbeat lock file is not touched (because of the
restart case in the heartbeat LSB script). So now heartbeat is running on the
system (which can be verified by looking for the heartbeat process or with the
heartbeat status command) but there is no /var/lock/subsys/heartbeat lock file.
This lock file is used by the init scripts to know which services they have to
stop at shutdown (shutdown -r now). When we run shutdown -r now, the shutdown
sequence thinks heartbeat is not running (because there is no lock file) and
does not stop heartbeat properly. When the node comes back up, heartbeat is
started but its state is not correct (because it was not stopped properly).
Due to this, the node identifies itself as Primary even though the erstwhile
Secondary node has become Primary in the meantime, and this causes a split-brain.

So I believe heartbeat restart should do exactly what heartbeat stop followed
by heartbeat start does, which is not the case now.
Can you please let me know if my understanding is correct and whether this is
a bug in the heartbeat LSB script? Thanks for looking into it.

I have also included the relevant code from the heartbeat LSB script below.

File: /etc/init.d/heartbeat

  start)
RunStartStop pre-start
StartHA
RC=$?
echo
if
  [ $RC -eq 0 ]
then
  [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
  touch $LOCKDIR/$SUBSYS
fi
RunStartStop post-start $RC
;;

  stop)
RunStartStop pre-stop
StopHA
RC=$?
echo
if
  [ $RC -eq 0 ]
then
  rm -f $LOCKDIR/$SUBSYS
fi
RunStartStop post-stop $RC
;;

  restart)
sleeptime=`ha_parameter deadtime`
StopHA
echo
echo -n "Waiting to allow resource takeover to complete:"
sleep $sleeptime
sleep 10 # allow resource takeover to complete (hopefully).
echo_success
echo
StartHA
echo
;;
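
For example, restart could simply delegate to the script's own stop and
start cases, so that the lock file handling stays in one place. A rough
sketch (not the shipped code) of such a restart case:

  restart)
    sleeptime=`ha_parameter deadtime`
    $0 stop                   # removes $LOCKDIR/$SUBSYS on success
    echo -n "Waiting to allow resource takeover to complete:"
    sleep $sleeptime
    sleep 10                  # allow resource takeover to complete (hopefully)
    echo_success
    echo
    $0 start                  # recreates the lock file on success
    ;;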
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-03 Thread mike
Permission problem perhaps? Not really sure what you're doing but the 
fact that you have users configuring the cluster (why do you do this 
btw?) may be pointing to a permission issue.

-mgb


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] The active trap of the SNMP is delayed.

2011-08-03 Thread renayama19661014
Hi Yan,

 Pushed. Since we don't have a separate branch, you might need to
 back-port this patch to pacemaker-mgmt-2.0.0, which is compatible with
 pacemaker-1.0.x

Thanks!!

However, we need a release of pacemaker-mgmt for Pacemaker 1.0.

Would it be possible for you to apply the patch to the pacemaker-mgmt-2.0.0
repository and make a release?

 * http://hg.clusterlabs.org/pacemaker/pygui/rev/18332eae086e

   * After this correction, we hope that a new version of pacemaker-mgmt
  for Pacemaker 1.0 will be released. (pacemaker-mgmt-2.1.0?)
 
 We'll probably tag a new version in the near future.

Ok.

Best Regards,
Hideo Yamauchi.


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems