Re: [Linux-HA] [ha-wg-technical] [Pacemaker] [RFC] Organizing HA Summit 2015
Hi all,

Really late response, but I will be joining the HA summit with a few
colleagues from NTT.

See you guys in Brno,

Thanks,

2014-12-08 22:36 GMT+09:00 Jan Pokorný jpoko...@redhat.com:
> Hello,
>
> it occurred to me that if you want to use the opportunity and double
> as a tourist while being in Brno, it's about the right time to
> consider reservations/ticket purchases this early. At least in some
> cases it is a must, e.g., Villa Tugendhat:
> http://rezervace.spilberk.cz/langchange.aspx?mrsname=languageId=2returnUrl=%2Flist
>
> On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote:
>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno
>> offices. My suggestion would be to have a 2 days dedicated HA summit
>> the 4th and the 5th of February.
>
> --
> Jan
>
> ___
> ha-wg-technical mailing list
> ha-wg-techni...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical

--
Keisuke MORI

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [PATCH] High: ccm: fix a memory leak when a client exits
Hi,

2013/9/6 Lars Ellenberg lars.ellenb...@linbit.com:
> On Wed, Sep 04, 2013 at 08:16:44PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> The attached patch will fix a memory leak in ccm that occurs
>> whenever a ccm client disconnects.
>
> Thank you.
>
> This may introduce double free for client_delete_all() now?

No, I do not think it does.

When an individual client exits, client_delete() removes the ipc object
from ccm_hashclient, and hence client_delete_all() will never call
client_destroy() for the same ipc object again.

The valgrind result did not complain about this either.

Am I missing your point?

> All this apparently useless indirection seems to be from a time when
> client_destroy explicitly called into a ->ops->destroy virtual
> function. Which it no longer does.
>
> So I think dropping the explicit calls to client_destroy, as well as
> the other then-useless indirection functions, and instead doing a
> g_hash_table_new_full with g_free in client_init, would be the way
> to go.

It might be doable, but I do not think it is necessary to rewrite the
code for fixing this issue.

Thanks,

>> Could you have a look?
>>
>> It would not affect most installations because only crmd and cib are
>> the clients, but if you run any ccm client such as the crm_node
>> command periodically, ccm will increase its memory consumption.
>>
>> The valgrind outputs are also attached as evidence of the leakage
>> and of the fix by the patch; the results were taken after the
>> crm_node command was executed 100 times. There still exist some
>> "definitely / indirectly / possibly lost" records, but as far as
>> I've investigated they are all allocated only at invocation time and
>> are not considered a leak. Double checks are welcome.
>>
>> Thanks,
>> --
>> Keisuke MORI
>
> Cheers, Lars
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

--
Keisuke MORI

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] High: ccm: fix a memory leak when a client exits
Hi,

The attached patch will fix a memory leak in ccm that occurs whenever a
ccm client disconnects.

It would not affect most installations because only crmd and cib are
the clients, but if you run any ccm client such as the crm_node command
periodically, ccm will increase its memory consumption.

The valgrind outputs are also attached as evidence of the leakage and
of the fix by the patch; the results were taken after the crm_node
command was executed 100 times. There still exist some "definitely /
indirectly / possibly lost" records, but as far as I've investigated
they are all allocated only at invocation time and are not considered a
leak. Double checks are welcome.

Thanks,
--
Keisuke MORI

ccm-memleak.patch
Description: Binary data

valgrind-NG-HB305.log
Description: Binary data

valgrind-OK-HB305patched.log
Description: Binary data
[Linux-ha-dev] [PATCH][heartbeat] skip unnecessary waiting for lrmd invocation when LRMD_MAX_CHILDREN is set
Hi,

The heartbeat init.d script waits for the lrmd invocation when
LRMD_MAX_CHILDREN is set, and it does not return until the initdead
time has passed (or a 40s timeout) unless you start all the nodes in
the cluster at the same time.

But this should not be necessary, because recent versions of lrmd
(cluster-glue-1.0.10 or later) look at the environment variable by
themselves. The attached patch improves this so that the init.d script
returns immediately as usual, even if the variable is set.

Regards,
--
Keisuke MORI

lrmd-max-children.patch
Description: Binary data
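The patch itself is attached above as binary data, so the following is
only an illustrative sketch of the two behaviours being contrasted. The
function names are hypothetical, not the actual init.d code; `lrmadmin
-p max-child-count N` is the cluster-glue way of pushing the setting
into an already-running lrmd.

```shell
LRMD_MAX_CHILDREN=${LRMD_MAX_CHILDREN:-4}

# Old behaviour (sketch): poll a running lrmd via lrmadmin until it
# answers. This is what could block for up to 40s (or until initdead)
# when the other cluster nodes were not started at the same time.
set_max_children_old() {
    timeout=40
    while [ "$timeout" -gt 0 ]; do
        lrmadmin -p max-child-count "$LRMD_MAX_CHILDREN" 2>/dev/null && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# New behaviour (sketch): just export the variable before starting
# heartbeat and return immediately; lrmd in cluster-glue >= 1.0.10
# reads LRMD_MAX_CHILDREN from its own environment.
set_max_children_new() {
    export LRMD_MAX_CHILDREN
    return 0
}
```

The point of the patch is exactly this difference: the new path has no
wait loop at all.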
Re: [Linux-HA] IPaddr2 support of ipv6
Hi,

2013/3/29 David Vossel dvos...@redhat.com:
> Hi,
>
> It looks like ipv6 support got added to the IPaddr2 agent last year.
> I'm curious why the metadata only advertises that the 'ip' option
> should be an IPv4 address.
>
>   ip (required): The IPv4 address to be configured in dotted quad
>   notation, for example 192.168.1.1.
>
> Is this just an oversight? If so this patch would probably help.
> https://github.com/davidvossel/resource-agents/commit/07be0019a50b96743536ab50727b56d9175bf95f

Ah, yes, that's just an oversight. Thank you for pointing it out.
Would you submit your patch as a pull request?

Thanks,
--
Keisuke MORI
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hi Nick,

Could you provide which version of resource-agents you're using?

Prior to 3.9.2, IPv6addr requires a static IPv6 address with exactly
the same prefix in order to find an appropriate nic; so you should have
statically assigned, for example, 2600:3c00::34:c003/116 on eth0.

As of 3.9.3, this has been relaxed and the specified nic is always
used, even if the prefix does not match; so it should just work. (At
least it works for me.)

Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
virtual IPv6 address, which is brand new, and I would prefer this
because it uses the standard ip command.

Thanks,

2013/3/25 Nick Walke tubaguy50...@gmail.com:
> This the correct place to report bugs?
> https://github.com/ClusterLabs/resource-agents
>
> Nick
>
> On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote:
>> Hello Nick,
>>
>>> I shouldn't be able to do that if the IPv6 module wasn't loaded,
>>> correct?
>>
>> that is correct.
>>
>> I tried modifying my netmask to copy yours. And I get the same error
>> you do:
>>
>>   ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error
>>
>> So probably a bug in the resource agent.
>>
>> Manually adding and removing works:
>>
>>   (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
>>   (node-62) [~] ip -6 addr show dev eth0
>>   2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
>>       inet6 2a01:4f8:bb:400::2/116 scope global
>>          valid_lft forever preferred_lft forever
>>       inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
>>          valid_lft 2591887sec preferred_lft 604687sec
>>       inet6 fe80::225:90ff:fe97:dbb0/64 scope link
>>          valid_lft forever preferred_lft forever
>>   (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0
>>
>> Nick, you can do the following things to resolve this:
>>
>> - Hunt down the bug and fix it, or let someone else do it for you
>> - Use another netmask, if possible (fighting the symptoms instead of
>>   resolving the root cause)
>> - Write your own resource agent (fighting the symptoms instead of
>>   resolving the root cause)
>>
>> Cheers,
>>   Thomas

--
Keisuke MORI
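The pre-3.9.3 matching rule can be illustrated with a deliberately
simplified helper. Real prefix comparison needs IPv6 bit arithmetic
(which is exactly why the old agent was strict about it); this sketch
compares only the textual "/len" part of the two addresses and is not
the agent's actual code, just a picture of why a /116 static address
satisfies a /116 virtual IP while the usual /64 address does not.

```shell
# Grossly simplified sketch of the pre-3.9.3 IPv6addr selection rule:
# the nic must already carry a static address whose prefix length
# matches the virtual IP's. A real check would also compare the
# masked network bits, not just the prefix length.
prefixlen() {
    echo "${1#*/}"          # "2600:3c00::34:c003/116" -> "116"
}

same_prefixlen() {
    [ "$(prefixlen "$1")" = "$(prefixlen "$2")" ]
}
```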
Re: [Linux-HA] Heartbeat IPv6addr OCF
2013/3/25 Nick Walke tubaguy50...@gmail.com:
> Looks like 3.9.2-5. So I need to statically assign the address I want
> to use before using it with IPv6addr?

Yes.

--
Keisuke MORI
[Linux-ha-dev] [PATCH] crmsh: fix in python version checking
Hi Dejan,

Here is a trivial patch for crmsh. It is totally harmless because it
only takes effect when the python version is too old and crmsh would
abort anyway :)

Thanks,
--
Keisuke MORI

python-version.patch
Description: Binary data
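The patch itself is attached as binary data, so its actual content
isn't visible here. As a general illustration of the class of bug
version checks tend to have (an assumption about the topic, not a claim
about this particular patch): comparing versions as plain strings
breaks as soon as a two-digit component appears, whereas a field-wise
numeric comparison does not.

```shell
# Illustrative only -- not the crmsh patch. A naive string comparison
# thinks "2.10" < "2.6"; a field-wise numeric sort gets it right.
version_ge() {
    # Returns 0 (true) if $1 >= $2, comparing dot-separated numeric
    # fields: whichever version sorts first is the smaller one.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n1)" = "$2" ]
}
```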
Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hi,

Does IPaddr2 need to support the 'eth0:label' format in a single 'nic'
parameter when you want to use an interface label?

I thought it doesn't, from the meta-data description of 'nic', but that
looks to conflict with the 'iflabel' description:

  nic: Do NOT specify an alias interface in the form eth0:1 or anything
  here; rather, specify the base interface only. If you want a label,
  see the iflabel parameter.

  iflabel: (omit) If a label is specified in nic name, this parameter
  has no effect.

The latest IPaddr2 (findif.sh version) would reject it as if an invalid
nic had been specified. If we do need to support it, I will submit a
patch for this.

Thanks,

2013/1/30 Dejan Muhamedagic de...@suse.de:
> Hello,
>
> The current resource-agents repository has been tagged v3.9.5rc1. It
> is mainly a bug fix release. The full list of changes for the
> linux-ha RA set is available in ChangeLog:
> https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog
>
> We'll allow a week for testing. The final release is planned for
> Feb 6.
>
> Many thanks to all contributors!
>
> Best,
> The resource-agents maintainers

--
Keisuke MORI
Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hi Dejan,

2013/1/31 Dejan Muhamedagic de...@suse.de:
>> The latest IPaddr2 (findif.sh version) would reject it as if an
>> invalid nic had been specified. If we do need to support it, I will
>> submit a patch for this.
>
> I'd rather just update the iflabel description.

Me too, but

> After all, normally one doesn't need to specify the nic. But you'll
> get different preferences from different people :)
>
> However, it seems to be a regression, so we should probably allow
> labels.

Yes, that is true. I will submit a patch tomorrow.

> BTW, is this related to
> https://github.com/ClusterLabs/resource-agents/issues/200 ?

Yes, the proposed patch in #200 would not support the nic:label format
either, so it would have to be rewritten in a different way. Honestly,
I'm wondering whether it is really worth fixing #200, though.

Thanks,
--
Keisuke MORI
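If the legacy 'eth0:label' form were to be accepted again, the split
itself is simple. A hypothetical sketch (findif.sh's eventual handling
may differ; the variable names are illustrative):

```shell
# Hypothetical sketch: split a combined "eth0:1" nic value into the
# base interface and the interface label that the legacy form implies.
split_nic_label() {
    NIC_BASE=${1%%:*}             # part before the first ':'
    case "$1" in
        *:*) IFLABEL=${1#*:} ;;   # part after the first ':'
        *)   IFLABEL= ;;          # no label given
    esac
}
```

With that in place, a combined value could be normalized into the two
parameters the agent already documents (nic and iflabel), which is what
makes the legacy form backward-compatible rather than an error.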
Re: [Linux-ha-dev] agents: including LGPL license file
Hi Dejan,

2012/12/24 Dejan Muhamedagic de...@suse.de:
> Hi Keisuke-san,
>
> On Thu, Dec 20, 2012 at 02:14:10PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> The resource-agents package is licensed under the GPL and LGPL, but
>> the full copy of the LGPL license file is missing, as opposed to the
>> heartbeat and glue packages, which include it.
>>
>> Why don't we include COPYING.LGPL in the agents package too, as the
>> verbatim copy of the LGPL license, for consistency?
>
> Not really an expert in the area, but I think there's no problem
> adding a copy of a license.

Thanks for your comment. I will submit a pull request for that.

There is no problem with the current package at all, but adding it
would help clarify more precisely which licenses we are using.

The background is that our legal division advised that there was a
'bogus' OSS project which claimed to be using a popular OSS license
defined by the OSI, but had actually derived from it, with some
additional clauses and limitations for their benefit. Including a
verbatim copy of a license file will help clarify that we are a valid
OSS project.

Thanks,
--
Keisuke MORI
[Linux-ha-dev] agents: including LGPL license file
Hi,

The resource-agents package is licensed under the GPL and LGPL, but the
full copy of the LGPL license file is missing, as opposed to the
heartbeat and glue packages, which include it.

Why don't we include COPYING.LGPL in the agents package too, as the
verbatim copy of the LGPL license, for consistency?

Thanks,
--
Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/27 Dejan Muhamedagic de...@suse.de:
> (...)
>> It might be also helpful if it had a kind of 'hook' functionality
>> that allows you to execute an arbitrary script for collecting
>> runtime information such as CPU usage, memory status, I/O status or
>> the list of running processes etc. for diagnosis.
>
> Yes. I guess that one could run such a hook in background. Did you
> mean that?

I first thought that it would simply run a one-shot hook at the
invocation of the RA instance, but it would be great if it could run in
background while an RA operation is running.

> Or once the RA instance exited?

This is a bit of a different feature, though: it would also be useful
if it could run a hook in the event of an RA timeout, or when a command
in the RA gets stuck for some reason.

Thanks,
--
Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/22 Dejan Muhamedagic de...@suse.de:
> Hi Lars,
>
> On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
>> On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
>>> Hi,
>>>
>>> This is a little something which could help while debugging
>>> resource agents. Setting the environment variable __OCF_TRACE_RA
>>> would cause the resource agent run to be traced (as in set -x).
>>> PS4 is set accordingly (that's a bash feature; I don't know if
>>> other shells support it). ocf-tester got an option (-X) to turn
>>> the feature on. The agent itself can also turn tracing on/off via
>>> ocf_start_trace/ocf_stop_trace.
>>>
>>> Do you find anything amiss?
>>
>> I *really* like this. But I'd like a different way to turn it on -
>> a standard one that is available via the CIB configuration, without
>> modifying the script.
>
> I don't really want the script to get modified either. The above
> instructions are for people developing a new RA.

I like this, too. It would be useful when you need to diagnose in the
production environment if you can enable/disable it without any
modifications to RAs.

It might be also helpful if it had a kind of 'hook' functionality that
allows you to execute an arbitrary script for collecting runtime
information such as CPU usage, memory status, I/O status or the list of
running processes etc. for diagnosis.

--
Keisuke MORI
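The mechanism under discussion is plain `set -x`. A self-contained
sketch of how ocf_start_trace/ocf_stop_trace could look; the PS4 format
here is illustrative, not the exact string used by the patch:

```shell
# Minimal sketch of the RA tracing described above. PS4 is the prefix
# printed before each traced command; set -x / set +x toggle tracing.
ocf_start_trace() {
    PS4='+ [${LINENO}] '
    set -x
}

ocf_stop_trace() {
    set +x
}
```

Between the two calls every executed command is echoed to stderr with
its line number, which is what makes post-mortem RA debugging practical
without sprinkling echo statements through the agent.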
Re: [Linux-ha-dev] [resource-agents] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6. (#97)
Hi Lars,

Thank you for your comments. I'm going to answer them below, and if you
have further comments I would greatly appreciate it.

2012/10/16 Lars Ellenberg l...@linbit.com:
> Again, apologies for not having this sent out when I wrote it. I'm
> unsure why it hibernated in my Draft folder for five months, but it
> was not the only one :(
>
> I realize the pull request has meanwhile been closed, and we do have
> a findif.sh implementation in current git. Still, I'll just send
> these comments as I wrote them back then; maybe some of them still
> apply.
>
> At the end, there is a bit of comment on how to maybe re-implement
> the ipcheck and ifcheck functions without grep and awk. Feel free to
> ignore for now, though.
>
> I'll try to review again what's in git now, and send a proper git
> diff, once I find the time ;-)
>
> On Wed, May 30, 2012 at 11:20:03PM -0700, Keisuke MORI wrote:
>> This is a proposal enhancement of IPaddr2 to support IPv6 as well as
>> IPv4. I would appreciate your comments and suggestions for merging
>> this into the upstream.
>>
>> NOTE: This pull request is meant for reviewing the code and
>> discussions, and is not intended to be merged as is at this moment.
>
> Github pull request comments are IMO not the best place to discuss
> these things, so I send to the linux-ha-dev mailing list as well.
>
>> ## Benefits:
>>
>> * Unify the usage, behavior and the code maintenance between IPv4
>>   and IPv6 on Linux.
>>
>>   The usage of IPaddr2 and IPv6addr is similar, but they have
>>   different parameters and different behaviors. In particular, they
>>   may choose a different interface depending on your configuration
>>   even if you provided similar parameters in the past.
>>
>>   IPv6addr is written in C and is rather hard to improve. As /bin/ip
>>   already supports both IPv4 and IPv6, we can share most of the code
>>   of IPaddr2 written in bash.
>
> IPv6addr is supposed to run on non-Linux as well. So we better not
> deprecate it, as long as all the world is not Linux.

Agreed.

>> * usable for LVS on IPv6.
>>
>>   IPv6addr does not support lvs_support=true and unfortunately there
>>   is no possible way to use LVS on IPv6 right now. IPaddr2 (/bin/ip)
>>   works for LVS configurations without enabling lvs_support, both
>>   for IPv4 and IPv6. (You don't have to remove an address on the
>>   loopback interface if the virtual address is assigned by using
>>   /bin/ip.) See also:
>>   http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429
>>
>> * retire the old 'findif' binary.
>>
>>   The 'findif' binary is replaced by a shell script version of
>>   findif, originally developed by lge. See
>>   ClusterLabs/resource-agents#53: findif could be rewritten in shell
>>
>> * easier support for other pending issues
>>
>>   These pending issues can be fixed based on this new IPaddr2:
>>   * ClusterLabs/resource-agents#68: Allow ipv6addr to mark new
>>     address as deprecated
>>   * ClusterLabs/resource-agents#77: New RA that controls IPv6
>>     address in loopback interface
>>
>> ## Notes / Changes:
>>
>> * findif semantics changes
>>
>>   There are some incompatibilities in deciding which interface is to
>>   be used when your configuration is ambiguous. But in reality it
>>   should not be a problem as long as it's configured properly. The
>>   changes mostly came from fixing a bug in the findif binary (it
>>   returns a wrong broadcast) or from merging the differences between
>>   the old IPaddr2 and IPv6addr. See the ocft test cases for details
>>   (cases No.6, No.9, No.10, No.12, No.15 in the IPaddr2v4 test
>>   cases). Other notable changes are described below.
>>
>> * broadcast parameter for IPv4
>>
>>   The broadcast parameter may be required along with cidr_netmask
>>   when you want to use a different subnet mask from the static IP
>>   address. That is because doing such a calculation is difficult in
>>   the shell script version of findif. See the ocft test cases for
>>   details (cases No.11, No.14, No.16, No.17 in the IPaddr2v4 test
>>   cases). This limitation may be eliminated if we removed the brd
>>   options from the /bin/ip command line.
>
> If we do not specify the broadcast at all, ip will do the right thing
> by default, I think.
>
> We should only use it on the ip command line if it is in the input
> parameters. I don't really have a use case for the broadcast address
> not being the default, so I would be ok with dropping it completely.

It has been fixed, and the latest code in the repo should now work like
you said.

>> * loopback(lo) now requires cidr_netmask or broadcast.
>>
>>   See the ocft test case in the IPaddr2 ocft script. The reason is
>>   similar to the previous one.
>
> We really need to avoid breaking existing configurations. So we need
> to fix this. If we find nothing better, with some heuristic.

It has also been fixed now, and loopback can be used the same as
before.

>> * loose error check of nic for an IPv6 link-local address.
>>
>>   IPv6addr was able to check this, but in the shell script it is
>>   hard to determine a link-local address (it requires bitmask
>>   calculation). I do not think it's worth implementing in shell.
Re: [Linux-HA] Question about pacemaker + heartbeat + postgres in active/passive configuration
Hi,

2012/8/16 Renee Riffee riffe...@gmail.com:
> His presentation is here:
> http://www.slideshare.net/ksk_ha/linuxconfauhaminiconfpgsql9120120116
> Click on the Save File button in the bar...

Yes, that is it. Thank you for finding and sharing it :).

The developer's wiki page provides more up-to-date information for the
pgsql RA after it was included in the upstream:
https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication

As for the configuration for heartbeat, documents from linux-ha.org
might help:
http://www.linux-ha.org/doc/users-guide/_creating_an_initial_heartbeat_configuration.html

Regards,
Keisuke MORI

> On Aug 15, 2012, at 4:54 PM, DENNY, MICHAEL wrote:
>> Hi, Andrew. Where can I find the presentation by Keisuke?
>>
>> Btw, I use heartbeat in combo with pacemaker also... but with
>> MySQL... was just a decision based on an impression at the time that
>> most of the documentation content was about heartbeat... and that
>> heartbeat had more stability. And it seemed that there was nothing
>> negative being posted that would deter me about the choice. After
>> being on this distribution list for a while, I now think it's time
>> for me to move to Corosync.
>>
>> - Mike
>>
>> -----Original Message-----
>> From: linux-ha-boun...@lists.linux-ha.org
>> [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
>> Sent: Wednesday, August 15, 2012 7:40 PM
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Question about pacemaker + heartbeat +
>> postgres in active/passive configuration
>>
>> On Wed, Aug 15, 2012 at 10:25 PM, Renee Riffee riffe...@gmail.com wrote:
>>> Hello everyone,
>>>
>>> Apologies if this is not the correct group for this question, but I
>>> am seeking information on how to set up pacemaker with heartbeat
>>> and postgres in an active/passive streaming (pg 9.1) configuration.
>>>
>>> I would prefer to use heartbeat and not corosync, although most of
>>> the good tutorials like the Clusters from Scratch document use
>>> corosync with pacemaker, but I don't have any shared storage to use
>>> it with on my machines. The presentation by Keisuke Mori of
>>> Linux-HA Japan is beautiful and exactly what I really want to do,
>>> but I need more information on how to use pacemaker with heartbeat
>>> instead of corosync.
>>
>> Assuming pacemaker was built with heartbeat support, simply install
>> heartbeat and add:
>>
>>   crm respawn
>>
>> to ha.cf
>>
>> Start heartbeat, job done. Resource configuration is unchanged from
>> what you see in Clusters from Scratch.
>>
>>> Does anyone know of a recipe or tutorial that goes through what his
>>> presentation shows?
>>>
>>> Kind regards,
>>> -Renee
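Putting Andrew's answer together, a minimal ha.cf for running Pacemaker
on top of Heartbeat might look like the following. The node names,
interface, and timings are placeholders; `crm respawn` is the one
directive the answer refers to.

```
# /etc/ha.d/ha.cf -- minimal sketch (placeholder values)
crm         respawn     # run Pacemaker's CRM on top of heartbeat
udpport     694
bcast       eth0        # heartbeat over broadcast on eth0
keepalive   1
deadtime    30
node        node-a      # must match `uname -n` on each node
node        node-b
```

After starting heartbeat on both nodes, resource configuration proceeds
exactly as in Clusters from Scratch, since Pacemaker's CIB is the same
regardless of the messaging layer underneath.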
Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
Hi Alan,

Thank you for your comments.

2012/5/31 Alan Robertson al...@unix.sh:
> It's straightforward to determine if an IP address is link-local or
> not - for an already configured address:
>
>   3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>       link/ether 94:db:c9:3f:7c:20 brd ff:ff:ff:ff:ff:ff
>       inet 10.10.10.30/24 brd 10.10.10.255 scope global eth1
>       inet6 fe80::96db:c9ff:fe3f:7c20/64 scope link
>          valid_lft forever preferred_lft forever
>
> This works uniformly for both ipv4 and ipv6 addresses (quite nice!)

It's an interesting idea, but I don't think we need to care about IPv4
link-local addresses, because users can configure them in the same
manner as a regular IP address (and they are used very rarely).

In the case of IPv6 link-local addresses, it is almost always a wrong
configuration if nic is missing (the socket API mandates it), so we
want to check it.

> However, for addresses which are not yet up (which is unfortunately
> what you're concerned with), ipv6 link-local addresses take the form
> fe80:: followed by 64 bits of MAC address (48-bit MACs are padded
> out). http://en.wikipedia.org/wiki/Link-local_address
>
> MAC addresses never begin with 4 bytes of zeros, so the regular
> expression to match this is pretty straightforward. This isn't a bad
> approximation (but could easily be made better):

Yes, you are right. Matching against 'fe80::' should be pretty easy and
good enough. Why could I not think of such a simple idea :)

>   islinklocal() {
>       if echo "$1" | grep -i '^fe80::[^:]*:[^:]*:[^:]*:[^:]*$' >/dev/null

We should also accept 'fe80::1'. Anyway, I will look into this way.

Thanks,
--
Keisuke MORI
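Following up on the 'fe80::1' remark: a prefix-based check can avoid
the four-group regexp entirely and also accept compressed forms. A
runnable sketch, illustrative rather than the code that eventually
landed, matching the whole fe80::/10 link-local range:

```shell
# Link-local IPv6 addresses live in fe80::/10, i.e. the first hextet
# is fe80 through febf. Matching that textual prefix covers both the
# MAC-derived addresses and short forms like fe80::1.
islinklocal() {
    case "$1" in
        [Ff][Ee][89ABab][0-9A-Fa-f]:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

This stays within plain shell pattern matching, so it needs no grep
process and no bitmask arithmetic for the common textual forms
(addresses written as fe80:0:0:... also match).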
[Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
I would like to propose an enhancement of IPaddr2 to support IPv6 as
well as IPv4. I've submitted this as pull request #97, but I am also
posting it to the ML for a wider audience. I would appreciate your
comments and suggestions for merging this into the upstream.

[RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
https://github.com/ClusterLabs/resource-agents/pull/97

## Benefits:

* Unify the usage, behavior and the code maintenance between IPv4 and
  IPv6 on Linux.

  The usage of IPaddr2 and IPv6addr is similar, but they have different
  parameters and different behaviors. In particular, they may choose a
  different interface depending on your configuration even if you
  provided similar parameters in the past.

  IPv6addr is written in C and is rather hard to improve. As /bin/ip
  already supports both IPv4 and IPv6, we can share most of the code of
  IPaddr2 written in bash.

* Usable for LVS on IPv6.

  IPv6addr does not support lvs_support=true and unfortunately there is
  no possible way to use LVS on IPv6 right now. IPaddr2 (/bin/ip) works
  for LVS configurations without enabling lvs_support, both for IPv4
  and IPv6. (You don't have to remove an address on the loopback
  interface if the virtual address is assigned by using /bin/ip.)
  See also: http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429

* Retire the old 'findif' binary.

  The 'findif' binary is replaced by a shell script version of findif,
  originally developed by lge. See "findif could be rewritten in shell":
  https://github.com/ClusterLabs/resource-agents/issues/53

* Easier support for other pending issues.

  These pending issues can be fixed based on this new IPaddr2:

  * Allow ipv6addr to mark new address as deprecated
    https://github.com/ClusterLabs/resource-agents/issues/68
  * New RA that controls IPv6 address in loopback interface
    https://github.com/ClusterLabs/resource-agents/pull/77

## Notes / Changes:

* findif semantics changes

  There are some incompatibilities in deciding which interface is to be
  used when your configuration is ambiguous. But in reality it should
  not be a problem as long as it's configured properly. The changes
  mostly came from fixing a bug in the findif binary (it returns a
  wrong broadcast) or from merging the differences between the old
  IPaddr2 and IPv6addr. See the ocft test cases for details (cases
  No.6, No.9, No.10, No.12, No.15 in the IPaddr2v4 test cases). Other
  notable changes are described below.

* broadcast parameter for IPv4

  The broadcast parameter may be required along with cidr_netmask when
  you want to use a different subnet mask from the static IP address.
  That is because doing such a calculation is difficult in the shell
  script version of findif. See the ocft test cases for details (cases
  No.11, No.14, No.16, No.17 in the IPaddr2v4 test cases). This
  limitation may be eliminated if we removed the brd options from the
  /bin/ip command line.

* loopback (lo) now requires cidr_netmask or broadcast.

  See the ocft test case in the IPaddr2 ocft script. The reason is
  similar to the previous one.

* Loose error check of nic for an IPv6 link-local address.

  IPv6addr was able to check this, but in the shell script it is hard
  to determine a link-local address (it requires bitmask calculation).
  I do not think it's worth implementing in shell.

* send_ua: a new binary

  We need one new binary as a replacement of send_arp for IPv6 support.
  IPv6addr.c is reused to make this command. Note that the IPv6addr RA
  is still there and you can continue to use it for backward
  compatibility.

## Acknowledgement

Thanks to Tomo Nozawa-san for his hard work writing and testing this
patch. Thanks to Lars Ellenberg for the first findif.sh implementation.
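On the "such calculation is difficult in shell" point about the
broadcast parameter: deriving the IPv4 broadcast from an address and a
prefix length is in fact expressible in POSIX shell arithmetic. A
standalone sketch, not findif.sh's actual code, and ignoring the
separate-netmask corner case the notes describe:

```shell
# Compute the IPv4 broadcast address for ADDRESS PREFIXLEN using only
# POSIX shell arithmetic: pack the dotted quad into a 32-bit integer,
# OR in the host bits, then unpack it again.
ipv4_broadcast() {
    addr=$1 prefix=$2
    oldIFS=$IFS; IFS=.
    set -- $addr              # split "a.b.c.d" into $1..$4
    IFS=$oldIFS
    n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - prefix)) - 1 ))
    b=$(( n | hostmask ))
    echo "$(( (b >> 24) & 255 )).$(( (b >> 16) & 255 )).$(( (b >> 8) & 255 )).$(( b & 255 ))"
}
```

Whether this is worth carrying in the agent is another matter: as noted
in the review thread, omitting brd from the /bin/ip command line lets
ip compute the default broadcast itself.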
Best Regards,
--
Keisuke MORI
Re: [Linux-ha-dev] Modified patch for RA
Hi Yves, Thank you for revising the patch. I've confirmed that this patch restores the log level for mysql_status to its previous behavior. 2012/5/5 Yves Trudeau y.trud...@videotron.ca: Hi Dejan, here's another modified patch for the mysql agent at commit version 4c18035 (g...@github.com:y-trudeau/resource-agents.git, branch mysql-repl). Following a comment from Keisuke, I put back the log level for mysql_status in probe mode. Regards, Yves ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch to mysql RA for replication
Hi Yves, 2012/4/19 Yves Trudeau y.trud...@videotron.ca: - cleanup loglevel Why did you remove all the loglevel code? Was there anything wrong with it? After your patch, the RA will generate inappropriate ERROR logs whenever it starts/stops/probes, even though these are all _expected_ results and nothing to worry about. That is confusing for users, and we have been trying to eliminate such confusing ERROR logs as much as possible. The loglevel code is intended to use the INFO level when the result is expected, and the ERROR level only when it is considered a failure. https://github.com/ClusterLabs/resource-agents/commit/72952904b67b85e1809f90255a55ce39eb2a8922 I would like to revert these changes. Thanks, Hi Dejan, here's my patch to the mysql agent at commit version 4c18035. Sorry for being inept with git. Included here: - attribute for replication_info - put error code 1040 in a variable - put the long crm_attribute call for replication_info in a variable - cleanup loglevel - defined a value for DEBUG_LOG As I wrote before, I haven't yet found a solution for removing the IP attribute for each node. Using a replication_VIP breaks the operation of the agent, as it removes the easy way to add (or rejoin) nodes. Regards, Yves ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
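The logging convention described above — expected outcomes at INFO, real failures at ERROR — can be sketched in a few lines of shell. This is an illustration of the convention, not the actual mysql RA code; `severity_for` is a hypothetical helper:

```shell
# Illustrative sketch of the loglevel convention: during a probe,
# "not running" is an expected result and should be logged at info;
# outside a probe the same condition is a real failure and merits err.
severity_for() {
    is_probe=$1    # 1 if the current operation is a probe, 0 otherwise
    if [ "$is_probe" -eq 1 ]; then
        echo info
    else
        echo err
    fi
}

severity_for 1    # prints info
```

A resource agent would then log via the usual helper, e.g. `ocf_log "$(severity_for "$probe")" "MySQL is not running"`.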
[Linux-ha-dev] [PATCH] cluster-glue: correct a build dependency on CentOS6/RHEL6
Hi, Please consider the attached patch for cluster-glue. Thanks, -- Keisuke MORI export-libuuid.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Fwd: [lvs-users] defunct checkcommand processes w/ ldirectord
Hi, A bug report and a proposed patch for ldirectord were posted to the lvs-users ML a little while ago. I think it's worth including. http://lists.graemef.net/pipermail/lvs-users/2012-February/024430.html -- Forwarded message -- From: Sohgo Takeuchi so...@sohgo.dyndns.org Date: 2012/2/11 Subject: Re: [lvs-users] defunct checkcommand processes w/ ldirectord To: lvs-us...@linuxvirtualserver.org, da...@davidcoulson.net Hello, David From: David Coulson da...@davidcoulson.net | I'm running ldirectord with a few external checkcommands, but I end up with numerous defunct processes on the system. It seems like this issue: http://archive.linuxvirtualserver.org/html/lvs-users/2010-08/msg00040.html I realize I can set an alarm on my command and cause it to exit, but is there a fix for ldirectord to correctly clean up processes which are killed due to the internal timeout? Please try the following patch.

diff --git a/ldirectord/ldirectord.in b/ldirectord/ldirectord.in
index 5d26114..c28eb40 100644
--- a/ldirectord/ldirectord.in
+++ b/ldirectord/ldirectord.in
@@ -2671,19 +2671,21 @@ sub run_child
 	my $real = $$v{real};
 	my $virtual_id = get_virtual_id_str($v);
 	my $checkinterval = $$v{checkinterval} || $CHECKINTERVAL;
 	$0 = "ldirectord $virtual_id";
 	while (1) {
 		foreach my $r (@$real) {
 			$0 = "ldirectord $virtual_id checking $$r{server}";
 			_check_real($v, $r);
+			check_signal();
 		}
 		$0 = "ldirectord $virtual_id";
 		sleep $checkinterval;
+		check_signal();
 		ld_emailalert_resend();
 	}
 }

 sub _check_real
 {
 	my $v = shift;
 	my $r = shift;

___ Please read the documentation before posting - it's available at: http://www.linuxvirtualserver.org/ LinuxVirtualServer.org mailing list - lvs-us...@linuxvirtualserver.org Send requests to lvs-users-requ...@linuxvirtualserver.org or go to http://lists.graemef.net/mailman/listinfo/lvs-users Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi, Any update on this? 2012/2/1 Keisuke MORI keisuke.mori...@gmail.com: Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Tue, Jan 31, 2012 at 09:52:24PM +0900, Keisuke MORI wrote: Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? Agreed. I thought that the intention was to always use the status page, but obviously people figured out that they could skip that. Just as well. Thank you for your productive comments and discussions! As a result of the discussion on this topic, I would suggest two patches in the pull request below: https://github.com/ClusterLabs/resource-agents/pull/54 Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] hb_report fails to sudo
Hi, hb_report in cluster-glue-1.0.8 or later fails with an error even when it runs as root, at least on RHEL: --- # id -u 0 # hb_report -f 16:00 report1 sudo: sorry, you must have a tty to run sudo (...) --- It seems to have been introduced by this changeset: http://hg.linux-ha.org/glue/rev/f55d68c37426 Apparently two issues are involved: 1) it tries to use sudo even when invoked as root. 2) sudo may be prohibited without a tty on some distros, such as RHEL, for security's sake. The attached patch fixes 1). You can work around it by specifying '-u root' explicitly until it gets fixed. As for 2), it seems that the current hb_report needs to _disable_ tty allocation on ssh, so you would need additional configuration in /etc/sudoers on such distros if you want to ssh as a regular user. Regards, -- Keisuke MORI hb_report-sudo-root.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
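The fix for issue 1) amounts to skipping sudo when the effective user is already root. A minimal sketch of that logic (`sudo_prefix` is a hypothetical helper for illustration, not the actual hb_report code; the uid argument exists only so the decision can be exercised deterministically):

```shell
# Hypothetical sketch of the idea behind the fix: only prepend sudo
# when the invoking user is not root.
sudo_prefix() {
    uid=${1:-$(id -u)}      # default to the real caller's uid
    if [ "$uid" -eq 0 ]; then
        echo ""             # already root: run commands directly
    else
        echo "sudo"         # non-root: escalate via sudo
    fi
}

# A collector could then run, e.g.:  $(sudo_prefix) crm_report ...
```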
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Mon, Jan 30, 2012 at 08:38:35PM +0900, Keisuke MORI wrote: Hi, 2012/1/28 Dejan Muhamedagic de...@suse.de: Hi, On Fri, Jan 20, 2012 at 02:09:13PM +0900, nozawat wrote: Hi Dejan I agree with that opinion. I'm sending the revised patch. I'll apply this one. BTW, can you share your use case? Without the -z option, HTML files like the following return an error. - example ---
<html>
<body>
test
</body>
</html>
--- I placed a page there for checks and was going to monitor it. Even though I said I'd apply this one, I'm now rather reluctant, because it may break some existing configurations, for instance if there are anchors in the regular expression (^ or $). Why is it important to match multiple lines? Just curious: how do you put this string into statusurl? The problem is that the default value of testregex assumes that the </body> and </html> tags are on a single line, although it is very common for HTML content to return them on multiple lines. TESTREGEX=${OCF_RESKEY_testregex:-'</ *body *>[[:space:]]*</ *html *>'} I think it will not be a problem when you are using apache with the 'server-status' handler enabled, because in that case apache seems to return those tags on a single line, but it is also a common use case for the RA to monitor, say, the index.html on the top page. Ah, but in that case, i.e. if another page is specified to be monitored, testregex should be adjusted accordingly. The defaults are guaranteed to work only with the apache status page. Though I'm not really happy with the default regular expression, we cannot change that. The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) I agree that a user should change testregex accordingly when they specify the page to be monitored, but I just wanted to make it work with a default configuration.
As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? For instance, what I see here in the status page is also a PID at the beginning of a line: xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep ^PID PID Key: <br /> xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep -z ^PID xen-d:~ # echo $? 1 Hmm, OK, you are right. My testing was not thorough enough. (Thanks to Lars for the comprehensive tests in the other mail!) Now I understand that we should not support multi-line matching, so that ^ and $ remain usable in testregex on various platforms. That is reasonable. If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'. (We should also mention this in the meta-data documentation.) Hmm, I'd really expect that when a different page is checked, a different test string is specified as well. After all, shouldn't that be part of the content, rather than HTML code which can occur in any HTTP reply? But we could just as well reduce the default regular expression to '</ *html *>'. If nobody objects :) Yes. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
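The behavior debated in this thread is easy to reproduce: with the closing tags on separate lines, a plain grep of a default-style pattern finds nothing, while GNU grep's -z option treats the whole response as one NUL-terminated record, so [[:space:]] can match the newline. This sketch assumes GNU grep; the HTML and the pattern are illustrative:

```shell
# Illustrative reproduction of the multi-line matching question
# (assumes GNU grep for the -z option).
html='<html>
<body>
test
</body>
</html>'

regex='</ *body *>[[:space:]]*</ *html *>'

printf '%s\n' "$html" | grep -q  "$regex" && r1=match || r1=no-match
printf '%s\n' "$html" | grep -qz "$regex" && r2=match || r2=no-match

echo "plain grep: $r1"   # no-match: the two tags never share a line
echo "grep -z:    $r2"   # match: the whole input is one record
```

This also shows why -z interferes with ^ and $ anchors: once the input is a single record, "beginning of line" no longer means what a configured testregex expects.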
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? I agree that a user should change testregex accordingly when they specify the page to be monitored, but I just wanted to make it work with a default configuration. Of course. As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? For instance, what I see here in the status page is also a PID at the beginning of a line: xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep ^PID PID Key: <br /> xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep -z ^PID xen-d:~ # echo $? 1 Hmm, OK, you are right. My testing was not thorough enough. (Thanks to Lars for the comprehensive tests in the other mail!) Now I understand that we should not support multi-line matching, so that ^ and $ remain usable in testregex on various platforms. That is reasonable. If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'.
(We should also mention this in the meta-data documentation.) Hmm, I'd really expect that when a different page is checked, a different test string is specified as well. After all, shouldn't that be part of the content, rather than HTML code which can occur in any HTTP reply? But we could just as well reduce the default regular expression to '</ *html *>'. If nobody objects :) Yes. So, with this the RA will always match any HTML. That should be fine for the default. Cheers, Dejan Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Tue, Jan 31, 2012 at 09:52:24PM +0900, Keisuke MORI wrote: Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? Agreed. I thought that the intention was to always use the status page, but obviously people figured out that they could skip that. Just as well. Thank you for your productive comments and discussions! As a result of the discussion on this topic, I would suggest two patches in the pull request below: https://github.com/ClusterLabs/resource-agents/pull/54 Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
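The relaxation agreed on above is easy to see with the two case patterns side by side. This is an illustration with hypothetical function names, not the apache RA code: the strict pattern rejects the URL buildlocalurl() can generate when there is no path component, while the relaxed pattern accepts it.

```shell
# Illustrative comparison of the strict STATUSURL pattern in the RA
# and the relaxed pattern proposed in the thread (function names are
# made up for this sketch).
strict_check() {
    case $1 in
        http://*/*) echo ok ;;       # requires a path after the host
        *)          echo invalid ;;
    esac
}
relaxed_check() {
    case $1 in
        http://*) echo ok ;;         # any http URL passes
        *)        echo invalid ;;
    esac
}

strict_check  http://localhost:80    # prints invalid: no path component
relaxed_check http://localhost:80    # prints ok
```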
Re: [Linux-ha-dev] new resource agents release
Hi Dejan, 2012/1/27 Dejan Muhamedagic de...@suse.de: 2. apache RA: testregex matching fix http://www.gossamer-threads.com/lists/linuxha/dev/77619#77619 This looks like a regression since heartbeat-2.1.4 from the user's point of view; one of our customers reported that they had been using 2.1.4 and apache without problems, and when they tried to upgrade to the recent Pacemaker without any changes to apache, it failed because of this issue. You're referring to apache-002.patch? Well, that's unfortunate, as the two are incompatible, i.e. if the configuration has 'whatever-string$' in testregex and we reintroduce tr '\012' ' ', that would break such configurations. Oops, sorry, I was only referring to apache-001.patch. This changed a bit more than three years ago and so far nobody has complained. So, perhaps it is better to leave it as it is, and whoever wants to upgrade from a 3+ year old installation should do some good testing anyway. What do you think? OK, let's move the discussion to the relevant thread on this topic. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi, 2012/1/28 Dejan Muhamedagic de...@suse.de: Hi, On Fri, Jan 20, 2012 at 02:09:13PM +0900, nozawat wrote: Hi Dejan I agree with that opinion. I'm sending the revised patch. I'll apply this one. BTW, can you share your use case? Without the -z option, HTML files like the following return an error. - example ---
<html>
<body>
test
</body>
</html>
--- I placed a page there for checks and was going to monitor it. Even though I said I'd apply this one, I'm now rather reluctant, because it may break some existing configurations, for instance if there are anchors in the regular expression (^ or $). Why is it important to match multiple lines? Just curious: how do you put this string into statusurl? The problem is that the default value of testregex assumes that the </body> and </html> tags are on a single line, although it is very common for HTML content to return them on multiple lines. TESTREGEX=${OCF_RESKEY_testregex:-'</ *body *>[[:space:]]*</ *html *>'} I think it will not be a problem when you are using apache with the 'server-status' handler enabled, because in that case apache seems to return those tags on a single line, but it is also a common use case for the RA to monitor, say, the index.html on the top page. As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'. (We should also mention this in the meta-data documentation.) Regards, Keisuke MORI Cheers, Dejan Regards, Tomo January 20, 2012, 4:20, Dejan Muhamedagic de...@suse.de: Hi, On Thu, Jan 19, 2012 at 11:42:07AM +0900, nozawat wrote: Hi Dejan and Lars I'm sending the patch that settles our earlier discussion. 1) apache-001.patch - It is the same as the patch I sent last time.
- It is the version to which I added the grep option. I'll apply this one. BTW, can you share your use case? 2) apache-002.patch - It is the processing method using tr from the HB 2.1.4 era. I can't recall or see from the history why tr(1) was dropped (and it was me who removed it :( But I guess there was a reason for that. 3) http-mon.sh.patch - It is the patch which couples my suggestion with A. After trying to rework the patch a bit, I now think that we need a different user interface, i.e. we should introduce a boolean parameter, say use_ipv6, and then fix the interface bind addresses depending on that. For instance, if the user wants to use curl, then we'd need to add the -g option to make it work with IPv6. We could also try to figure out from the statusurl content whether it contains an IPv6 address (echo $statusurl | grep -qs ::) and then make the http client use IPv6 automatically. Would that work for you? Opinions? Cheers, Dejan 1) and 2) fix the malfunction in monitor processing. 3) adds IPv6 support. The malfunction is not fixed unless at least 1) or 2) is applied. I think plan 2) is good, but I leave the final judgment to Dejan. Regards, Tomo January 19, 2012, 1:12, Dejan Muhamedagic de...@suse.de: Hi, On Wed, Jan 18, 2012 at 11:19:58AM +0900, nozawat wrote: Hi Dejan and Lars If, for example, we use the logic of Lars's example that tries both, doesn't the IPv4 check run every time in the IPv6 case? Isn't that useless processing on every check? In that case, I think I should add a parameter such as OCF_RESKEY_bindaddress. --
bind_address=127.0.0.1
if [ -n "$OCF_RESKEY_bindaddress" ]; then
	bind_address=$OCF_RESKEY_bindaddress
fi
WGETOPTS="-O- -q -L --no-proxy --bind-address=$bind_address"
-- That's fine too. We can combine yours and Lars' proposals, i.e. in case bindaddress is not set, it tries both. Do you think you could prepare such a patch? BTW, the extra processing is minimal, particularly compared to the rest of what this RA does.
Thanks, Dejan Regards, Tomo January 17, 2012, 23:28, Dejan Muhamedagic de...@suse.de: On Tue, Jan 17, 2012 at 11:41:41AM +0100, Lars Ellenberg wrote: On Tue, Jan 17, 2012 at 11:07:09AM +0900, nozawat wrote: Hi Dejan and Lars, I'm sending the patch revised according to Lars's comments. OK. I guess that this won't introduce a regression. And I guess that sometimes one may need a newline in the test string. I believe we did take such a step in the past; however, I thought the tr processing was deleted because it made the load higher. Therefore I used the -z option. Thinking about it, maybe to reduce chance
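Dejan's auto-detection idea in the thread above — inferring IPv6 from the statusurl content via `grep -qs ::` — sketches out to something like the following. This is a rough illustration under the assumption that a `::` in the URL marks IPv6; the helper name and the assembled option string are examples, not the final http-mon.sh code:

```shell
# Rough sketch: pick a bind address for the monitoring client based on
# whether the status URL looks like IPv6, as suggested in the thread.
monitor_bind_address() {
    url=$1
    if echo "$url" | grep -qs ::; then
        echo "::1"        # IPv6 loopback
    else
        echo "127.0.0.1"  # IPv4 loopback
    fi
}

# WGETOPTS could then be assembled as, e.g.:
#   WGETOPTS="-O- -q -L --no-proxy --bind-address=$(monitor_bind_address "$statusurl")"
```

A curl-based client would additionally need the -g option for bracketed IPv6 URLs, as noted in the thread.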
[Linux-ha-dev] [GIT PULL] Medium: IPv6addr: handle a link-local address properly in send_ua
Dejan, Would you consider pulling this patch to resolve issue #29 on github? https://github.com/ClusterLabs/resource-agents/pull/34 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] LVS support for IPv6
Hello all, I would like to use an LVS Direct Routing configuration on IPv6, but I encountered the problem described below. I'm going to fix it, but I've found that there are several arguments about how it should be fixed, so I would like to ask for everybody's opinion before I proceed. Please give me your thoughts and comments about how we should fix it. Symptom: On IPv4, I have been using the IPaddr2 RA for LVS DR and it works like a charm. On IPv6, I tried to use the IPv6addr RA for the virtual IPv6 address with a similar configuration to IPv4, but the address would not become reachable from another node. The ip command shows that the duplicate address has been assigned to both lo and ethX, and one has the 'dadfailed' flag (Duplicate Address Detection, defined in RFC 4862).

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet6 2004::210/128 scope global
       valid_lft forever preferred_lft forever
(...)
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet6 2004::210/64 scope global tentative dadfailed
       valid_lft forever preferred_lft forever

Arguments: 1) Which RA should be improved, IPaddr2 or IPv6addr? Obviously we have two approaches to fix this, and each has pros and cons. a) improve IPaddr2 to support the IPv4/IPv6 dual stack b) improve IPv6addr to remove the duplicate IPv6 address on the loopback. As for a), pros: easy to maintain as a single code base; uniform behavior between IPv4 and IPv6, since the ip command already supports the dual stack. cons: it changes the policy of which RA is recommended for IPv6 on Linux; we need a new binary as a replacement for send_arp for IPv6. As for b), pros: no changes to the existing IPaddr2. cons: we need to implement the equivalent of the lvs_support=true feature in C, which may make the code harder to maintain. 2) Is the lvs_support=true functionality really necessary? When I use IPaddr2 for LVS on IPv4, it has been working perfectly *without* lvs_support=true.
In this case the same IP address is assigned to both lo and ethX, and everything still works fine. In addition, the latest IPaddr2 has a bug: it does not remove the IP address on lo even if lvs_support=true. This was reported once before: http://www.gossamer-threads.com/lists/linuxha/pacemaker/71106#71106 On IPv6, I also tried assigning an IPv6 address to both lo and ethX with the ip command manually, and it seems to work fine, the same as on IPv4. The weird 'dadfailed' flag was not seen when I used the ip command. Proposed solution: Considering all the arguments above, I would like to suggest the following modifications: - improve IPaddr2 to support the IPv4/IPv6 dual stack. - recommend using IPaddr2 for both IPv4 and IPv6 on Linux in the future; IPaddr/IPv6addr would be left only for legacy and cross-platform support. - the lvs_support=true option would be deprecated and no longer necessary. Any opinions or suggestions are appreciated. I will work on it after we all agree on how we should fix it. Regards, Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
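For reference, the manual experiment described above — the same VIP on both lo and ethX — boils down to commands like these. This is a hedged sketch with example addresses taken from the report; it requires root, so it is shown only as a configuration illustration:

```shell
# Config sketch only (root required); addresses are the examples from
# the report above, not a recommendation.
# On an LVS-DR real server, the VIP sits on lo with a /128 (host)
# prefix so the node accepts traffic for it without advertising it:
ip -6 addr add 2004::210/128 dev lo
# On the director, the same VIP lives on the outward-facing NIC:
ip -6 addr add 2004::210/64 dev eth3
# iproute2 also offers 'nodad' to skip Duplicate Address Detection,
# which avoids the 'dadfailed' state seen in the symptom:
#   ip -6 addr add 2004::210/64 dev eth3 nodad
ip -6 addr show dev lo
```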
Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th
Hi, 2011/10/10 Dejan Muhamedagic de...@suse.de: On Sun, Oct 09, 2011 at 11:28:41PM +1100, Andrew Beekhof wrote: On Sat, Oct 8, 2011 at 6:03 AM, Digimer li...@alteeve.com wrote: On 10/07/2011 02:58 PM, Florian Haas wrote: Vienna before the early afternoon of Saturday the 29th, so if anyone has plans to do something interesting that Saturday morning I'd be more than happy to join. Cheers, Florian I'm going to be in the city all day Saturday as well. Knowing there will be at least a few who will have trouble making the unofficial meeting on the 26th, The 26th is just the meeting start. It's not the 26th, but the 25th. It also says so in the subject line. I'll be in Prague only on the 25th. I'm trying to arrange my schedule to be in Prague from the 25th to the 28th. See you all over there. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] regressions in resource-agents 3.9.1
Hi, Are there any backlogs for the 3.9.2 release? I'm really looking forward to seeing it soon, since 3.9.1 was not really usable for me... Thanks, 2011/6/22 Dejan Muhamedagic deja...@fastmail.fm: Hi all, On Wed, Jun 22, 2011 at 11:22:48PM +0900, Keisuke MORI wrote: 2011/6/22 Florian Haas florian.h...@linbit.com: On 2011-06-22 11:48, Dejan Muhamedagic wrote: Hello all, Unfortunately, it turned out that there were two regressions in the 3.9.1 release: - iscsi on platforms which run open-iscsi 2.0-872 (see http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562) - pgsql probes with shared storage (iirc), see http://marc.info/?l=linux-ha&m=130858569405820&w=2 Thanks to Vadym Chepkov for finding and reporting them. I'd suggest making a quick fix release, 3.9.2. Opinions? Agree. +1 OK. Let's do that on Friday morning. Tomorrow is a holiday here. Cheers, Dejan -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] regressions in resource-agents 3.9.1
2011/6/22 Florian Haas florian.h...@linbit.com: On 2011-06-22 11:48, Dejan Muhamedagic wrote: Hello all, Unfortunately, it turned out that there were two regressions in the 3.9.1 release: - iscsi on platforms which run open-iscsi 2.0-872 (see http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562) - pgsql probes with shared storage (iirc), see http://marc.info/?l=linux-ha&m=130858569405820&w=2 Thanks to Vadym Chepkov for finding and reporting them. I'd suggest making a quick fix release, 3.9.2. Opinions? Agree. +1 -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] using the pacemaker logo for the xing group
Hi Erkan, As I said in a personal email to you, and as Ikeda-san already replied, anybody may use the logo in conjunction with any Pacemaker / Linux-HA related projects. The logo is a contribution from the Japanese Pacemaker / Linux-HA community, so asking for permission on the Japanese mailing list as you did is right, but here is also fine. You can obtain the logo from here (sorry, it's in Japanese): http://linux-ha.sourceforge.jp/wp/archives/369 Regards, Keisuke MORI Linux-HA Japan Project. 2011/6/21 Junko IKEDA tsukishima...@gmail.com: Hi Erkan, The pacemaker logos were created by the NTT group. I asked for the boss's permission; I think I can send them to you directly soon :) Did you post a similar mail to the Japanese mailing list before this? Sorry for the inconvenience. Thanks, Junko IKEDA NTT DATA INTELLILINK CORPORATION 2011/6/20 erkan yanar erkan.ya...@linsenraum.de: Moin, I would like to use the (red/rabbit) pacemaker logo for the linux cluster group in xing. Who do I have to ask for permission to use it? Regards Erkan -- über den grenzen muß die freiheit wohl wolkenlos sein ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
Hi, Thank you for all your efforts on the new release. 2011/6/7 Fabio M. Di Nitto fdini...@redhat.com: Several changes have been made to the build system and the spec file to accommodate both projects' needs. The most noticeable change is the option to select all, linux-ha, or rgmanager resource agents at configuration time, which will also set the default for the spec file. Why is the ldirectord package disabled in the RHEL environment? I would expect it to be built the same as in (linux-ha) resource-agents-1.0.4, so that we can use the upcoming 3.9.1 as an upgrade. We still use resource-agents/ldirectord on many RHEL systems, and if it is missing we cannot upgrade them anymore. from resource-agents.spec.in:

---------
%if %{with linuxha}
%if 0%{?rhel} == 0
%package -n ldirectord
---------

NOTE: About the 3.9.x version (particularly for linux-ha folks): This version was chosen simply because the rgmanager set was already at 3.1.x. In order to make it easier for distribution, and to keep package upgrades linear, we decided to bump the number higher than both projects. There is no other special meaning associated with it. The final 3.9.1 release will take place soon. BTW, why not 4.0? :) Just curious. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
Hi, 2011/6/8 Fabio M. Di Nitto fdini...@redhat.com: Why is the ldirectord package disabled in the RHEL environment? I would expect it to be built the same as in (linux-ha) resource-agents-1.0.4, so that we can use the upcoming 3.9.1 as an upgrade. Because ldirectord requires libnet to build, and libnet is not available on default RHEL (unless you explicitly enable EPEL). ldirectord requires no extra packages to build on RHEL; it is just a perl script. You may be thinking of the runtime environment: it requires at least perl-MailTools, which can be obtained only from EPEL or CentOS extras, but ldirectord users have already been collecting such packages when they want to use it. I can provide a patch to the spec file if it is OK to build it. Note that the (linux-ha) resource-agents have been completely independent of libnet as of 1.0.4. Before that, the IPv6addr RA was the only dependency on libnet. Whoops, yes, you are absolutely right. I got confused between IPaddr and ldirectord. Yes, you can either send me a patch, or I can do it. It's really a piece of cake. OK, I would suggest the attached patch for resolving this particular issue, but I think there are still some issues left: 1) I'm wondering why this condition is needed; I think we can always use %{_var}/run/resource-agents in the current version.

%if 0%{?fedora} >= 11 || 0%{?centos_version} > 5 || 0%{?rhel} > 5
%dir %{_var}/run/heartbeat/rsctmp
%else
%dir %attr (1755, root, root) %{_var}/run/resource-agents
%endif

2) A duplicated man8/ldirectord.8.gz is included in both the resource-agents and ldirectord packages. It should not be a big problem, though.

%{_mandir}/man8/*.8*
(...)
%{_mandir}/man8/ldirectord.8*

3) It cannot be built on RHEL 5, with this error. I'd be glad if there were some kind of backward compatibility.

%if 0%{?suse_version} == 0 && 0%{?fedora} == 0 && 0%{?centos_version} == 0 && 0%{?rhel} == 0
%{error:Unable to determine the distribution/version. This is generally caused by missing /etc/rpm/macros.dist.
Please install the correct build packages or define the required macros manually.} Regards, -- Keisuke MORI diff --git a/resource-agents.spec.in b/resource-agents.spec.in index 8b39b3f..7dc6670 100644 --- a/resource-agents.spec.in +++ b/resource-agents.spec.in @@ -106,7 +106,6 @@ High Availability environment for both Pacemaker and rgmanager service managers. %if %{with linuxha} -%if 0%{?rhel} == 0 %package -n ldirectord License: GPLv2+ Summary: A Monitoring Daemon for Maintaining High Availability Resources @@ -136,7 +135,6 @@ lditrecord is simple to install and works with the heartbeat code See 'ldirectord -h' and linux-ha/doc/ldirectord for more information. %endif -%endif %prep %if 0%{?suse_version} == 0 0%{?fedora} == 0 0%{?centos_version} == 0 0%{?rhel} == 0 @@ -194,11 +192,6 @@ make install DESTDIR=%{buildroot} rm -rf %{buildroot}/usr/share/doc/resource-agents %if %{with linuxha} -%if 0%{?rhel} != 0 -# ldirectord isn't included on RHEL -find %{buildroot} -name 'ldirectord.*' -exec rm -f {} \; -find %{buildroot} -name 'ldirectord' -exec rm -f {} \; -%endif %if 0%{?suse_version} test -d %{buildroot}/sbin || mkdir %{buildroot}/sbin @@ -270,7 +263,6 @@ rm -rf %{buildroot} %{_libdir}/heartbeat/findif %{_libdir}/heartbeat/tickle_tcp -%if 0%{?rhel} == 0 %if 0%{?suse_version} %preun -n ldirectord %stop_on_removal ldirectord @@ -303,7 +295,6 @@ rm -rf %{buildroot} /usr/lib/ocf/resource.d/heartbeat/ldirectord %endif %endif -%endif %changelog * @date@ Autotools generated version nob...@nowhere.org - @version@-@specver@-@numcomm@.@alphatag@.@dirty@ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
2011/6/8 Digimer li...@alteeve.com: On 06/08/2011 09:48 AM, Florian Haas wrote: I realize I'm bikeshedding, but my preference would be for 3.9 for this one, and 4.0 to implement the new standard. Like Fabio originally suggested. Cheers, Florian Given that x.0 has long meant new stuff, I'd like to stick with the 3.9.x. About the bikeshed's color :) I don't mind either one; I just wanted to know the reasoning behind it, and now it's all clear to me. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] nginx resource agent
Hi Alan, 2011/1/2 Alan Robertson al...@unix.sh: On 12/14/2010 02:42 AM, Dejan Muhamedagic wrote:
#
# I'm not convinced this is a wonderful idea (AlanR)
#
for sig in SIGTERM SIGHUP SIGKILL
do
  if pgrep -f "$NGINXD.*$CONFIGFILE" >/dev/null
  then
    pkill -$sig -f "$NGINXD.*$CONFIGFILE" >/dev/null
    ocf_log info "nginxd children were signalled ($sig)"
    sleep 1
  else
    break
  fi
done
Can't recall the details anymore; there was a bit of discussion on the matter a few years ago, but NTT insisted on killing httpd children. Or do you mind the implementation? Hi Dejan, I know it's been a long time. Sorry about that. If I _hated_ the idea, I would have left it out. It definitely leaves me feeling a bit unsettled. If it causes a problem, it will no doubt eventually show up. It looks like it's just masking a bug in Apache - that is, that giving it a shutdown request doesn't really work... The relevant discussion is this: http://www.gossamer-threads.com/lists/linuxha/dev/44395#44395 http://developerbugs.linux-foundation.org//show_bug.cgi?id=1800 The intention of the code is to allow the service to be restarted if the Apache main process has failed for some reason (maybe a bug in Apache, maybe the OOM killer, or whatever). It's not for masking a bug in Apache - it's just trying to clean up and continue the service with as little manual intervention as possible. Perhaps I shouldn't have kept it in the nginx code - since it does seem to be a bit specific to some circumstance in Apache... On the other hand, it shouldn't hurt anything either... You may want to see what happens if the nginx process is accidentally killed. I'm not familiar with nginx at all, but in the case of Apache, the children would keep running and they prevent another Apache instance from restarting until you kill all the orphaned processes manually. If nginx is a single-process application, then I think that the code should not be necessary.
-- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
2010/11/14 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Nov 09, 2010 at 06:06:30PM +0900, Keisuke MORI wrote: Ok, then let's just drop the changeset. I agree that srand should not be called many times, but I would rather prefer to just keep the existing behavior since there have been no problems with that so far. Ok, I'll revert it for now. Thanks, I confirmed that the problem went away. But I'd rather have it working there. Would this patch to the cib do the right thing? The patch actually didn't work. I've looked into the code more and now I realize that the existence of the mainloop is not the issue here; g_main_loop_is_running() _always_ fails when NULL is passed. glue/lib/clplumbing/cl_random.c:
static void
get_more_random(void)
{
	if (randgen_scheduled || IS_QUEUEFULL) {
		return;
	}
	if (g_main_loop_is_running(NULL)) {
		randgen_scheduled = TRUE;
		Gmain_timeout_add_full(G_PRIORITY_LOW+1, 10, add_a_random, NULL, NULL);
	}
}
By looking at the source code of glib, it looks like this: http://git.gnome.org/browse/glib/tree/glib/gmain.c#n3157
gboolean
g_main_loop_is_running (GMainLoop *loop)
{
  g_return_val_if_fail (loop != NULL, FALSE);
  g_return_val_if_fail (g_atomic_int_get (&loop->ref_count) > 0, FALSE);
  return loop->is_running;
}
I'm wondering whether the get_more_random() logic had ever worked before. So the proper fix here would be, in my opinion, to just remove the get_more_random() logic from the cluster-glue code. It does not make sense to me that a g_mainloop is required just to get a random value :) The Heartbeat code should still support the current version of cluster-glue, so I think that the current code in the repository is just good for the coming 3.0.4. Any other backlogs to release the heartbeat package? I look forward to it being released soon! Me too. Alas ... We'll try to get it out by next Friday (19th November) Great! Thank you for all your effort for the release!
-- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
Lars, 2010/10/27 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Oct 25, 2010 at 08:21:26PM +0900, Keisuke MORI wrote: Hi, The recent heartbeat on the tip would cause an assertion failure in pacemaker-1.0 and generate a core: (snip) I don't care for the get_more_random() stuff and keeping 100 random values prepared for get_next_random; that is probably just academic sugar anyway. If it does not work, we throw it all out, or fix it. Ok, then let's just drop the changeset. I agree that srand should not be called many times, but I would rather prefer to just keep the existing behavior since there have been no problems with that so far. I object to calling srand many times. Actually we should only call it once; we still call it in too many places. I found the get_next_random() function to apparently properly wrap a static int inityet and do the srand only once, so I just used it. Would it help to call g_main_loop_new() earlier? Can we more cleanly catch the "no GMainLoop there yet" case in get_more_random()? Should we just drop get_next_random() from cl_rand_from_interval? Or drop it altogether along with get_more_random and its static array -- it's not as if generating random numbers were performance critical in any way, is it. It could possibly help, but I don't think it's worth doing right now. Any other backlogs to release the heartbeat package? I look forward to it being released soon! Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
Hi, The recent heartbeat on the tip would cause an assertion failure in pacemaker-1.0 and generate a core:
{{{
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort: crm_glib_handler: Forked child 31338 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort: crm_glib_handler: Forked child 31339 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort: crm_glib_handler: Forked child 31341 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort: crm_glib_handler: Forked child 31342 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
}}}
This seems to have been introduced by the following changeset: http://hg.linux-ha.org/dev/rev/231b0b8555be The stack trace and my suggested patch are attached. The changeset in question changed this code path to use get_next_random(), which eventually calls g_main_loop_is_running(), but that call may fail because the g_main_loop is not yet initialized in cib/crmd. My suggested patch just reverts to the old behavior, only changing the delay to 50ms.
Thanks, -- Keisuke MORI (gdb) where #0 0x00669410 in __kernel_vsyscall () #1 0x00692df0 in raise () from /lib/libc.so.6 #2 0x00694701 in abort () from /lib/libc.so.6 #3 0x00c0d82f in crm_abort (file=0xc26955 utils.c, function=0xc26dda crm_glib_handler, line=449, assert_condition=0x8933d58 g_main_loop_is_running: assertion `loop != NULL' failed, do_core=1, do_fork=1) at utils.c:1382 #4 0x00c09f05 in crm_glib_handler (log_domain=0x167686 GLib, flags=G_LOG_LEVEL_CRITICAL, message=0x8933d58 g_main_loop_is_running: assertion `loop != NULL' failed, user_data=0x0) at utils.c:449 #5 0x00143b67 in g_logv () from /lib/libglib-2.0.so.0 #6 0x00143d39 in g_log () from /lib/libglib-2.0.so.0 #7 0x00143e1b in g_return_if_fail_warning () from /lib/libglib-2.0.so.0 #8 0x0013981b in g_main_loop_is_running () from /lib/libglib-2.0.so.0 #9 0x00880811 in get_more_random () at cl_random.c:95 #10 0x00880945 in cl_init_random () at cl_random.c:128 #11 0x00880644 in gen_a_random () at cl_random.c:68 #12 0x00880896 in get_next_random () at cl_random.c:106 #13 0x00fdbabb in get_clientstatus (lcl=0x8931bd8, host=0x0, clientid=0x805b779 cib, timeout=-1) at client_lib.c:974 #14 0x080557ee in cib_init () at main.c:461 #15 0x08054c4b in main (argc=1, argv=0xbfcd6124) at main.c:218 (gdb) # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1288003477 -32400 # Node ID 96b67422b12814f64dc7dd61c670801c7ba213b6 # Parent 82fc843fbcf9733e50bbc169c95e51b6c7f97c54 Medium: reduce max delay in get_client_status (revised 231b0b8555be) revert the old code to avoid calling g_main_loop_is_running() which may fail when used in Pacemaker cib/crmd. diff -r 82fc843fbcf9 -r 96b67422b128 lib/hbclient/client_lib.c --- a/lib/hbclient/client_lib.c Mon Oct 04 22:12:37 2010 +0200 +++ b/lib/hbclient/client_lib.c Mon Oct 25 19:44:37 2010 +0900 @@ -966,16 +966,6 @@ get_nodesite(ll_cluster_t* lcl, const ch * Return the status of the given client. 
*/ -#ifndef HAVE_CL_RAND_FROM_INTERVAL -/* you should grab latest glue headers! */ -static inline int cl_rand_from_interval(const int a, const int b) -{ - /* RAND_MAX may be INT_MAX, or (b-a) may be huge. */ - long long r = get_next_random(); - return a + (r * (b-a) + RAND_MAX/2)/RAND_MAX; -} -#endif - static const char * get_clientstatus(ll_cluster_t* lcl, const char *host , const char *clientid, int timeout) @@ -1027,8 +1017,9 @@ get_clientstatus(ll_cluster_t* lcl, cons * in a 100-node cluster, the max delay is 5 seconds */ num_nodes = get_num_nodes(lcl); - max_delay = num_nodes * 5; - delay = cl_rand_from_interval(0, max_delay); + max_delay = num_nodes * 5; /* in microsecond*/ + srand(cl_randseed()); + delay = (1.0* rand()/RAND_MAX)*max_delay; if (ANYDEBUG){ cl_log(LOG_DEBUG, Delaying cstatus request for %d ms, delay/1000); } ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Next release from Linux-HA? (was: [PATCH] IPv6addr: removing libnet dependency)
Hi Lars, We talked about the next release of the heartbeat/resource-agents packages a while ago. As Pacemaker-1.0.10 is about to be released soon, I think it's a good time to release those packages too, for the best use of Pacemaker. I think that heartbeat-3.0.4 / resource-agents-1.0.4 should be released at least, because it has already been 6 months since the last release. What do you think about it, and when can we release the packages? Regards, Keisuke MORI 2010/7/27 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Jul 27, 2010 at 04:12:34PM +0900, Keisuke MORI wrote: 2010/7/27 Keisuke MORI keisuke.mori...@gmail.com: 2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: Heartbeat does not have many changes (apart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. (...) For heartbeat, I personally like "pacemaker on" in ha.cf :) I should have mentioned this too: the version number in the log file from heartbeat 3.0.3 seems incorrect. I want to fix this soon to avoid confusion. Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration validated. Starting heartbeat 3.0.2 Yes, I know. Not a problem. Needs to be changed in configure.ac before the 3.0.4 release. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/8/11 Simon Horman ho...@verge.net.au: http://hg.linux-ha.org/agents/rev/612e2966f372 I've had to commit a small revision, because on IA64, the memory on the stack is not aligned properly for the cast to struct nd_neighbor_advert * - http://hg.linux-ha.org/agents/rev/d206bc8f1303 I apologize for the ugliness; it was the only way I could make gcc shut up and get the alignment right. If someone can get the alignment right on the stack, I'm all ears ... You are right, that is a bit ugly. But I have no better ideas at this time :-( How about this patch, or something along this line? It assumes GCC, but ICC should have a similar feature if you want to support it. Alternatively, having a union of a u_int8_t array and the struct should get the alignment correct, I think. -- Keisuke MORI # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1281491442 -32400 # Node ID b12ca86af66197498cbf537ccc7ad4ff56cdf63b # Parent d206bc8f13039b332e76a93a86e8e550b67781da [mq]: ipv6addr-alignment.patch diff -r d206bc8f1303 -r b12ca86af661 heartbeat/IPv6addr.c --- a/heartbeat/IPv6addr.c Mon Aug 09 21:51:19 2010 +0200 +++ b/heartbeat/IPv6addr.c Wed Aug 11 10:50:42 2010 +0900 @@ -89,7 +89,6 @@ #include <stdio.h> #include <stdlib.h> -#include <malloc.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> @@ -424,10 +423,17 @@ int ifindex; int hop; struct ifreq ifr; - u_int8_t *payload; - int payload_size; - struct nd_neighbor_advert *na; - struct nd_opt_hdr *opt; + + /* GCC is assumed. + * If you want to port to other than GCC, make sure that + * the packet is packed correctly.
+ */ + struct neighbor_advert { + struct nd_neighbor_advert na; + struct nd_opt_hdr opt; + u_int8_t hwaddr[HWADDR_LEN]; + } __attribute__ ((packed)) payload; + struct sockaddr_in6 src_sin6; struct sockaddr_in6 dst_sin6; @@ -473,39 +479,27 @@ } /* build a neighbor advertisement message */ - payload_size = sizeof(struct nd_neighbor_advert) - + sizeof(struct nd_opt_hdr) + HWADDR_LEN; - payload = memalign(sysconf(_SC_PAGESIZE), payload_size); - if (!payload) { - cl_log(LOG_ERR, malloc for payload failed); - goto err; - } - memset(payload, 0, payload_size); + memset((void *)payload, 0, sizeof(payload)); - /* Ugly typecast from ia64 hell! */ - na = (struct nd_neighbor_advert *)((void *)payload); - na-nd_na_type = ND_NEIGHBOR_ADVERT; - na-nd_na_code = 0; - na-nd_na_cksum = 0; /* calculated by kernel */ - na-nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE; - na-nd_na_target = *src_ip; + payload.na.nd_na_type = ND_NEIGHBOR_ADVERT; + payload.na.nd_na_code = 0; + payload.na.nd_na_cksum = 0; /* calculated by kernel */ + payload.na.nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE; + payload.na.nd_na_target = *src_ip; /* options field; set the target link-layer address */ - opt = (struct nd_opt_hdr *)(payload + sizeof(struct nd_neighbor_advert)); - opt-nd_opt_type = ND_OPT_TARGET_LINKADDR; - opt-nd_opt_len = 1; /* The length of the option in units of 8 octets */ - memcpy(payload + sizeof(struct nd_neighbor_advert) - + sizeof(struct nd_opt_hdr), - ifr.ifr_hwaddr.sa_data, HWADDR_LEN); + payload.opt.nd_opt_type = ND_OPT_TARGET_LINKADDR; + payload.opt.nd_opt_len = 1; /* The length of the option in units of 8 octets */ + memcpy(payload.hwaddr, ifr.ifr_hwaddr.sa_data, HWADDR_LEN); /* sending an unsolicited neighbor advertisement to all */ memset(dst_sin6, 0, sizeof(dst_sin6)); dst_sin6.sin6_family = AF_INET6; inet_pton(AF_INET6, BCAST_ADDR, dst_sin6.sin6_addr); /* should not fail */ - if (sendto(fd, payload, payload_size, 0, + if (sendto(fd, (void *)payload, sizeof(payload), 0, (struct sockaddr 
*)dst_sin6, sizeof(dst_sin6)) - != payload_size) { + != sizeof(payload)) { cl_log(LOG_ERR, sendto(%s) failed: %s, if_name, strerror(errno)); goto err; @@ -515,7 +509,6 @@ err: close(fd); - free(payload); return status; } ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/27 Andrew Beekhof and...@beekhof.net: On Tue, Jul 27, 2010 at 8:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote: For heartbeat, I personally like "pacemaker on" in ha.cf :) One thing that's coming in 1.1.3 is an mcp (master control process) and associated init script for pacemaker. This means that Pacemaker is started/stopped independently of the messaging layer. Currently this is only written for corosync[1], but I've been toying with the idea of extending it to Heartbeat. In which case, if you're already changing the option, you might want to make it: legacy on/off. Where off would be the equivalent of starting with -M (no resource management) but wouldn't spawn any daemons. Thoughts? I have several concerns with that change: 1) Is it possible to recover or cause a fail-over correctly when any of the Pacemaker/Heartbeat processes fails? (In particular, for a failure of pacemaker's new mcp process, and for a failure of the current heartbeat MCP process.) 2) Would daemons started with the respawn directive, such as hbagent (the SNMP daemon) or pingd, keep working compatibly? 3) After all, what would be the benefit of the change for end users? I feel like it only adds complexity to operations and diagnostics for end users. I guess that I would only use "legacy on" on the heartbeat stack... -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: By the way, do we have any plan to release the next agents/glue/heartbeat packages from the Linux-HA project? I think it's a good time to consider them for the best use of pacemaker-1.0.9. I think glue was released by dejan just before he went on vacation, though the release announcement is missing (1.0.6). Heartbeat does not have many changes (apart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. Agents has a few fixes, but also has some big changes. I have to take another close look, but yes, I think we should release an agents 1.0.4 within the next few weeks. Great! Then let's go for the next release for agents/heartbeat along with glue. My main concern about agents is LF#2378: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378 It is a change, but it's a necessary change to make the maintenance mode work fine. For heartbeat, I personally like "pacemaker on" in ha.cf :) find_if for IPv6 is also missing if you want to write a script-based one. I'm sure that can be scripted itself around ip -o -f inet6 a s | grep ... but we already sort of agreed that this would not be development time well spent. find_if does more than just grepping. It has to match against the network address calculated from the given address and prefix, to find out which interface would be appropriate to be assigned the virtual address. The current IPaddr2 also relies on find_if to do this. But anyway, I would also agree that we are not going to develop such a thing. Just off topic. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/27 Keisuke MORI keisuke.mori...@gmail.com: 2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: Heartbeat does not have many changes (appart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. (...) For heartbeat, I personally like pacemaker on in ha.cf :) I should have mentioned this too, the version number in the log file from heartbeat 3.0.3 seems incorrect. I want to fix this soon to avoid confusion. Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration validated. Starting heartbeat 3.0.2 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
Hi, 2010/7/23 Lars Ellenberg lars.ellenb...@linbit.com: On Fri, Jul 23, 2010 at 03:04:20PM +0200, Andrew Beekhof wrote: On Fri, Jul 23, 2010 at 5:09 AM, Simon Horman ho...@verge.net.au wrote: Hi Mori-san, I will add that libnet seems to be more or less unmaintained. Someone recently picked it up again, but I'm in favor of the patch for the reasons Mori-san already stated. You seem to make using libnet optional; is there a reason not to just remove it? Portability? Agreed, let's just drop it. Ack. Thanks to Simon, Andrew and Lars for all of your constructive comments. I've revised the patch so that it drops the old libnet code completely. Please apply this to the repository. By the way, do we have any plan to release the next agents/glue/heartbeat packages from the Linux-HA project? I think it's a good time to consider them for the best use of pacemaker-1.0.9. BTW, is it correct that most of it could be done by ip, similar to how IPaddr2 does it? The only thing missing would be a send_arp for v6. Anyone want to write an IPv6addr2? ;-) find_if for IPv6 is also missing if you want to write a script-based one.
Thanks, -- Keisuke MORI # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1280134509 -32400 # Branch ipv6 # Node ID 275089e31232b870e4218f7dd930538daa438cbf # Parent b3142fd9cc672f2217e632608bc986b46265b193 IPv6addr: remove libnet dependency diff -r b3142fd9cc67 -r 275089e31232 configure.in --- a/configure.in Fri Jul 16 09:46:38 2010 +0200 +++ b/configure.in Mon Jul 26 17:55:09 2010 +0900 @@ -634,7 +634,7 @@ dnl dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include sys/types.h]) -AM_CONDITIONAL(USE_IPV6ADDR, test $ac_cv_header_netinet_icmp6_h = yes -a $new_libnet = yes ) +AM_CONDITIONAL(USE_IPV6ADDR, test $ac_cv_header_netinet_icmp6_h = yes ) dnl dnl Compiler flags diff -r b3142fd9cc67 -r 275089e31232 heartbeat/IPv6addr.c --- a/heartbeat/IPv6addr.c Fri Jul 16 09:46:38 2010 +0200 +++ b/heartbeat/IPv6addr.c Mon Jul 26 17:55:09 2010 +0900 @@ -87,13 +87,22 @@ #include config.h +#include stdio.h #include stdlib.h +#include unistd.h #include sys/types.h +#include sys/socket.h #include netinet/icmp6.h +#include arpa/inet.h /* for inet_pton */ +#include net/if.h /* for if_nametoindex */ +#include sys/ioctl.h +#include sys/stat.h +#include fcntl.h #include libgen.h #include syslog.h +#include signal.h +#include errno.h #include clplumbing/cl_log.h -#include libnet.h #define PIDFILE_BASE HA_RSCTMPDIR /IPv6addr- @@ -141,6 +150,8 @@ const int UA_REPEAT_COUNT = 5; const int QUERY_COUNT = 5; +#define HWADDR_LEN 6 /* mac address length */ + struct in6_ifreq { struct in6_addr ifr6_addr; uint32_t ifr6_prefixlen; @@ -401,69 +412,100 @@ } /* Send an unsolicited advertisement packet - * Please refer to rfc2461 + * Please refer to rfc4861 / rfc3542 */ int send_ua(struct in6_addr* src_ip, char* if_name) { int status = -1; - libnet_t *l; - char errbuf[LIBNET_ERRBUF_SIZE]; + int fd; - struct libnet_in6_addr dst_ip; - struct libnet_ether_addr *mac_address; - char payload[24]; int ifindex; + int hop; + 
struct ifreq ifr; + u_int8_t payload[sizeof(struct nd_neighbor_advert) + + sizeof(struct nd_opt_hdr) + HWADDR_LEN]; + struct nd_neighbor_advert *na; + struct nd_opt_hdr *opt; + struct sockaddr_in6 src_sin6; + struct sockaddr_in6 dst_sin6; - - if ((l=libnet_init(LIBNET_RAW6, if_name, errbuf)) == NULL) { - cl_log(LOG_ERR, libnet_init failure on %s, if_name); + if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == 0) { + cl_log(LOG_ERR, socket(IPPROTO_ICMPV6) failed: %s, + strerror(errno)); goto err; } /* set the outgoing interface */ ifindex = if_nametoindex(if_name); - if (setsockopt(libnet_getfd(l), IPPROTO_IPV6, IPV6_MULTICAST_IF, + if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF, ifindex, sizeof(ifindex)) 0) { - cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_IF): %s, + cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_IF) failed: %s, strerror(errno)); goto err; } - - mac_address = libnet_get_hwaddr(l); - if (!mac_address) { - cl_log(LOG_ERR, libnet_get_hwaddr: %s, errbuf); + /* set the hop limit */ + hop = 255; /* 255 is required. see rfc4861 7.1.2 */ + if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, + hop, sizeof(hop)) 0) { + cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_HOPS) failed: %s, + strerror(errno)); + goto err; + } + + /* set the source address */ + memset(src_sin6, 0, sizeof(src_sin6)); + src_sin6.sin6_family = AF_INET6; + src_sin6.sin6_addr = *src_ip; + src_sin6.sin6_port = 0; + if (bind(fd, (struct sockaddr *)src_sin6, sizeof(src_sin6)) 0) { + cl_log(LOG_ERR, bind() failed: %s, strerror(errno)); goto err; } - dst_ip
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
Hi, 2010/7/23 Simon Horman ho...@verge.net.au: I will add that libnet seems to be more or less unmaintained. You seem to make using libnet optional; is there a reason not to just remove it? Portability? I just thought that some people might want to preserve the existing behavior. OpenSUSE ships libnet, for example, and I'm not sure whether they would agree to change the implementation or would rather keep using libnet. But ok, if no one has objections I'll revise the patch so that it removes all libnet code from IPv6addr.c and makes it a single code path. Any other opinions? As for portability, I believe that the new implementation is more portable than using libnet. (cf. http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034#c10) +#define HWADDR_LEN 6 /* mac address length */ Personally I'd prefer the define outside of the function. Ok, I just wanted to place them closely, but I have no strong preference. I'll move it to somewhere around the other macro definitions. + na->nd_na_target = (*src_ip); There is no need to enclose *src_ip in brackets. Right, removing the parens. + if (sendto(fd, payload, sizeof(payload), 0, + (struct sockaddr *)&dst_sin6, sizeof(dst_sin6)) + != sizeof(payload)) { Is it valid to assume that there will never be a partial write? I think that reporting an error is just enough when a partial write occurs here. The packet is very small (32 bytes) and a partial write should rarely happen; it will be retried 5 times if it occurs, and if it still fails then it should be considered that something really bad has happened :-) Also, the current libnet code does exactly the same as the above internally, so no behavior would change with this code. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
The attached patch removes the libnet dependency from the IPv6addr RA by reimplementing the same functionality with the standard socket API. Currently the resource-agents package has the following problems: - The IPv6addr RA requires an extra libnet package in the run-time environment, which is quite inconvenient, particularly for RHEL users, because libnet is not included in the standard distribution. - The pre-built RPMs from ClusterLabs do not include the IPv6addr RA. This was once reported on the pacemaker list: http://www.gossamer-threads.com/lists/linuxha/pacemaker/64295#64295 The patch resolves both issues. I believe that once it is applied, none of the Pacemaker/Heartbeat related packages will depend on the libnet library any more. Regards, -- Keisuke MORI

# HG changeset patch
# User Keisuke MORI kskm...@intellilink.co.jp
# Date 1279802861 -32400
# Branch ipv6
# Node ID 40d5dbdca9cc089b6514c7525cd2dbd678299711
# Parent b3142fd9cc672f2217e632608bc986b46265b193
IPv6addr: remove libnet dependency

diff -r b3142fd9cc67 -r 40d5dbdca9cc configure.in
--- a/configure.in	Fri Jul 16 09:46:38 2010 +0200
+++ b/configure.in	Thu Jul 22 21:47:41 2010 +0900
@@ -607,6 +607,7 @@
 		[new_libnet=yes; AC_DEFINE(HAVE_LIBNET_1_1_API, 1, Libnet 1.1 API)],
 		[new_libnet=no; AC_DEFINE(HAVE_LIBNET_1_0_API, 1, Libnet 1.0 API)],$LIBNETLIBS)
 	AC_SUBST(LIBNETLIBS)
+	AC_DEFINE(HAVE_LIBNET_API, 1, Libnet API)
 fi
 
 if test $new_libnet = yes; then
@@ -634,7 +635,7 @@
 dnl
 dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent
 AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include <sys/types.h>])
-AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes -a "$new_libnet" = yes )
+AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes )
 
 dnl
 dnl Compiler flags
diff -r b3142fd9cc67 -r 40d5dbdca9cc heartbeat/IPv6addr.c
--- a/heartbeat/IPv6addr.c	Fri Jul 16 09:46:38 2010 +0200
+++ b/heartbeat/IPv6addr.c	Thu Jul 22 21:47:41 2010 +0900
@@ -87,13 +87,25 @@
 
 #include <config.h>
 
+#include <stdio.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include <sys/types.h>
+#include <sys/socket.h>
 #include <netinet/icmp6.h>
+#include <arpa/inet.h>	/* for inet_pton */
+#include <net/if.h>	/* for if_nametoindex */
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #include <libgen.h>
 #include <syslog.h>
+#include <signal.h>
+#include <errno.h>
 #include <clplumbing/cl_log.h>
+#ifdef HAVE_LIBNET_API
 #include <libnet.h>
+#endif
 
 #define PIDFILE_BASE HA_RSCTMPDIR "/IPv6addr-"
@@ -400,8 +412,11 @@
 	return OCF_NOT_RUNNING;
 }
 
+#ifdef HAVE_LIBNET_API
 /* Send an unsolicited advertisement packet
  * Please refer to rfc2461
+ *
+ * Libnet based implementation.
  */
 int
 send_ua(struct in6_addr* src_ip, char* if_name)
@@ -466,6 +481,108 @@
 	libnet_destroy(l);
 	return status;
 }
+#else /* HAVE_LIBNET_API */
+/* Send an unsolicited advertisement packet
+ * Please refer to rfc4861 / rfc3542
+ *
+ * Libnet independent implementation.
+ */
+int
+send_ua(struct in6_addr* src_ip, char* if_name)
+{
+	int status = -1;
+	int fd;
+
+	int ifindex;
+	int hop;
+	struct ifreq ifr;
+#define HWADDR_LEN 6	/* mac address length */
+	u_int8_t payload[sizeof(struct nd_neighbor_advert)
+			 + sizeof(struct nd_opt_hdr) + HWADDR_LEN];
+	struct nd_neighbor_advert *na;
+	struct nd_opt_hdr *opt;
+	struct sockaddr_in6 src_sin6;
+	struct sockaddr_in6 dst_sin6;
+
+	if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) < 0) {
+		cl_log(LOG_ERR, "socket(IPPROTO_ICMPV6) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+	/* set the outgoing interface */
+	ifindex = if_nametoindex(if_name);
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
+		       &ifindex, sizeof(ifindex)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_IF) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+	/* set the hop limit */
+	hop = 255;	/* 255 is required. see rfc4861 7.1.2 */
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
+		       &hop, sizeof(hop)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_HOPS) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+
+	/* set the source address */
+	memset(&src_sin6, 0, sizeof(src_sin6));
+	src_sin6.sin6_family = AF_INET6;
+	src_sin6.sin6_addr = *src_ip;
+	src_sin6.sin6_port = 0;
+	if (bind(fd, (struct sockaddr *)&src_sin6, sizeof(src_sin6)) < 0) {
+		cl_log(LOG_ERR, "bind() failed: %s", strerror(errno));
+		goto err;
+	}
+
+	/* get the hardware address */
+	memset(&ifr, 0, sizeof(ifr));
+	strncpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name) - 1);
+	if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) {
+		cl_log(LOG_ERR, "ioctl(SIOCGIFHWADDR) failed: %s", strerror(errno));
+		goto err;
+	}
+
+	/* build a neighbor advertisement message */
+	memset(payload, 0, sizeof(payload));
+
+	na = (struct nd_neighbor_advert *)payload;
+	na->nd_na_type = ND_NEIGHBOR_ADVERT;
Re: [Linux-HA] linux-ha 3.0.3 + SNMP
Hi, The SNMP subagent has been moved to the Pacemaker GUI package: http://hg.clusterlabs.org/pacemaker/pygui/ (I haven't used it with the recent versions of heartbeat-3.*, though.) Also, the correct configure option is --enable-snmp-subagent. The --enable-snmp option only affects some stonith agents, which are in the glue package now, so I think any SNMP related options are meaningless to the current heartbeat-3.* package. 2010/5/19 Patrice Laramee patrice.lara...@imetrik.com: Hi Florian, I did it with the two hyphens. It was a copy/paste error in the email. If I want to do SNMP monitoring for the 3.0.0 branch, do I have to use Pacemaker? Thanks, -Pat -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas Sent: May-18-10 11:19 AM To: General Linux-HA mailing list Subject: Re: [Linux-HA] linux-ha 3.0.3 + SNMP On 2010-05-17 22:48, Patrice Laramee wrote: Hi, I've been trying to compile heartbeat with SNMP support. It did compile fine, but I cannot find the binary 'hbagent'. Was this binary removed from this version? o ./ConfigureMe configure -enable-snmp Are you aware that this should be --enable-snmp (two hyphens, not one)? -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [Pacemaker] Known problem with IPaddr(2)
Hi, Regarding the discussion in the pacemaker ML below, I would suggest a patch as attached. The patch includes: 1) Fix IPaddr to return the correct OCF value (it returned 255 when delete_interface failed). 2) Add a description of the assumption to the IPaddr / IPaddr2 meta-data. Regards, Keisuke MORI 2010/4/14 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Apr 13, 2010 at 08:28:09PM +0200, Lars Ellenberg wrote: On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote: Hi, On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote: Markus M. wrote: is there a known problem with IPaddr(2) when defining many (in my case: 11) ip resources which are started/stopped concurrently? Don't remember any problems. Well... some further investigation revealed that it seems to be a problem with the way the ip addresses are assigned. When looking at the output of ip addr, the first ip address added to the interface gets the scope global, and all further aliases get the scope global secondary. If the first ip address is afterwards removed before the secondaries (due to the scripts running concurrently), ALL secondaries are removed at the same time by the ip command, leading to an error for all subsequent attempts to remove the other ip addresses because they are already gone. I am not sure how ip decides on the secondary scope, maybe because the other ip addresses are in the same subnet as the first one. That sounds bad. Instances should be independent of each other. Can you please open a bugzilla and attach a hb_report. Oh, that is perfectly expected the way he describes it. The assumption has always been that there is at least one normal, not managed by crm, address on the interface, so no one will have noticed before. I suggest the following patch, basically doing one retry. For the described scenario, the second try will find the IP already nonexistent, and exit $OCF_SUCCESS. Though that obviously won't make instances independent.
The typical way to achieve that is to have them all as secondary IPs. Which implies that for successful use of independent IPaddr2 resources on the same device, you need at least one system IP (as opposed to one managed by the cluster) on that device. The first IP assigned will get primary status. Usually, if you delete a primary IP, the kernel will also delete all secondary IP addresses. If using a system IP is not an option, here is the alternative: Recent kernels (a quick check revealed that this setting has been around since at least 2.6.12) can do alias promotion, which can be enabled using sysctl -w net.ipv4.conf.all.promote_secondaries=1 (or per device) In both cases the previous retry-on-ip_stop patch is unnecessary. But it won't do any harm, either. Most likely ;-) Glad that helped ;-) Somebody please add that to the man page and/or the agent meta-data... -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf -- Keisuke MORI agents-ipaddr-retval.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Memory leaks in lrmd/cl_msg
Hi, lrmd in glue-1.0.3 has a memory leak. To be exact, the leak is in the cl_msg library. Please find the details in the bugzilla item: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2389 Note that the leak must have existed since the old heartbeat-2.1.4, because the code around here has not been changed for quite a while. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack
Hi, 2010/3/24 Andrew Beekhof and...@beekhof.net: We'd need to coordinate this with all projects (corosync, pacemaker, heartbeat, glue, agents). That would probably be the most difficult part. Currently the ais plugin has: mkdir(HA_STATE_DIR/heartbeat, 0755); /* Used by RAs - Leave owned by root */ mkdir(HA_STATE_DIR/heartbeat/rsctmp, 0755); /* Used by RAs - Leave owned by root */ When you make the change, please also put it in a #define that pacemaker can look for during configure. That way I can default to the above if I can't find it. If you do that then upgrading should be pretty trivial. OK, I will look into it when making changes. I filed a bugzilla item for this issue: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack
Hi, Sorry for a bit of a long mail. I'm going to describe the issue in the Subject: and would like to suggest some changes to the agents package (and possibly Pacemaker, too). I would be glad if you could give me your thoughts and comments. Pseudo RAs which create a stat file under HA_RSCTMP (/var/run/heartbeat/rsctmp), such as Dummy, MailTo, etc., do not work properly on the Pacemaker+Corosync stack. When a node crashes and is rebooted, a stale stat file survives the reboot and hence the RA misbehaves as if the resource were already started when the cluster is launched again for the recovery. This problem does not occur on the Heartbeat stack because Heartbeat removes HA_RSCTMP at its startup, while on the Pacemaker stack neither Pacemaker nor Corosync removes it. But removing the files from Pacemaker does not seem to be correct - if they were removed at cluster startup time then the maintenance mode would no longer work properly. In my understanding, the correct behavior is: - They should NOT be removed at cluster startup time. - They should be removed at OS boot time. My suggestion to address this issue is to fix it as follows: - 1) change the HA_RSCTMP location to /var/run/resource-agents, or some other subdirectory right under /var/run. - 2) set the directory permissions to 01777 (with the sticky bit) - 3) change the IPaddr/SendArp RAs not to use their own subdirectory but instead add a prefix to the filename. - 4) make /var/run/heartbeat/rsctmp obsolete; Heartbeat/Pacemaker could preserve the current behavior for a while for compatibility. The basic idea of the changes is that we are now going to follow the file removal procedure defined by the FHS (Filesystem Hierarchy Standard). http://www.pathname.com/fhs/pub/fhs-2.3.html#VARRUNRUNTIMEVARIABLEDATA The FHS defines that any files under a subdirectory of /var/run should be removed at OS boot time.
Unfortunately second-level subdirectories are out of scope and you cannot rely on their removal (and that is the case for /var/run/heartbeat/rsctmp). I believe that the impact on existing RAs is minimal. If your RA is implemented correctly then you need to do nothing - just note that the location of the stat file has changed. If your RA has hardcoded /var/run/heartbeat/rsctmp, or creates its own subdirectory, you are encouraged to fix it because it may not work well with the maintenance mode, but you can continue to use the old rsctmp if you would like. I would like to hear your thoughts and comments. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Linux-HA site down?
Hi, http://www.linux-ha.org/ seems to be down today. Maintenance? Or some kind of trouble? Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] An OCF RA for syslog-ng
Hi Dejan, Do you have any chance to take a look at the syslog-ng OCF RA which was posted by Takenaka-san before? http://www.gossamer-threads.com/lists/linuxha/dev/54425 If you are OK, I will commit this to the -dev repository. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] OCF Script for Jboss
Hi, I'm posting an OCF RA for JBoss, which was originally posted by Stefan to the users list, and includes some modifications as suggested by Takenaka-san: http://www.gossamer-threads.com/lists/linuxha/users/53969 Stefan, Do you have any comment on this modification? Dejan, Would you please review this RA if you have any chance? If you are all OK, I will commit the RA to the -dev repository. Thanks, -- Keisuke MORI jboss Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] An OCF RA for syslog-ng
Hi Dejan, Thank you for your comments. I will repost the RA after I revise it with your comments. Thanks, 2009/6/11 Dejan Muhamedagic deja...@fastmail.fm: Hi Keisuke-san, On Thu, Jun 11, 2009 at 06:16:26PM +0900, Keisuke MORI wrote: Hi Dejan, Do you have any chance to take a look at the syslog-ng OCF RA which was posted by Takenaka-san before? http://www.gossamer-threads.com/lists/linuxha/dev/54425 Attaching the script with comments. Please use diff. Cheers, Dejan If you are OK, I will commit this to the -dev repository. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] OCF Script for Jboss
Hi, I'm posting an OCF RA for JBoss, which was originally posted by Stefan to the users list, and includes some modifications as suggested by Takenaka-san: http://www.gossamer-threads.com/lists/linuxha/users/53969 Stefan, Do you have any comment on this modification? Dejan, Would you please review this RA if you have any chance? If you are all OK, I will commit the RA to the -dev repository. Thanks, -- Keisuke MORI jboss Description: Binary data ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IPv6addr with prefixes longer than 64
Hi, 2009/6/4 Rob Gallagher robert.gallag...@heanet.ie: Running the resource manually gives: r...@charlene:/etc/ha.d# /etc/ha.d/resource.d/IPv6addr 2001:770:18:2:0:0:c101:db4a/128/eth0 start 2009/06/04_10:15:50 ERROR: Generic error ERROR: Generic error Didn't you get an error log something like this (in ha-log or syslog)? I saw this in my reproduction test when I specified a /128 prefix. Jun 9 12:49:15 pacifica IPv6addr: [8640]: ERROR: no valid mecahnisms It means that the RA could not find a proper network interface, and it is most likely a configuration error. However if I change the prefix to /64 it is added without error: If your network is set up with a /64 prefix, then you should specify /64 in the parameter of IPv6addr. By the way, specifying an interface (i.e. /eth0) is not supported yet in IPv6addr (it would be ignored). Hope it helps. Thanks, -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] Checksum not computed in ICMPv6 neighbor advertisement
Hi, 2009/6/5 Dejan Muhamedagic deja...@fastmail.fm: Hi Andre, On Fri, Jun 05, 2009 at 09:34:37AM +, Andre, Pascal wrote: Hi, On an Active/Standby platform (using Linux-HA 2.1.4 RHEL5, in my case), when a fail-over/switch-over is initiated and the standby machine takes over the virtual IP (IPv6), IPv6addr broadcasts an ICMPv6 neighbor advertisement message. Unfortunately, this ICMPv6 message has its checksum field set to 0 (i.e. not computed). The message is thus discarded by recipients. Maybe this computation should be done by libnet itself. Unfortunately, without much time to investigate libnet, I've added code in resources/OCF/IPv6addr.c in order to compute the checksum and provide the result to libnet (as a parameter). Applied. Many thanks for the patch. That problem was already fixed in: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034 so the patch should not be necessary. -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] re:A patch of tomcat.
Hi Dejan, Thank you for reviewing it. Committed the revised patch by Yamauchi-san (tomcat.patch-0225) as: http://hg.linux-ha.org/dev/rev/6cbdca48bf88 Thanks, Dejan Muhamedagic deja...@fastmail.fm writes: Hi, On Tue, Feb 24, 2009 at 12:20:22PM +0900, Keisuke MORI wrote: Hi, Will anybody review this patch? I was just reviewing it. I can commit it to the -dev if there're no comments. The patch was well tested and is used with tomcat 5.5 in our environment. Great. I'm attaching a patch which contains just a few minor optimizations and some meta-data updates. Please apply it after checking it with your tomcat (no tomcats here :). Cheers, Dejan Thanks, renayama19661...@ybb.ne.jp writes: Hi All, The patch which solved a new problem was completed. The change is the following point. 1. Addition of the comment. 2. Deletion of the garbage in the log. 3. Optional addition. * catalina_opts - CATALINA_OPTS environment variable. Default is None * catalina_rotate_log - Control catalina.out logrotation flag. Default is NO. * catalina_rotatetime - catalina.out logrotation time span(seconds). Default is 86400. 4. I summarized redundant pgrep processing in one function. 5. Revised it so that pgrep was handled in a version of new tomcat definitely. * The new version of tomcat confirmed that there was not a problem with 5.5.27 and version 6.0.28. 6. For unity, I revised it to use $WGET of ocf_shellfunc. I attached a patch. Please reflect it in a development version. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi, Sorry There was a problem to the patch which I attached. When used latest tomcat, RA seem not to be able to handle it well. I will send the patch which I revised later. Best Regards, Hideo Yamauchi.
-- Keisuke MORI Open Source Business Unit Software Services Integration Business Division NTT DATA Intellilink Corporation Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] re:A patch of tomcat.
Hi, Will anybody review this patch? I can commit it to the -dev if there're no comments. The patch was well tested and is used with tomcat 5.5 in our environment. Thanks, renayama19661...@ybb.ne.jp writes: Hi All, The patch which solved a new problem was completed. The change is the following point. 1. Addition of the comment. 2. Deletion of the garbage in the log. 3. Optional addition. * catalina_opts - CATALINA_OPTS environment variable. Default is None * catalina_rotate_log - Control catalina.out logrotation flag. Default is NO. * catalina_rotatetime - catalina.out logrotation time span(seconds). Default is 86400. 4. I summarized redundant pgrep processing in one function. 5. Revised it so that pgrep was handled in a version of new tomcat definitely. * The new version of tomcat confirmed that there was not a problem with 5.5.27 and version 6.0.28. 6. For unity, I revised it to use $WGET of ocf_shellfunc. I attached a patch. Please reflect it in a development version. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi, Sorry There was a problem to the patch which I attached. When used latest tomcat, RA seem not to be able to handle it well. I will send the patch which I revised later. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Sincerely, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] IPv6addr heartbeat
Mariusz Blank mariuszbl...@alcatel-lucent.com writes: hto-mapfuncs: line 52: 20709 Aborted $__SCRIPT_NAME start 2009/02/09_16:25:16 ERROR: Unknown error: 134 ERROR: Unknown error: 134 # ./resource.d/IPv6addr 2000:0:0:0:0:0:0:C/122/bond1 status 2009/02/09_16:25:24 INFO: Running OK INFO: Running OK Do you know what is wrong with it? It is probably the same problem as: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034 Could you try this patch? http://hg.linux-ha.org/dev/rev/673f32858223 -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] A STONITH plugin for checking whether the target node is kdumping or not.
Hi Lars, When we discussed this feature at the Cluster Summit, you mentioned that there are some issues in stonithd regarding the STONITH escalation. Could you summarise the issues again, please? And if you have in mind any particular test cases that may not work well, we will add the test cases and try to fix them. As far as we have tested so far, it seems to work as expected, though. Regards, Keisuke MORI Satomi TANIGUCHI [EMAIL PROTECTED] writes: Hi lists, I'm posting a STONITH plugin which checks whether the target node is kdumping or not. There are some steps to use this, but I believe this plugin is helpful for failure analysis. See the attached README for details about how to use it. There are 2 patches. The patch named kdumpcheck.patch is for Linux-HA-dev(1eae6aaf1af8). And the patch named mkdumprd_for_kdumpcheck.patch is for mkdumprd version 5.0.39. If you're interested, please give me your comments. Any comments and suggestions are really appreciated. Best Regards, Satomi TANIGUCHI -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Andrew, Thanks for fixing it! In my quick test, it seems to work fine. I (and a colleague of mine) will now continue testing to make sure that everything works fine. Thanks, Andrew Beekhof [EMAIL PROTECTED] writes: Fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c 2008/8/15 Keisuke MORI [EMAIL PROTECTED]: But I've got a PE crash now when I used it with clone resources... I think the following is the correct fix, but I need to do some more testing I've pushed that fix for the fatal assert to both the lha-2.1 tree and the openSUSE build service. I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Keisuke MORI [EMAIL PROTECTED] writes: Andrew, Thanks for fixing it! In my quick test, it seems to work fine. I (and a colleague of mine) will now continue testing to make sure that everything works fine. Just to make sure... Our tests regarding clone groups have passed without any problem. Thank you again for the fix! And I would also like to say thank you to _everybody_ who helped with the release in various ways. Thank you very much! Thanks, Andrew Beekhof [EMAIL PROTECTED] writes: Fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c 2008/8/15 Keisuke MORI [EMAIL PROTECTED]: But I've got a PE crash now when I used it with clone resources... I think the following is the correct fix, but I need to do some more testing I've pushed that fix for the fatal assert to both the lha-2.1 tree and the openSUSE build service. I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI Open Source Business Unit Software Services Integration Business Division NTT DATA Intellilink Corporation Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-08-15T17:52:42, Keisuke MORI [EMAIL PROTECTED] wrote: More precisely, we once tried to use clones with 2.1.3 in production but had to suspend their use because there were some problems. Now we want to upgrade to the coming 2.1.4 using clones. _Clones_ by themselves work fine, but cloned groups are the issue. You can work around this by not using them ;-) We assume that we would use cloned groups as well, and therefore we've been doing our tests with a configuration using cloned groups. (and we didn't expect that those would behave differently ;-) -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-08-15T11:55:35, Keisuke MORI [EMAIL PROTECTED] wrote: I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Ah, well. Then we'll have to wait for Andrew to fix it completely. Otherwise, the code looks fine here. Are you using cloned groups in production, btw? Yes. More precisely, we once tried to use clones with 2.1.3 in production but had to suspend their use because there were some problems. Now we want to upgrade to the coming 2.1.4 using clones. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crm_mon doesn't exit immediately
Andrew, If there's no objection I would like to push this patch into the lha-2.1 repository - is there any problem with that? It seems that the latest pacemaker also exhibits the same behavior, so I think both need to be fixed as well. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: Hi, I found that crm_mon included in Pacemaker-dev(2f2343008186) can be quit with Ctrl+C. If a back-port from Pacemaker to Heartbeat 2.1.4 is better than applying the patch, we don't care about how it is fixed. Thanks, Junko Can somebody handle this issue? She said that she couldn't quit the crm_mon command with Ctrl+C. I usually use crm_mon with the -i option, so I hadn't noticed this behavior, but it is certain that crm_mon running with no options won't be stopped by SIGINT. It's odd, right? I think almost all people would expect Ctrl+C to stop this command. See her attached patch. Thanks, Junko I noticed that crm_mon doesn't exit immediately when it receives SIGINT in the mainloop. It seems that SIGINT only kills the sleep() function... (Is this caused by something in G_main_add_SignalHandler()? Or anything else?) So, I modified it to exit the wait function when it is interrupted by a signal. This patch is for Heartbeat STABLE 2.1 (aae8d51d84ec). I hope it isn't too late for Heartbeat 2.1.4... Regards, Satomi Taniguchi ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Re: [Linux-HA] rsc_order constraints behavior changed?
Andrew, I'm also going to backport this fix into lha-2.1. If there's any problem could you please let me know. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: If you don't want non_clone_group1 to be restarted when this happens, make the ordering constraint advisory-only by setting adding score=0 to the constraint. I tried this configuration, but non_clone_group1 was restarted when clone1 resources fail-count was cleared. you're right - this appears to be broken :( fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e4b49e9f957b Thanks a lot! We are planning to offer this function soon, so could you push this change into Heartbeat 2.1.4(Stable 2.1)? Thanks, Junko ___ Linux-HA mailing list [EMAIL PROTECTED] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] rsc_order constraints behavior changed?
Andrew, I'm also going to backport this fix into lha-2.1. If there's any problem could you please let me know. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: If you don't want non_clone_group1 to be restarted when this happens, make the ordering constraint advisory-only by setting adding score=0 to the constraint. I tried this configuration, but non_clone_group1 was restarted when clone1 resources fail-count was cleared. you're right - this appears to be broken :( fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e4b49e9f957b Thanks a lot! We are planning to offer this function soon, so could you push this change into Heartbeat 2.1.4(Stable 2.1)? Thanks, Junko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-ha-dev] BasicSanityCheck fails in lha-2.1
Dejan, BasicSanityCheck fails in the RA permission test because ocf-tester returns an error at the line below (line 175) if the nobody user is not allowed to log in. su nobody $agent $action > /dev/null [EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data This account is currently not available. [EMAIL PROTECTED] grep nobody /etc/passwd nobody:x:99:99:Nobody:/:/sbin/nologin How about using the hacluster user instead, as attached? Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation

diff -r a8b2fc037b29 tools/ocf-tester.in
--- a/tools/ocf-tester.in	Thu Jul 17 17:01:29 2008 +0900
+++ b/tools/ocf-tester.in	Tue Jul 29 19:58:04 2008 +0900
@@ -168,11 +168,11 @@ lrm_test_command() {
 
 test_permissions() {
 action=meta-data
-msg=${1:-Testing permissions with uid nobody}
+msg=${1:-Testing permissions with uid @HA_CCMUSER@}
 if [ $verbose -ne 0 ]; then
 echo $msg
 fi
-su nobody $agent $action > /dev/null
+su @HA_CCMUSER@ $agent $action > /dev/null
 }
 
 test_metadata() {

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] BasicSanityCheck fails in lha-2.1
Hi Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi Keisuke-san, On Tue, Jul 29, 2008 at 08:03:18PM +0900, Keisuke MORI wrote: Dejan, BasicSanityCheck fails in the RA permission test because ocf-tester returns an error at the line below (line 175) if the nobody user is not allowed to log in. su nobody $agent $action > /dev/null [EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data This account is currently not available. [EMAIL PROTECTED] grep nobody /etc/passwd nobody:x:99:99:Nobody:/:/sbin/nologin How about using the hacluster user instead, as attached? That won't help. nobody was chosen because lrmd runs the meta-data action as nobody. The problem here is that su(1) requires a shell whereas lrmd doesn't. It looks like the -s option could help. Just pushed a patch. Could you please test it too? That works perfectly! Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] sfex
Hi Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi Keisuke-san, On Tue, Jun 17, 2008 at 05:33:52PM +0900, Keisuke MORI wrote: Dejan, Thank you for taking care of it. Yes, NTT is very glad and agrees to include sfex into the heartbeat repository! Dejan Muhamedagic [EMAIL PROTECTED] writes: Hello, Since last year NTT designed and implemented sfex, a suite of programs to improve shared disk usage (see linux-ha.org/sfex) which unfortunately didn't attract the attention it deserves. I reviewed the code and attached you'll find some comments and some simple changes. One general remark: all programs (sfex_*) are monolithic and, though they are not that big, it would be beneficial to code readers if they were split into more units/functions. That sounds reasonable. Where can I find your comments and modifications? A reasonable question :) Forgot to attach the file with comments. Sorry about that. It is in the form of a patch against version 1.3. Thanks, I will look into it. A couple of suggestions on making sfex useful in other contexts were making a quorum plugin and an HBcomm plugin. Did you investigate these options further? Yes we did, but we think those would be a totally different approach from sfex. - a quorum plugin A quorum plugin is executed only on 'the cluster leader node' in CCM, I don't think so. CCM delivers connectivity and quorum information on each node. However, that's probably not relevant. and it does not care where the resource is running, whereas sfex should run on the same node on which the resource in question is running, because it is for the protection of the data which resides in the resource. In other words, sfex controls at resource granularity, whereas a quorum plugin controls at 'partition' granularity. Right. The point was however to use parts of sfex for the quorum functionality. I'll see if I can get back to you with a more detailed and specific proposal. I still don't understand you very well, sorry.
I'd appreciate it if you could explain in more detail. - HBcomm plugin I remember that somebody posted this before, called 'dskcm'. Somehow missed that one. This is also an interesting idea, but the approach is very different. This approach is: - having yet another redundant communication path through the shared medium. whereas sfex's approach is: - providing a protection method for when ALL of the communication paths have failed. Even though they have a similar goal, the functionality is very different. Yes. Though again sfex would need to be twisted a bit to provide heartbeats over shared storage. I'll take a look at dskcm. It was this: http://www.gossamer-threads.com/lists/linuxha/dev/39716#39716 Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] sfex
Dejan, Thank you for taking care of it. Yes, NTT is very glad and agrees to include sfex into the heartbeat repository! Dejan Muhamedagic [EMAIL PROTECTED] writes: Hello, Since last year NTT designed and implemented sfex, a suite of programs to improve shared disk usage (see linux-ha.org/sfex) which unfortunately didn't attract the attention it deserves. I reviewed the code and attached you'll find some comments and some simple changes. One general remark: all programs (sfex_*) are monolithic and, though they are not that big, it would be beneficial to code readers if they were split into more units/functions. That sounds reasonable. Where can I find your comments and modifications? A couple of suggestions on making sfex useful in other contexts were making a quorum plugin and an HBcomm plugin. Did you investigate these options further? Yes we did, but we think those would be a totally different approach from sfex. - a quorum plugin A quorum plugin is executed only on 'the cluster leader node' in CCM, and it does not care where the resource is running, whereas sfex should run on the same node on which the resource in question is running, because it is for the protection of the data which resides in the resource. In other words, sfex controls at resource granularity, whereas a quorum plugin controls at 'partition' granularity. - HBcomm plugin I remember that somebody posted this before, called 'dskcm'. This is also an interesting idea, but the approach is very different. This approach is: - having yet another redundant communication path through the shared medium. whereas sfex's approach is: - providing a protection method for when ALL of the communication paths have failed. Even though they have a similar goal, the functionality is very different. Of course, if you agree, we could include sfex into the heartbeat repository.
Cheers, Dejan Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6 HBcomm plugin
Andrew Beekhof [EMAIL PROTECTED] writes: On Tue, Jun 17, 2008 at 09:48, Keisuke MORI [EMAIL PROTECTED] wrote: Andrew Beekhof [EMAIL PROTECTED] writes: and in case anyone cares... the new pingd tool (the stand-alone version that supports both stacks) also supports IPv6 It's something I'm interested in... Do you have any plan for when it will be available? It's already in pacemaker-dev (which I think you're testing already). It will also be part of 0.7 (unstable), which will be out this month. OK, I didn't realize that it's already in there. I will take a look at it. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] IPv6 HBcomm plugin
Hi, I've been implementing an HBcomm plugin to enable IPv6 communication among the cluster nodes and the ping nodes. It is still an experimental implementation and I would appreciate any feedback. Thanks,

The IPv6 HBcomm plugin usage

1. Building
Apply the attached patch and do './ConfigureMe configure' and make. The patch is made against the dev branch at: changeset: 11945:5c915f1d5b7b It has been built and tested on RHEL 5.1.

2. Configuration
The following two directives are available in ha.cf:

1) mcast6
Use IPv6 multicast for the heartbeat communication between the nodes. The syntax is the same as 'mcast'. Eg. mcast6 eth1 ff02::694 694 1 0
Note: Please choose a multicast address that is available on your subnet. The address in the example is not officially registered with IANA.

2) ping6
Use an IPv6 address as a ping node. This is equivalent to the 'ping' directive in IPv4. The syntax is also the same as 'ping', except that you can specify an interface name for the address by concatenating it with '%'. Eg. ping6 fe80::1:1%eth0
Note: the interface name (%eth0 above) is mandatory if you want to ping a link-local address (by the design of IPv6). You can omit this part if you're pinging a global address.

3. TODO / known issues
- Still experimental and not completely tested yet. Please test it yourself and give me your feedback :-).
- The 'ping_group' equivalent is not implemented. (Is it possible to use an anycast address instead of this?)
- ping6: the ioctl() to set an ICMPv6 filter fails. It can be ignored, but fixing it would be preferable for optimization.
- mcast6: the allocated memory for the private area is never freed. It would not be a big problem, but it is preferable to fix. The same applies to 'mcast'.
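Taken together, a node's ha.cf using both new directives might look like the following sketch (interface names and addresses are illustrative placeholders only; as the notes above say, choose a multicast address valid on your own subnet):

```
# /etc/ha.d/ha.cf -- hypothetical fragment for the IPv6 plugin
mcast6 eth1 ff02::694 694 1 0   # IPv6 multicast heartbeat on eth1
ping6  fe80::1:1%eth0           # link-local ping node; %iface is mandatory
```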
-- Keisuke MORI NTT DATA Intellilink Corporation

diff -r 5c915f1d5b7b lib/plugins/HBcomm/Makefile.am
--- a/lib/plugins/HBcomm/Makefile.am	Wed May 28 09:14:21 2008 +1000
+++ b/lib/plugins/HBcomm/Makefile.am	Wed Jun 04 13:02:19 2008 +0900
@@ -46,6 +46,7 @@ halibdir = $(libdir)/@HB_PKG@
 halibdir = $(libdir)/@HB_PKG@
 plugindir = $(halibdir)/plugins/HBcomm
 plugin_LTLIBRARIES = bcast.la mcast.la ping.la serial.la ucast.la \
+	mcast6.la ping6.la \
 	ping_group.la $(HBAPING) $(OPENAIS) $(TIPC)
 bcast_la_SOURCES = bcast.c
@@ -80,3 +81,12 @@ tipc_la_SOURCES = tipc.c
 tipc_la_SOURCES = tipc.c
 tipc_la_LDFLAGS = -export-dynamic -module -avoid-version
 tipc_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
+mcast6_la_SOURCES = mcast6.c
+mcast6_la_LDFLAGS = -export-dynamic -module -avoid-version
+mcast6_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
+ping6_la_SOURCES = ping6.c
+ping6_la_LDFLAGS = -export-dynamic -module -avoid-version
+ping6_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
diff -r 5c915f1d5b7b lib/plugins/HBcomm/mcast6.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/lib/plugins/HBcomm/mcast6.c	Thu Jun 05 17:00:20 2008 +0900
@@ -0,0 +1,788 @@
+/*
+ * mcast6.c: implements heartbeat API for UDP/IPv6 multicast communication
+ *
+ * Author: Keisuke MORI [EMAIL PROTECTED]
+ *
+ * based on mcast.c written by the following authors.
+ * Copyright (C) 2000 Alan Robertson [EMAIL PROTECTED]
+ * Copyright (C) 2000 Chris Wright [EMAIL PROTECTED]
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <lha_internal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+
+#ifdef HAVE_SYS_SOCKIO_H
+# include <sys/sockio.h>
+#endif
+
+#include <HBcomm.h>
+
+#define PIL_PLUGINTYPE		HB_COMM_TYPE
+#define PIL_PLUGINTYPE_S	HB_COMM_TYPE_S
+#define PIL_PLUGIN		mcast6
+#define PIL_PLUGIN_S		"mcast6"
+#define PIL_PLUGINLICENSE	LICENSE_LGPL
+#define PIL_PLUGINLICENSEURL	URL_LGPL
+#include <pils/plugin.h>
+#include <heartbeat.h>
+
+struct mcast6_private {
+	char *	interface;	/* Interface name */
+	char *	mcastaddr;	/* multicast address for IPv6
Re: [Linux-ha-dev] [RFC] heartbeat-2.1.4
Hi, How is the 2.1.4 release going? Will it be released soon, or is there any trouble with it? I look forward to seeing it! Thanks, Lars Marowsky-Bree [EMAIL PROTECTED] writes: Hi all, the Linux-HA project is undergoing some changes, as you've noticed. Not all of them have gone as well as expected, and it hasn't stabilized yet. Under guidance from Alan, the project members have met and decided to change the governance of the project in the future. This will be announced in more detail soon, stay tuned. We are also likely to make some further changes to the package layout, and understand that users, admins and distro maintainers dislike it when we do that, so we don't want to make it a habit. We recognize the needs of our users (I hope!) to receive timely updates, and thus have decided to go ahead and propose releasing one more 2.1.4 (following the 2.1.x package layout) as the last release of that branch before the restructuring kicks in completely. (When we decided to split off pacemaker, we didn't expect that this would cause the upstream Linux-HA project to cease releasing completely, and unfortunately there's been little discussion on the lists regarding this since.) For SLE10 SP2, it was already too late to change the package layout, so I've been backporting changes (which is quite easy with Mercurial) from the Pacemaker project, the GUI, and heartbeat-dev into the 2.1.x codebase, and done a fair amount of testing on x86, x86-64, s390x. However, I've been mostly focused on cherry-picking what we (as in, Novell) needed, so in particular the packaging for non-SUSE dists is somewhat neglected in this version. If other distro maintainers would please help me with fixing up the packaging, and more community members would pound on it, I would really appreciate it. My proposal would be to release 2.1.4 by the end of next week (2008-04-18).
(Mostly because after that I go on vacation ;-) I know this is a highly condensed schedule and doesn't follow any proper release methodology. The reasons for this in bullet points: - It's been too long since the last official gasp from the heartbeat project. The code we have is clearly better than 2.1.3, and we should get it to our users ASAP. - Novell has done a fair amount of testing on it already. The code is good (as in much better than 2.1.3), except the packaging. - The new governance will eventually decide on a new release methodology for the Linux-HA project, I expect, but this will take some more weeks, and I don't want to delay releasing even further. So, with the above reasoning, I'm volunteering myself - and hijacking the vacuum, I acknowledge - to do the 2.1.4 release, as the current split hasn't been adopted everywhere yet, 2.1.x is defunct, and our user community appears to need it now and not in several months. I'd plan on building the packages for all dists via OBS, if nobody holds any strong objections, and update the DownloadSoftware page after we agree that the 2.1.4 release is good. And of course I would much approve of distro maintainers pulling it into their official distro repositories too! So, that said, I've pushed my proposed code to http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely doesn't build yet (because the in-tree packaging is broken), but I wanted to share the scope of changes with you. As a further point of reference, I'm attaching the SLES changes section to this mail. (bnc# refers to bugzilla.novell.com.) Let me emphasize strongly that I really don't want to step on anyone's toes, or rush the new governance board, but only fill the current void until that is actually operational and has settled down, as I suggest our users need it. Please comment.
Regards, Lars -- Team lead Kernel, SuSE Labs, Research and Development SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4
Hi, Andrew Beekhof [EMAIL PROTECTED] writes: On Wed, Apr 16, 2008 at 1:31 PM, HIDEO YAMAUCHI [EMAIL PROTECTED] wrote: Hi Andrew, I asked for the right function but the wrong frame number - I should have asked for frame 2. Sorry :(

(gdb) frame 2
#2 0x00416c74 in stop_recurring_action_by_rsc (key=0x755f60, value=0x755f40, user_data=0x545a10) at lrm.c:1442
1442		if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
(gdb) print *rsc
Variable rsc is not available.
(gdb) print *op
No symbol op in current context.

Is something wrong with my operation? Looks like gcc is being too clever for its own good (by optimizing away some of the variables) :-( Can you try the following patch please?

diff -r be12cb83cd2d crmd/lrm.c
--- a/crmd/lrm.c	Wed Apr 16 10:46:59 2008 +0200
+++ b/crmd/lrm.c	Wed Apr 16 15:02:16 2008 +0200
@@ -1451,7 +1451,7 @@ stop_recurring_action_by_rsc(gpointer ke
 {
 	lrm_rsc_t *rsc = user_data;
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
-
+	crm_info("op->rsc = %s (%p), rsc = %s (%p)", crm_str(op->rsc_id), op->rsc_id, crm_str(rsc->id), rsc->id);
 	if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
 		cancel_op(rsc, key, op->call_id, FALSE);
 	}

I think I found the cause of this issue. I attached the additional log with your patch (a bit different though) and the stack trace. Here's my observation:
- An element of pending_ops is removed at lrm.c:L497
- It is called from inside g_hash_table_foreach() at L1475
- This violates the usage of g_hash_table_foreach() according to the glib manual.
- Therefore the iteration cannot proceed correctly and may try to refer to a removed element.

http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
(...)
946	/* not doing this will block the node from shutting down */
947	g_hash_table_remove(pending_ops, key);
(...)
1475	g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);

http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
(...)
The hash table may not be modified while iterating over it (you can't add/remove items). I also attached my suggested patch, although I cannot guarantee its correctness; it is just to show you the idea. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation

Attachment: ms-additional-log-20080422.tar.gz

diff -r 333aef5bd4ed -r 36c0fd90691d crm/crmd/lrm.c
--- a/crm/crmd/lrm.c	Thu Apr 17 18:55:57 2008 +0200
+++ b/crm/crmd/lrm.c	Tue Apr 22 17:48:47 2008 +0900
@@ -943,8 +943,9 @@ cancel_op(lrm_rsc_t *rsc, const char *ke
 	if(key && remove) {
 		delete_op_entry(NULL, rsc->id, key, op);
 	}
+	/* return FALSE to be removed from pending_ops */
 	/* not doing this will block the node from shutting down */
-	g_hash_table_remove(pending_ops, key);
+	return FALSE;
 	}
 	return TRUE;
@@ -954,15 +955,20 @@ gboolean cancel_done = FALSE;
 gboolean cancel_done = FALSE;
 lrm_rsc_t *cancel_rsc = NULL;
-static void
+static gboolean
 cancel_action_by_key(gpointer key, gpointer value, gpointer user_data)
 {
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	if(safe_str_eq(op->op_key, cancel_key)) {
 		cancel_done = TRUE;
-		cancel_op(cancel_rsc, key, op->call_id, TRUE);
-	}
+		if (!cancel_op(cancel_rsc, key, op->call_id, TRUE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+	return FALSE;
 }
 
 static gboolean
@@ -976,7 +982,7 @@ cancel_op_key(lrm_rsc_t *rsc, const char
 	CRM_CHECK(key != NULL, return FALSE);
-	g_hash_table_foreach(pending_ops, cancel_action_by_key, NULL);
+	g_hash_table_foreach_remove(pending_ops, cancel_action_by_key, NULL);
 	if(cancel_done == FALSE && remove) {
 		crm_err("No known %s operation to cancel", key);
@@ -1433,15 +1439,21 @@ send_direct_ack(const char *to_host, con
 	free_xml(update);
 }
-static void
+static gboolean
 stop_recurring_action_by_rsc(gpointer key, gpointer value, gpointer user_data)
 {
 	lrm_rsc_t *rsc = user_data;
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
-		cancel_op(rsc, key, op->call_id, FALSE);
-	}
+		if (!cancel_op(rsc, key, op->call_id, FALSE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+
+	return FALSE;
 }
 
 void
@@ -1472,7 +1484,7 @@ do_lrm_rsc_op(lrm_rsc_t *rsc, const char
 	|| crm_str_eq(operation, CRMD_ACTION_DEMOTE, TRUE)
 	|| crm_str_eq(operation, CRMD_ACTION_PROMOTE, TRUE)
 	|| crm_str_eq(operation, CRMD_ACTION_MIGRATE, TRUE)) {
-	g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);
+	g_hash_table_foreach_remove(pending_ops, stop_recurring_action_by_rsc, rsc);
 	}
 	/* now do the op
Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4
Andrew Beekhof [EMAIL PROTECTED] writes: (snip) Here's my observation: - An element of pending_ops is removed at lrm.c:L497 - It is called from inside g_hash_table_foreach() at L1475 - This violates the usage of g_hash_table_foreach() according to the glib manual. - Therefore the iteration cannot proceed correctly and may try to refer to a removed element. Turns out that the Stateful resource in CTS was never getting promoted. Once I fixed this, I was able to trigger the bug too (in the last few minutes). A weird thing is that it is not reproducible in every environment. As far as we've tested: - it _always_ happens in a RedHat 4 environment. - it has _never_ happened in a RedHat 5 environment. I'm not sure if it's the only difference, but possibly the difference in glib versions is related to the behavior. Thanks for your diagnosis and the patch, you've certainly saved me some time :-) http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c (...) 946 /* not doing this will block the node from shutting down */ 947 g_hash_table_remove(pending_ops, key); (...) 1475 g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc); http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach (...) The hash table may not be modified while iterating over it (you can't add/remove items). I also attached my suggested patch, although I cannot guarantee its correctness; it is just to show you the idea. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Process monitor daemon (revised)
Hi Lars, Thank you all for reviewing and making suggestions. I think I understand your point about the Heartbeat architecture, but it would require rewriting almost all of the code ;-) I will discuss with my colleagues what we can do for procd as the next step. Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-02-27T20:39:13, Keisuke MORI [EMAIL PROTECTED] wrote: Hi Keisuke-san, thanks for your patch and contribution. I have to apologize in the name of everyone for the late feedback. I really appreciate the idea of monitoring processes directly, and receiving async failure notifications to reduce fail-over times. I have just discussed this with Dejan and Andrew, and we think that the best path forward, alas necessary before inclusion, is to: - Make procd independent of Pacemaker. It should talk only to the RAs and the LRM. - RAs should sign in with it for the processes they want monitored, instead of listing the processes in the procd configuration section (which means it gets decoupled from the CIB further). The RAs could write a record to /var/run/heartbeat/procd/resource-id, for example. The RAs would add/remove the required processes on start/promote or demote/stop. (So procd itself would not need to be master-slave.) I'm afraid that having users manually specify process lists in the CIB really is not workable - the users will not be able to get this right. - Instead of respawning procd, there should be a resource agent which starts/stops (and monitors!) procd. You already have one, but why doesn't it go into resources/OCF/ ? We had only thought of using procd via respawn so far, and we didn't have such an RA yet. - procd should talk to the LRM to insert a fake failed resource action, which would then cause the CRM/PE to handle the resource as failed and initiate recovery. (This is not currently possible with the LRM client library; you could exec crm_resource -F, which would mean you no longer have a build-time dependency on the CRM.)
- This would have the advantage of decoupling procd from pacemaker as well as heartbeat. It could be included with the LRM/RA package build, and possibly be useful with other cluster managers too. I think all that would help simplify the code.

+#define RSCID_LEN 128 /* ref. include/lrm/lrm_api.h */
+#define MAX_PID_LEN 256 /* ref. lrm/lrmd/lrmd.h */
+#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */

If you're referencing other include files, please do include them, so as to avoid diverging header definitions. Right. Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] brocade fencing anyone?
Hi, Johan Hoeke [EMAIL PROTECTED] writes: Dejan Muhamedagic wrote: Hi, On Fri, Feb 22, 2008 at 05:59:23PM +0100, Johan Hoeke wrote: LS, Is anybody here using some kind of Brocade fencing with heartbeat, like RedHat offers in its cluster software? I found this reference: http://linux.die.net/man/8/fence_brocade It turns out to be the attached perl script. Got it from http://mirror.centos.org/centos/4/csgfs/i386/RPMS/fence-1.32.50-2.el4.centos.1.i686.rpm Would it be possible to use this as an external stonith script? No, because stonith is about fencing nodes and this would be fencing resources. Point taken! Fencing nodes by isolating I/O is a very interesting idea though. I think that right now the only way would be to implement an RA which would fence the resource. That's what Junko Ikeda and the NTT people did: http://lists.linux-ha.org/pipermail/linux-ha/2007-October/028388.html Good stuff, thank you for pointing it out. And mr Ikeda and NTT for sharing! I don't know why their code was not included in Heartbeat. This is an important issue, so it should get more attention. Agreed! What can I do to get it included in Heartbeat? (We call it SF-EX.) My colleagues and I would be really happy if it were included as a standard component in Heartbeat and available for everyone. BTW, she is Ms. Ikeda. ;-) -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Basic SNMP/Linux HA question
Hi, Mike Toler [EMAIL PROTECTED] writes: I don't know if I'm an idiot, have failed to compile the load correctly, or just don't have the secret handshake down correctly, but either way, I am unable to query any statistics from Linux HA using snmp. I've read the README file in the snmp-subagent directory in the source, and I *THINK* I've followed the directions:
1. Install Net-SNMP
2. Verify snmp queries to the server work. (I have set up 'pass' scripts for DRBD and NFS to query counters, so net-snmp is working on my system.)
3. Download the Linux-HA source
4. Run ./ConfigureMe configure --enable-snmp-subagent
5. make and make install
6. Start Linux-HA using service heartbeat start
What am I missing? Do I need to specifically start 'hbagent' somewhere? Am I missing something in the /etc/ha.d/ha.cf file? Does anyone have any debugging tips that I can use to try and isolate where my disconnect is? Have you added a line like this in your ha.cf? respawn root /usr/lib/heartbeat/hbagent -r 5 (I realize now that this is not mentioned clearly in the README.) Providing your ha.cf and the logs would be more helpful. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] 2.1.3 RPM names
Hi, Thank you very much for the great Christmas present for us ;-) I've noticed that the RPM names in 2.1.3 for i386 have been changed on the official download web site. http://linux-ha.org/download/index.html#2.1.3 pils-2.1.3-1.fc7.i386.rpm stonith-2.1.3-1.fc7.i386.rpm Is there any reason why the change was made? Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IPaddr: netmask or cidr_netmask?
Lars, Thank you for your answer. I will use cidr_netmask from now on. Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2007-12-14T14:54:38, Keisuke MORI [EMAIL PROTECTED] wrote: IPaddr RA has two kinds of parameter to specify the netmask: netmask and cidr_netmask. Which one is officially supported and recommended to use? The fact that only the cidr_netmask is in the metadata is a pretty big clue. ;-) Yes. I should have believed documents. ;-) Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] IPaddr: netmask or cidr_netmask?
Hi, The IPaddr RA has two parameters for specifying the netmask: netmask and cidr_netmask. Which one is officially supported and recommended? From the mail archive below, I thought that cidr_netmask was wrong, http://www.gossamer-threads.com/lists/linuxha/dev/36035#36035 but on the other hand, the GUI can handle only cidr_netmask (because the IPaddr meta-data contains only cidr_netmask). As far as I tried, both work the same; both can take either the CIDR form (e.g. 24) or the dotted notation (255.255.255.0), but I want to always use the official one. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-ha-dev] [PATCH] SNMP subagent syslog fix
Hi, The attached patch fixes the SNMP subagent so that it obeys the syslog policy of heartbeat: 1) use logd if it's enabled; 2) the default syslog facility is taken from the configure option, as with lrmd, mgmtd, etc. The current SNMP subagent always logs to LOG_USER, which is hard-coded. This is not good. This patch can be applied on its own (i.e., independently of the SNMP extension for V2), so please consider including it in 2.1.3. Regards, -- Keisuke MORI NTT DATA Intellilink Corporation

diff -r 0890907b816f snmp_subagent/hbagent.c
--- a/snmp_subagent/hbagent.c	Tue Dec 11 01:10:53 2007 +0100
+++ b/snmp_subagent/hbagent.c	Tue Dec 11 17:08:47 2007 +0900
@@ -562,7 +562,10 @@ init_heartbeat(void)
 	hb = NULL;
 	cl_log_set_entity("lha-snmpagent");
-	cl_log_set_facility(LOG_USER);
+	cl_log_set_facility(HA_LOG_FACILITY);
+
+	/* Use logd if it's enabled by heartbeat */
+	cl_inherit_logging_environment(0);
 	hb = ll_cluster_new("heartbeat");

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] SNMP subagent syslog fix
Hi, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Tue, Dec 11, 2007 at 08:26:52PM +0900, Keisuke MORI wrote: Hi, The attached patch fixes the SNMP subagent so that it obeys the syslog policy of heartbeat: 1) use logd if it's enabled; 2) the default syslog facility is taken from the configure option, as with lrmd, mgmtd, etc. The current SNMP subagent always logs to LOG_USER, which is hard-coded. This is not good. This patch can be applied on its own (i.e., independently of the SNMP extension for V2), so please consider including it in 2.1.3. Thanks for the patch. I can recall vaguely seeing the problem, perhaps I even filed a bugzilla for it. Or something. My memory isn't in the best shape today. By grep'ing the source, there are still some hard-coded uses of LOG_USER. Do they also need to be fixed? In particular, send_arp.c, cl_status.c, xml_diff.c, and lrmadmin.c are visible to end users, I think.

$ hg id
885e02e00632 tip
$ grep -R cl_log_set_facility * | grep LOG_USER
crm/pengine/ptest.c:cl_log_set_facility(LOG_USER);
crm/admin/xml_diff.c:	cl_log_set_facility(LOG_USER);
fencing/test/apitest.c:	cl_log_set_facility(LOG_USER);
heartbeat/libnet_util/send_arp.c:cl_log_set_facility(LOG_USER);
lib/hbclient/api_test.c:cl_log_set_facility(LOG_USER);
lib/clplumbing/netstring_test.c:cl_log_set_facility(LOG_USER);
lrm/admin/lrmadmin.c:	cl_log_set_facility(LOG_USER);
lrm/test/apitest.c:	cl_log_set_facility(LOG_USER);
membership/ccm/ccm_testclient.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbd.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbtest.c:	cl_log_set_facility(LOG_USER);
telecom/recoverymgrd/recoverymgrd.c:cl_log_set_facility(LOG_USER);
tools/cl_status.c:	cl_log_set_facility(LOG_USER);

-- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Q] group resources and unmanaged status
Andrew, Can I ask a question about the internal status of the PE? My SNMP subagent code uses cluster_status(pe_working_set_t) to analyze the current status of resources, like crm_mon. When a parent resource (group/clone/master) is unmanaged, the 'running_on' and 'allowed_nodes' members of resource_t are NULL. Is this an expected value, or is there some intention behind it? If the parent resource is managed, those members have node values according to its children. In the case of a child resource (primitive), those members always contain node values regardless of whether it is managed or unmanaged. My SNMP subagent has a minor problem displaying the status of an unmanaged group resource and I'm now looking into how I should fix it. Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: AW: [Linux-ha-dev] Call for testers: 2.1.3
Spindler Michael [EMAIL PROTECTED] writes: Hi, This problem has been solved. My packaging box didn't have all the necessary packages for building the GUI rpm. When I added them it was able to build haclient (GUI) and that find-lang.sh tool worked fine. I didn't find the problem with pegasus on my CentOS 5.0, but I have the 32 bit version, and the problem was reported for 64 bit. OK. So, this step should only be included if --enable-mgmt, I guess? Right. It establishes language settings for the GUI, so it's not needed if the GUI isn't needed. We are trying to build it on RedHat (Red Hat Enterprise Linux ES release 4 (Nahant Update 4)), and a problem remains before us. Please check Mori-san's patch again. http://developerbugs.linux-foundation.org//attachment.cgi?id=1109

-if test x${CIMOM} = x; then
-if test x${CIMOM} = x; then
-AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
+if test x${enable_cim_provider} = xyes; then # maybe, here #
+if test x${CIMOM} = x; then
+if test x${CIMOM} = x; then

I attached the configure.log fyi: I was able to build the rpms on RedHat AS 4 without any problems. The error above should occur only when the tog-pegasus package has been installed on your RedHat. I thought that tog-pegasus is installed by default on RedHat ES 4... -- Keisuke MORI NTT DATA Intellilink Corporation
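For reference, the guarded probe that Mori-san's patch seems to be aiming at might look like this in configure.ac. This is a sketch only: the variable and macro names are taken from the quoted patch, but the exact nesting is an assumption since the diff context is incomplete.

```
if test "x${enable_cim_provider}" = "xyes"; then
    if test "x${CIMOM}" = "x"; then
        AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
    fi
fi
```

The point is that AC_CHECK_PROG only runs when the CIM provider was actually requested, so builds on boxes without pegasus don't pick up the dependency by accident.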
Re: [Linux-ha-dev] [Q] group resources and unmanaged status
Andrew Beekhof [EMAIL PROTECTED] writes: On Dec 7, 2007, at 11:56 AM, Keisuke MORI wrote: Andrew, Can I ask a question about the internal status of the PE? My SNMP subagent code uses cluster_status(pe_working_set_t) to analyze the current status of resources, like crm_mon. When a parent resource (group/clone/master) is unmanaged, the 'running_on' and 'allowed_nodes' members of resource_t are NULL. Is this an expected value? I thought that group/clone/master always had NULL... since they can be running on more than one node (especially clone and m/s resources) Judging from the output of the SNMP agent, two pairs of a clone and a primitive are observed, and each parent clone has the node on which its child primitive is running. Maybe my code is doing something wrong, I'll check it again. I recall also doing something special for unmanaged resources but I can probably change that behavior for you. that said, it would be better to use the recently added API call: node_t *(*location)(resource_t *, GListPtr*, gboolean); eg. node_t *native_location(resource_t *rsc, GListPtr *list, gboolean current) Thanks, I will look into this. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] Call for testers: 2.1.3
Hi, The problem is reported in bugzilla #1662. Please see my comment and a patch at comments #6 and #8. http://developerbugs.linux-foundation.org/show_bug.cgi?id=1662#c6 Thanks, Keisuke MORI

Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Thu, Dec 06, 2007 at 10:54:36AM +1100, Amos Shapira wrote: On 06/12/2007, Alan Robertson [EMAIL PROTECTED] wrote: We are in the final weeks of testing for release 2.1.3 - which has been delayed to the week of Dec 19. Trying to do make rpm on CentOS 5 I get the following error:

gmake[1]: Leaving directory `/root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/contrib'
find heartbeat-2.1.3 -type d ! -perm -777 -exec chmod a+rwx {} \; -o \
  ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -400 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -444 -exec /bin/sh /root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/install-sh -c -m a+r {} {} \; \
  || chmod -R a+r heartbeat-2.1.3
tardir=heartbeat-2.1.3 && /bin/sh /root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/missing --run tar chof - "$tardir" | GZIP=--best gzip -c > heartbeat-2.1.3.tar.gz
{ test ! -d heartbeat-2.1.3 || { find heartbeat-2.1.3 -type d ! -perm -200 -exec chmod u+w {} ';' && rm -fr heartbeat-2.1.3; }; }
/usr/bin/rpmbuild -ta heartbeat-2.1.3.tar.gz /dev/null;
error: Macro %CMPI_PROVIDER_DIR has empty body
sh: line 0: fg: no job control
error: Failed build dependencies: pegasus is needed by heartbeat-2.1.3-2.x86_64
make: *** [rpm] Error 1

Funny, can't find pegasus on either suse or debian. It's got to do with CIM though. There is no such package pegasus in CentOS 5. I tried installing tog-pegasus but I'm not sure it's even related and it didn't help. http://rpmfind.net/linux/rpm2html/search.php?query=tog-pegasus-cimserver That one? You got the same error? Care to post your config.log? Thanks, Dejan Please help us test this upcoming new release! 
Would love to if I can sneak in tests in my schedule - especially if it'll help me get heartbeat 2 running on my CentOS 5 Xen guests. Thanks for all your work. --Amos ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [patch] Fix potential memory leaks in the HB client library
Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Tue, Oct 30, 2007 at 08:53:54PM +0900, Keisuke MORI wrote: Hi, I've been testing heartbeat with valgrind enabled, and found that it reported a couple of leaks which are in the heartbeat API client library. I'm submitting my proposed patch to fix them, so could somebody please review it for correctness? In my understanding, these leaks are not so serious because they only happen when heartbeat exits, but it may be a problem if an HB client does signon()/signoff()/delete() repeatedly in a single process. (omit) Your patch is in this changeset: http://hg.linux-ha.org/dev/rev/84e6520764bf Thank you for taking care of it. BTW, do you have hg write access? No, I don't. Is there any authorization procedure to gain the access? Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extension for CRM resources
Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Fri, Nov 09, 2007 at 03:12:29PM +0900, Keisuke MORI wrote: Hello all, I would like to propose an extension for the SNMP hbagent so that it can handle the CRM resource information provided by Heartbeat Version 2. The attached patch is my proposed implementation. The patch has already been tested and debugged by our team using valgrind. I would appreciate any comments and suggestions to make it more usable for everybody in the community. I'll take a look at the code. Thanks a lot for the contribution. Thank you for taking a look at it. Please advise me if there's any suspicious code. We'll correct it. Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extension for CRM resources
Andrew, Thank you for your comments. Andrew Beekhof [EMAIL PROTECTED] writes: I would appreciate any comments and suggestions to make it more usable for everybody in the community. you might want to include some other data (such as failed, is_managed, etc) in the trap also, it might be an idea to include a back-link for resources that are in groups/clones/etc quote NOTE : This trap is sent only when the resource operation succeeds. Concretely, the extended hbagent gets the cib information when it changes, and parses it. And if the rc_code of the operation (like CRMD_ACTION_START) is 0, then the hbagent sends a trap. /quote it worries me a little that you only send the trap when rc=0... you don't want to know about failed actions? The intention of the trap is to let you know the current status of resources (such as running/stopped/etc.), not the result of each operation. This is similar to the LHAResourceGroupStatus object, which is the resource status in V1. (The note above is just an explanation of how it's implemented.) But, yeah, your point is right and it might also be useful. Does anybody want to use this information? We're considering extending it further, but before we proceed I would like to design the new MIB definition first. Does anyone have comments on this? I would like to hear more, particularly about what kind of information is needed, and by whom, among those who really want to use the SNMP agent. Regards, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] snmp notification v2 cluster
Hi, I posted my proposed patch of the SNMP hbagent extension for V2 resources to the development mailing list. http://www.gossamer-threads.com/lists/linuxha/dev/43676 Please take a look; I would appreciate it if you could give me any comments. Thanks, Keisuke MORI [EMAIL PROTECTED] writes: Abraham, and everyone in the list, Our company is now working on adding a feature to the hbagent so that it notifies you when the V2 resource status changes. Our first implementation is almost done actually, so I will post it here in a week, as soon as we're ready. I hope it helps the community. Thanks, Abraham Iglesias [EMAIL PROTECTED] writes: It sounds good. I am really surprised because i didn't expect so many replies :D . Thanks to everyone. I will start trying the hbagent to see which capabilities are implemented at this moment. Any suggestion will be welcome :) -bram Michael Schwartzkopff escribió: Am Mittwoch, 31. Oktober 2007 13:23 schrieb Peter Clapham: Linux-HA comes with a full SNMP subagent with its own MIB and the capability of sending traps. nagios is only an add-on for proper alerting and visualization of the alerts. Look for SNMP or hbagent in the documentation of the sources. Regrettably not resource aware, hence an update to full v2 would be rather nice tm :-) Yes, but at least it counts online nodes. So you get a hint if there is a problem. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] snmp notification v2 cluster
Abraham, and everyone in the list, Our company is now working on adding a feature to the hbagent so that it notifies you when the V2 resource status changes. Our first implementation is almost done actually, so I will post it here in a week, as soon as we're ready. I hope it helps the community. Thanks, Abraham Iglesias [EMAIL PROTECTED] writes: It sounds good. I am really surprised because i didn't expect so many replies :D . Thanks to everyone. I will start trying the hbagent to see which capabilities are implemented at this moment. Any suggestion will be welcome :) -bram Michael Schwartzkopff escribió: Am Mittwoch, 31. Oktober 2007 13:23 schrieb Peter Clapham: Linux-HA comes with a full SNMP subagent with its own MIB and the capability of sending traps. nagios is only an add-on for proper alerting and visualization of the alerts. Look for SNMP or hbagent in the documentation of the sources. Regrettably not resource aware, hence an update to full v2 would be rather nice tm :-) Yes, but at least it counts online nodes. So you get a hint if there is a problem. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [Bug 1722] First item in a group is not stopped when the second fails (and can't be migrated)
Andrew Beekhof [EMAIL PROTECTED] writes: I think that the old behavior is preferable because running a part of the group is pointless from the service availability's point of view and confusing to users. no. just because items later in the group fail doesn't mean the rest of the group should be stopped. In the HA database cluster, the database service is typically provided by a group like: Filesystem + MySQL + IP If any of the resources fails then the database service is no longer available. Running only Filesystem does not mean anything for the service availability. consider: IP + Filesystem + Apache + MySQL Just because MySQL fails doesn't mean Apache, the Filesystem nor the IP should be stopped. I can understand that, but in that case, I think it would be more straightforward to have two separate groups; one for the database server and the other for the web server, because they can run independently, right? We usually group resources because they need to run together to provide the service (database, web server, or whatever) as a whole, therefore running a part of the group does not make sense. 2) If it would not be possible, then would you tell me what is the correct configuration to achieve the same result as 2.1.2 in the new version? (with correct I mean by design and unlikely to change in the near future) I'm also wondering how anybody else configures this behavior. Let me instead ask what you believe you gain by stopping the first resource. Because it is just simple and intuitive for users. And I believe that most commercial HA software would also behave like this (at least in the typical usage). Our customers are considering migrating from a commercial HA software to heartbeat, and all of them expect it to behave like this so far. At least it would be nice if this behavior could be customized, I would think. 
Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
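The two-separate-groups alternative Mori-san describes could be sketched in heartbeat v2 CIB XML roughly as follows. The ids and agent choices are hypothetical, and any ordering/colocation constraints between the groups are omitted for brevity.

```
<resources>
  <group id="grp_database">
    <primitive id="db_fs"     class="ocf" provider="heartbeat" type="Filesystem"/>
    <primitive id="db_mysql"  class="ocf" provider="heartbeat" type="mysql"/>
    <primitive id="db_ip"     class="ocf" provider="heartbeat" type="IPaddr"/>
  </group>
  <group id="grp_web">
    <primitive id="web_ip"     class="ocf" provider="heartbeat" type="IPaddr"/>
    <primitive id="web_apache" class="ocf" provider="heartbeat" type="apache"/>
  </group>
</resources>
```

With this split, a MySQL failure affects only grp_database, while Apache and its IP stay up in grp_web, which is the behaviour Andrew's IP + Filesystem + Apache + MySQL example argues for.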