Re: [Linux-HA] [ha-wg-technical] [Pacemaker] [RFC] Organizing HA Summit 2015
Hi all,

Really late response, but I will be joining the HA summit with a few
colleagues from NTT.

See you guys in Brno,

Thanks,

2014-12-08 22:36 GMT+09:00 Jan Pokorný jpoko...@redhat.com:
> Hello,
>
> it occurred to me that if you want to use the opportunity and double
> as a tourist while being in Brno, it's about the right time to
> consider reservations/ticket purchases this early. At least in some
> cases it is a must, e.g., Villa Tugendhat:
> http://rezervace.spilberk.cz/langchange.aspx?mrsname=languageId=2returnUrl=%2Flist
>
> On 08/09/14 12:30 +0200, Fabio M. Di Nitto wrote:
>> DevConf will start Friday the 6th of Feb 2015 in Red Hat Brno
>> offices. My suggestion would be to have a 2 days dedicated HA summit
>> the 4th and the 5th of February.
>
> --
> Jan
>
> ___
> ha-wg-technical mailing list
> ha-wg-techni...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical

--
Keisuke MORI

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [PATCH] High: ccm: fix a memory leak when a client exits
Hi,

2013/9/6 Lars Ellenberg lars.ellenb...@linbit.com:
> On Wed, Sep 04, 2013 at 08:16:44PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> The attached patch will fix a memory leak in ccm that occurs
>> whenever a ccm client disconnects.
>
> Thank you.
>
> This may introduce double free for client_delete_all() now?

No, I do not think it does.

When an individual client exits, client_delete() removes the ipc object
from ccm_hashclient, and hence client_delete_all() will never call
client_destroy() for the same ipc object again.

The valgrind result did not complain about this either.

Am I missing your point?

> All this apparently useless indirection seems to be from a time when
> client_destroy explicitly called into a ->ops->destroy virtual
> function. Which it no longer does.
>
> So I think dropping the explicit calls to client_destroy, as well as
> the other then-useless indirection functions, and instead doing a
> g_hash_table_new_full with g_free in client_init, would be the way
> to go.

It might be doable, but I do not think it is necessary to rewrite the
code for fixing this issue.

Thanks,

>> Could you have a look?
>>
>> It would not affect most installations because only crmd and cib are
>> the clients, but if you run any ccm client such as the crm_node
>> command periodically, ccm will increase its memory consumption.
>>
>> The valgrind outputs are also attached as evidence of the leakage
>> and of the fix by the patch; the results were taken after the
>> crm_node command was executed 100 times. There still exist some
>> "definitely / indirectly / possibly lost" records, but as far as
>> I've investigated they are all allocated only at invocation time and
>> are not considered a leak. Double checks are welcome.
>>
>> Thanks,
>> --
>> Keisuke MORI
>
> Cheers, Lars
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

--
Keisuke MORI

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] High: ccm: fix a memory leak when a client exits
Hi,

The attached patch will fix a memory leak in ccm that occurs whenever a
ccm client disconnects.

It would not affect most installations because only crmd and cib are
the clients, but if you run any ccm client such as the crm_node command
periodically, ccm will increase its memory consumption.

The valgrind outputs are also attached as evidence of the leakage and
of the fix by the patch; the results were taken after the crm_node
command was executed 100 times. There still exist some "definitely /
indirectly / possibly lost" records, but as far as I've investigated
they are all allocated only at invocation time and are not considered a
leak. Double checks are welcome.

Thanks,
--
Keisuke MORI

ccm-memleak.patch
Description: Binary data

valgrind-NG-HB305.log
Description: Binary data

valgrind-OK-HB305patched.log
Description: Binary data
[Linux-ha-dev] [PATCH][heartbeat] skip unnecessary waiting for lrmd invocation when LRMD_MAX_CHILDREN is set
Hi,

The heartbeat init.d script waits for the lrmd invocation when
LRMD_MAX_CHILDREN is set, and it does not return until the initdead
time has passed (or a 40s timeout) unless you start all the nodes in
the cluster at the same time.

But this should not be necessary, because recent versions of lrmd
(cluster-glue-1.0.10 or later) look at the environment variable by
themselves. The attached patch improves this so that the init.d script
returns immediately as usual, even if the variable is set.

Regards,
--
Keisuke MORI

lrmd-max-children.patch
Description: Binary data
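The patch itself is attached above as binary data, so the following is
only an illustrative sketch of the two behaviours being contrasted. The
function names are hypothetical, not the actual init.d code; `lrmadmin
-p max-child-count N` is the cluster-glue way of pushing the setting
into an already-running lrmd.

```shell
LRMD_MAX_CHILDREN=${LRMD_MAX_CHILDREN:-4}

# Old behaviour (sketch): poll a running lrmd via lrmadmin until it
# answers. This is what could block for up to 40s (or until initdead)
# when the other cluster nodes were not started at the same time.
set_max_children_old() {
    timeout=40
    while [ "$timeout" -gt 0 ]; do
        lrmadmin -p max-child-count "$LRMD_MAX_CHILDREN" 2>/dev/null && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# New behaviour (sketch): just export the variable before starting
# heartbeat and return immediately; lrmd in cluster-glue >= 1.0.10
# reads LRMD_MAX_CHILDREN from its own environment.
set_max_children_new() {
    export LRMD_MAX_CHILDREN
    return 0
}
```

The point of the patch is exactly this difference: the new path has no
wait loop at all.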
Re: [Linux-HA] IPaddr2 support of ipv6
Hi,

2013/3/29 David Vossel dvos...@redhat.com:
> Hi,
>
> It looks like ipv6 support got added to the IPaddr2 agent last year.
> I'm curious why the metadata only advertises that the 'ip' option
> should be an IPv4 address.
>
>   ip (required): The IPv4 address to be configured in dotted quad
>   notation, for example 192.168.1.1.
>
> Is this just an oversight? If so this patch would probably help.
> https://github.com/davidvossel/resource-agents/commit/07be0019a50b96743536ab50727b56d9175bf95f

Ah, yes, that's just an oversight. Thank you for pointing it out.
Would you submit your patch as a pull request?

Thanks,
--
Keisuke MORI
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hi Nick,

Could you provide which version of resource-agents you're using?

Prior to 3.9.2, IPv6addr requires a static IPv6 address with exactly
the same prefix in order to find an appropriate nic; so you should have
statically assigned, for example, 2600:3c00::34:c003/116 on eth0.

As of 3.9.3, this has been relaxed and the specified nic is always
used, even if the prefix does not match; so it should just work. (At
least it works for me.)

Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
virtual IPv6 address, which is brand new, and I would prefer this
because it uses the standard ip command.

Thanks,

2013/3/25 Nick Walke tubaguy50...@gmail.com:
> This the correct place to report bugs?
> https://github.com/ClusterLabs/resource-agents
>
> Nick
>
> On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote:
>> Hello Nick,
>>
>>> I shouldn't be able to do that if the IPv6 module wasn't loaded,
>>> correct?
>>
>> that is correct.
>>
>> I tried modifying my netmask to copy yours. And I get the same error
>> you do:
>>
>>   ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error
>>
>> So probably a bug in the resource agent.
>>
>> Manually adding and removing works:
>>
>>   (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
>>   (node-62) [~] ip -6 addr show dev eth0
>>   2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
>>       inet6 2a01:4f8:bb:400::2/116 scope global
>>          valid_lft forever preferred_lft forever
>>       inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
>>          valid_lft 2591887sec preferred_lft 604687sec
>>       inet6 fe80::225:90ff:fe97:dbb0/64 scope link
>>          valid_lft forever preferred_lft forever
>>   (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0
>>
>> Nick, you can do the following things to resolve this:
>>
>> - Hunt down the bug and fix it, or let someone else do it for you
>> - Use another netmask, if possible (fighting the symptoms instead of
>>   resolving the root cause)
>> - Write your own resource agent (fighting the symptoms instead of
>>   resolving the root cause)
>>
>> Cheers,
>>   Thomas

--
Keisuke MORI
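The pre-3.9.3 matching rule can be illustrated with a deliberately
simplified helper. Real prefix comparison needs IPv6 bit arithmetic
(which is exactly why the old agent was strict about it); this sketch
compares only the textual "/len" part of the two addresses and is not
the agent's actual code, just a picture of why a /116 static address
satisfies a /116 virtual IP while the usual /64 address does not.

```shell
# Grossly simplified sketch of the pre-3.9.3 IPv6addr selection rule:
# the nic must already carry a static address whose prefix length
# matches the virtual IP's. A real check would also compare the
# masked network bits, not just the prefix length.
prefixlen() {
    echo "${1#*/}"          # "2600:3c00::34:c003/116" -> "116"
}

same_prefixlen() {
    [ "$(prefixlen "$1")" = "$(prefixlen "$2")" ]
}
```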
Re: [Linux-HA] Heartbeat IPv6addr OCF
2013/3/25 Nick Walke tubaguy50...@gmail.com:
> Looks like 3.9.2-5. So I need to statically assign the address I want
> to use before using it with IPv6addr?

Yes.

--
Keisuke MORI
[Linux-ha-dev] [PATCH] crmsh: fix in python version checking
Hi Dejan,

Here is a trivial patch for crmsh. It is totally harmless because it
only takes effect when the python version is too old and crmsh would
abort anyway :)

Thanks,
--
Keisuke MORI

python-version.patch
Description: Binary data
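The patch itself is attached as binary data, so its actual content
isn't visible here. As a general illustration of the class of bug
version checks tend to have (an assumption about the topic, not a claim
about this particular patch): comparing versions as plain strings
breaks as soon as a two-digit component appears, whereas a field-wise
numeric comparison does not.

```shell
# Illustrative only -- not the crmsh patch. A naive string comparison
# thinks "2.10" < "2.6"; a field-wise numeric sort gets it right.
version_ge() {
    # Returns 0 (true) if $1 >= $2, comparing dot-separated numeric
    # fields: whichever version sorts first is the smaller one.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n1)" = "$2" ]
}
```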
Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hi,

Does IPaddr2 need to support the 'eth0:label' format in a single 'nic'
parameter when you want to use an interface label?

I thought it doesn't, from the meta-data description of 'nic', but that
looks to conflict with the 'iflabel' description:

  nic: Do NOT specify an alias interface in the form eth0:1 or anything
  here; rather, specify the base interface only. If you want a label,
  see the iflabel parameter.

  iflabel: (omit) If a label is specified in nic name, this parameter
  has no effect.

The latest IPaddr2 (findif.sh version) would reject it as if an invalid
nic had been specified. If we do need to support it, I will submit a
patch for this.

Thanks,

2013/1/30 Dejan Muhamedagic de...@suse.de:
> Hello,
>
> The current resource-agents repository has been tagged v3.9.5rc1. It
> is mainly a bug fix release. The full list of changes for the
> linux-ha RA set is available in ChangeLog:
> https://github.com/ClusterLabs/resource-agents/blob/v3.9.5rc1/ChangeLog
>
> We'll allow a week for testing. The final release is planned for
> Feb 6.
>
> Many thanks to all contributors!
>
> Best,
> The resource-agents maintainers

--
Keisuke MORI
Re: [Linux-ha-dev] announcement: resource-agents release candidate 3.9.5rc1
Hi Dejan,

2013/1/31 Dejan Muhamedagic de...@suse.de:
>> The latest IPaddr2 (findif.sh version) would reject it as if an
>> invalid nic had been specified. If we do need to support it, I will
>> submit a patch for this.
>
> I'd rather just update the iflabel description.

Me too, but

> After all, normally one doesn't need to specify the nic. But you'll
> get different preferences from different people :)
>
> However, it seems to be a regression, so we should probably allow
> labels.

Yes, that is true. I will submit a patch tomorrow.

> BTW, is this related to
> https://github.com/ClusterLabs/resource-agents/issues/200 ?

Yes, the proposed patch in #200 would not support the nic:label format
either, so it would have to be rewritten in a different way. Honestly,
I'm wondering whether it is really worth fixing #200, though.

Thanks,
--
Keisuke MORI
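If the legacy 'eth0:label' form were to be accepted again, the split
itself is simple. A hypothetical sketch (findif.sh's eventual handling
may differ; the variable names are illustrative):

```shell
# Hypothetical sketch: split a combined "eth0:1" nic value into the
# base interface and the interface label that the legacy form implies.
split_nic_label() {
    NIC_BASE=${1%%:*}             # part before the first ':'
    case "$1" in
        *:*) IFLABEL=${1#*:} ;;   # part after the first ':'
        *)   IFLABEL= ;;          # no label given
    esac
}
```

With that in place, a combined value could be normalized into the two
parameters the agent already documents (nic and iflabel), which is what
makes the legacy form backward-compatible rather than an error.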
Re: [Linux-ha-dev] agents: including LGPL license file
Hi Dejan,

2012/12/24 Dejan Muhamedagic de...@suse.de:
> Hi Keisuke-san,
>
> On Thu, Dec 20, 2012 at 02:14:10PM +0900, Keisuke MORI wrote:
>> Hi,
>>
>> The resource-agents package is licensed under the GPL and LGPL, but
>> the full copy of the LGPL license file is missing, as opposed to the
>> heartbeat and glue packages, which include it.
>>
>> Why don't we include COPYING.LGPL in the agents package too, as the
>> verbatim copy of the LGPL license, for consistency?
>
> Not really an expert in the area, but I think there's no problem
> adding a copy of a license.

Thanks for your comment. I will submit a pull request for that.

There is no problem with the current package at all, but adding it
would help clarify more precisely which licenses we are using.

The background is that our legal division advised that there was a
'bogus' OSS project which claimed to be using a popular OSS license
defined by the OSI, but had actually derived from it, with some
additional clauses and limitations for their benefit. Including a
verbatim copy of a license file will help clarify that we are a valid
OSS project.

Thanks,
--
Keisuke MORI
[Linux-ha-dev] agents: including LGPL license file
Hi,

The resource-agents package is licensed under the GPL and LGPL, but the
full copy of the LGPL license file is missing, as opposed to the
heartbeat and glue packages, which include it.

Why don't we include COPYING.LGPL in the agents package too, as the
verbatim copy of the LGPL license, for consistency?

Thanks,
--
Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/27 Dejan Muhamedagic de...@suse.de:
> (...)
>> It might be also helpful if it had a kind of 'hook' functionality
>> that allows you to execute an arbitrary script for collecting
>> runtime information such as CPU usage, memory status, I/O status or
>> the list of running processes etc. for diagnosis.
>
> Yes. I guess that one could run such a hook in background. Did you
> mean that?

I first thought that it would simply run a one-shot hook at the
invocation of the RA instance, but it would be great if it could run in
background while an RA operation is running.

> Or once the RA instance exited?

This is a bit of a different feature, though: it would also be useful
if it could run a hook in the event of an RA timeout, or when a command
in the RA gets stuck for some reason.

Thanks,
--
Keisuke MORI
Re: [Linux-ha-dev] RA trace facility
Hi,

2012/11/22 Dejan Muhamedagic de...@suse.de:
> Hi Lars,
>
> On Wed, Nov 21, 2012 at 04:43:08PM +0100, Lars Marowsky-Bree wrote:
>> On 2012-11-21T16:33:18, Dejan Muhamedagic de...@suse.de wrote:
>>> Hi,
>>>
>>> This is a little something which could help while debugging
>>> resource agents. Setting the environment variable __OCF_TRACE_RA
>>> would cause the resource agent run to be traced (as in set -x).
>>> PS4 is set accordingly (that's a bash feature; I don't know if
>>> other shells support it). ocf-tester got an option (-X) to turn
>>> the feature on. The agent itself can also turn tracing on/off via
>>> ocf_start_trace/ocf_stop_trace.
>>>
>>> Do you find anything amiss?
>>
>> I *really* like this. But I'd like a different way to turn it on -
>> a standard one that is available via the CIB configuration, without
>> modifying the script.
>
> I don't really want the script to get modified either. The above
> instructions are for people developing a new RA.

I like this, too. It would be useful when you need to diagnose in the
production environment if you can enable/disable it without any
modifications to RAs.

It might be also helpful if it had a kind of 'hook' functionality that
allows you to execute an arbitrary script for collecting runtime
information such as CPU usage, memory status, I/O status or the list of
running processes etc. for diagnosis.

--
Keisuke MORI
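The mechanism under discussion is plain `set -x`. A self-contained
sketch of how ocf_start_trace/ocf_stop_trace could look; the PS4 format
here is illustrative, not the exact string used by the patch:

```shell
# Minimal sketch of the RA tracing described above. PS4 is the prefix
# printed before each traced command; set -x / set +x toggle tracing.
ocf_start_trace() {
    PS4='+ [${LINENO}] '
    set -x
}

ocf_stop_trace() {
    set +x
}
```

Between the two calls every executed command is echoed to stderr with
its line number, which is what makes post-mortem RA debugging practical
without sprinkling echo statements through the agent.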
Re: [Linux-ha-dev] [resource-agents] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6. (#97)
Hi Lars,

Thank you for your comments. I'm going to answer them below, and if you
have further comments I would greatly appreciate it.

2012/10/16 Lars Ellenberg l...@linbit.com:
> Again, apologies for not having this sent out when I wrote it. I'm
> unsure why it hibernated in my Draft folder for five months, but it
> was not the only one :(
>
> I realize the pull request has meanwhile been closed, and we do have
> a findif.sh implementation in current git. Still, I'll just send
> these comments as I wrote them back then; maybe some of them still
> apply.
>
> At the end, there is a bit of comment on how to maybe re-implement
> the ipcheck and ifcheck functions without grep and awk. Feel free to
> ignore for now, though.
>
> I'll try to review again what's in git now, and send a proper git
> diff, once I find the time ;-)
>
> On Wed, May 30, 2012 at 11:20:03PM -0700, Keisuke MORI wrote:
>> This is a proposal enhancement of IPaddr2 to support IPv6 as well as
>> IPv4. I would appreciate your comments and suggestions for merging
>> this into the upstream.
>>
>> NOTE: This pull request is meant for reviewing the code and
>> discussions, and is not intended to be merged as is at this moment.
>
> Github pull request comments are IMO not the best place to discuss
> these things, so I send to the linux-ha-dev mailing list as well.
>
>> ## Benefits:
>>
>> * Unify the usage, behavior and the code maintenance between IPv4
>>   and IPv6 on Linux.
>>
>>   The usage of IPaddr2 and IPv6addr is similar, but they have
>>   different parameters and different behaviors. In particular, they
>>   may choose a different interface depending on your configuration
>>   even if you provided similar parameters in the past.
>>
>>   IPv6addr is written in C and is rather hard to improve. As /bin/ip
>>   already supports both IPv4 and IPv6, we can share most of the code
>>   of IPaddr2 written in bash.
>
> IPv6addr is supposed to run on non-Linux as well. So we better not
> deprecate it, as long as all the world is not Linux.

Agreed.

>> * usable for LVS on IPv6.
>>
>>   IPv6addr does not support lvs_support=true and unfortunately there
>>   is no possible way to use LVS on IPv6 right now. IPaddr2 (/bin/ip)
>>   works for LVS configurations without enabling lvs_support, both
>>   for IPv4 and IPv6. (You don't have to remove an address on the
>>   loopback interface if the virtual address is assigned by using
>>   /bin/ip.) See also:
>>   http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429
>>
>> * retire the old 'findif' binary.
>>
>>   The 'findif' binary is replaced by a shell script version of
>>   findif, originally developed by lge. See
>>   ClusterLabs/resource-agents#53: findif could be rewritten in shell
>>
>> * easier support for other pending issues
>>
>>   These pending issues can be fixed based on this new IPaddr2:
>>   * ClusterLabs/resource-agents#68: Allow ipv6addr to mark new
>>     address as deprecated
>>   * ClusterLabs/resource-agents#77: New RA that controls IPv6
>>     address in loopback interface
>>
>> ## Notes / Changes:
>>
>> * findif semantics changes
>>
>>   There are some incompatibilities in deciding which interface is to
>>   be used when your configuration is ambiguous. But in reality it
>>   should not be a problem as long as it's configured properly. The
>>   changes mostly came from fixing a bug in the findif binary (it
>>   returns a wrong broadcast) or from merging the differences between
>>   the old IPaddr2 and IPv6addr. See the ocft test cases for details
>>   (cases No.6, No.9, No.10, No.12, No.15 in the IPaddr2v4 test
>>   cases). Other notable changes are described below.
>>
>> * broadcast parameter for IPv4
>>
>>   The broadcast parameter may be required along with cidr_netmask
>>   when you want to use a different subnet mask from the static IP
>>   address. That is because doing such a calculation is difficult in
>>   the shell script version of findif. See the ocft test cases for
>>   details (cases No.11, No.14, No.16, No.17 in the IPaddr2v4 test
>>   cases). This limitation may be eliminated if we removed the brd
>>   options from the /bin/ip command line.
>
> If we do not specify the broadcast at all, ip will do the right thing
> by default, I think.
>
> We should only use it on the ip command line if it is in the input
> parameters. I don't really have a use case for the broadcast address
> not being the default, so I would be ok with dropping it completely.

It has been fixed, and the latest code in the repo should now work like
you said.

>> * loopback(lo) now requires cidr_netmask or broadcast.
>>
>>   See the ocft test case in the IPaddr2 ocft script. The reason is
>>   similar to the previous one.
>
> We really need to avoid breaking existing configurations. So we need
> to fix this. If we find nothing better, with some heuristic.

It has also been fixed now, and loopback can be used the same as
before.

>> * loose error check of nic for an IPv6 link-local address.
>>
>>   IPv6addr was able to check this, but in the shell script it is
>>   hard to determine a link-local address (it requires bitmask
>>   calculation). I do not think it's worth implementing in shell.
Re: [Linux-HA] Question about pacemaker + heartbeat + postgres in active/passive configuration
Hi,

2012/8/16 Renee Riffee riffe...@gmail.com:
> His presentation is here:
> http://www.slideshare.net/ksk_ha/linuxconfauhaminiconfpgsql9120120116
> Click on the Save File button in the bar...

Yes, that is it. Thank you for finding and sharing it :).

The developer's wiki page provides more up-to-date information for the
pgsql RA after it was included in the upstream:
https://github.com/t-matsuo/resource-agents/wiki/Resource-Agent-for-PostgreSQL-9.1-streaming-replication

As for the configuration for heartbeat, documents from linux-ha.org
might help:
http://www.linux-ha.org/doc/users-guide/_creating_an_initial_heartbeat_configuration.html

Regards,
Keisuke MORI

> On Aug 15, 2012, at 4:54 PM, DENNY, MICHAEL wrote:
>> Hi, Andrew. Where can I find the presentation by Keisuke?
>>
>> Btw, I use heartbeat in combo with pacemaker also... but with
>> MySQL... was just a decision based on an impression at the time that
>> most of the documentation content was about heartbeat... and that
>> heartbeat had more stability. And it seemed that there was nothing
>> negative being posted that would deter me about the choice. After
>> being on this distribution list for a while, I now think it's time
>> for me to move to Corosync.
>>
>> - Mike
>>
>> -----Original Message-----
>> From: linux-ha-boun...@lists.linux-ha.org
>> [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
>> Sent: Wednesday, August 15, 2012 7:40 PM
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] Question about pacemaker + heartbeat +
>> postgres in active/passive configuration
>>
>> On Wed, Aug 15, 2012 at 10:25 PM, Renee Riffee riffe...@gmail.com wrote:
>>> Hello everyone,
>>>
>>> Apologies if this is not the correct group for this question, but I
>>> am seeking information on how to set up pacemaker with heartbeat
>>> and postgres in an active/passive streaming (pg 9.1) configuration.
>>>
>>> I would prefer to use heartbeat and not corosync, although most of
>>> the good tutorials like the Clusters from Scratch document use
>>> corosync with pacemaker, but I don't have any shared storage to use
>>> it with on my machines. The presentation by Keisuke Mori of
>>> Linux-HA Japan is beautiful and exactly what I really want to do,
>>> but I need more information on how to use pacemaker with heartbeat
>>> instead of corosync.
>>
>> Assuming pacemaker was built with heartbeat support, simply install
>> heartbeat and add:
>>
>>   crm respawn
>>
>> to ha.cf
>>
>> Start heartbeat, job done. Resource configuration is unchanged from
>> what you see in Clusters from Scratch.
>>
>>> Does anyone know of a recipe or tutorial that goes through what his
>>> presentation shows?
>>>
>>> Kind regards,
>>> -Renee
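Putting Andrew's answer together, a minimal ha.cf for running Pacemaker
on top of Heartbeat might look like the following. The node names,
interface, and timings are placeholders; `crm respawn` is the one
directive the answer refers to.

```
# /etc/ha.d/ha.cf -- minimal sketch (placeholder values)
crm         respawn     # run Pacemaker's CRM on top of heartbeat
udpport     694
bcast       eth0        # heartbeat over broadcast on eth0
keepalive   1
deadtime    30
node        node-a      # must match `uname -n` on each node
node        node-b
```

After starting heartbeat on both nodes, resource configuration proceeds
exactly as in Clusters from Scratch, since Pacemaker's CIB is the same
regardless of the messaging layer underneath.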
Re: [Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
Hi Alan,

Thank you for your comments.

2012/5/31 Alan Robertson al...@unix.sh:
> It's straightforward to determine if an IP address is link-local or
> not - for an already configured address:
>
>   3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>       link/ether 94:db:c9:3f:7c:20 brd ff:ff:ff:ff:ff:ff
>       inet 10.10.10.30/24 brd 10.10.10.255 scope global eth1
>       inet6 fe80::96db:c9ff:fe3f:7c20/64 scope link
>          valid_lft forever preferred_lft forever
>
> This works uniformly for both ipv4 and ipv6 addresses (quite nice!)

It's an interesting idea, but I don't think we need to care about IPv4
link-local addresses, because users can configure them in the same
manner as a regular IP address (and they are used very rarely).

In the case of IPv6 link-local addresses, it is almost always a wrong
configuration if nic is missing (the socket API mandates it), so we
want to check it.

> However, for addresses which are not yet up (which is unfortunately
> what you're concerned with), ipv6 link-local addresses take the form
> fe80:: followed by 64 bits of MAC address (48-bit MACs are padded
> out). http://en.wikipedia.org/wiki/Link-local_address
>
> MAC addresses never begin with 4 bytes of zeros, so the regular
> expression to match this is pretty straightforward. This isn't a bad
> approximation (but could easily be made better):

Yes, you are right. Matching against 'fe80::' should be pretty easy and
good enough. Why could I not think of such a simple idea :)

>   islinklocal() {
>       if echo "$1" | grep -i '^fe80::[^:]*:[^:]*:[^:]*:[^:]*$' >/dev/null

We should also accept 'fe80::1'. Anyway, I will look into this way.

Thanks,
--
Keisuke MORI
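Following up on the 'fe80::1' remark: a prefix-based check can avoid
the four-group regexp entirely and also accept compressed forms. A
runnable sketch, illustrative rather than the code that eventually
landed, matching the whole fe80::/10 link-local range:

```shell
# Link-local IPv6 addresses live in fe80::/10, i.e. the first hextet
# is fe80 through febf. Matching that textual prefix covers both the
# MAC-derived addresses and short forms like fe80::1.
islinklocal() {
    case "$1" in
        [Ff][Ee][89ABab][0-9A-Fa-f]:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

This stays within plain shell pattern matching, so it needs no grep
process and no bitmask arithmetic for the common textual forms
(addresses written as fe80:0:0:... also match).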
[Linux-ha-dev] [RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
I would like to propose an enhancement of IPaddr2 to support IPv6 as
well as IPv4. I've submitted this as pull request #97, but I am also
posting it to the ML for a wider audience. I would appreciate your
comments and suggestions for merging this into the upstream.

[RFC] IPaddr2: Proposal patch to support the dual stack of IPv4 and IPv6.
https://github.com/ClusterLabs/resource-agents/pull/97

## Benefits:

* Unify the usage, behavior and the code maintenance between IPv4 and
  IPv6 on Linux.

  The usage of IPaddr2 and IPv6addr is similar, but they have different
  parameters and different behaviors. In particular, they may choose a
  different interface depending on your configuration even if you
  provided similar parameters in the past.

  IPv6addr is written in C and is rather hard to improve. As /bin/ip
  already supports both IPv4 and IPv6, we can share most of the code of
  IPaddr2 written in bash.

* Usable for LVS on IPv6.

  IPv6addr does not support lvs_support=true and unfortunately there is
  no possible way to use LVS on IPv6 right now. IPaddr2 (/bin/ip) works
  for LVS configurations without enabling lvs_support, both for IPv4
  and IPv6. (You don't have to remove an address on the loopback
  interface if the virtual address is assigned by using /bin/ip.)
  See also: http://www.gossamer-threads.com/lists/linuxha/dev/76429#76429

* Retire the old 'findif' binary.

  The 'findif' binary is replaced by a shell script version of findif,
  originally developed by lge. See "findif could be rewritten in shell":
  https://github.com/ClusterLabs/resource-agents/issues/53

* Easier support for other pending issues.

  These pending issues can be fixed based on this new IPaddr2:

  * Allow ipv6addr to mark new address as deprecated
    https://github.com/ClusterLabs/resource-agents/issues/68
  * New RA that controls IPv6 address in loopback interface
    https://github.com/ClusterLabs/resource-agents/pull/77

## Notes / Changes:

* findif semantics changes

  There are some incompatibilities in deciding which interface is to be
  used when your configuration is ambiguous. But in reality it should
  not be a problem as long as it's configured properly. The changes
  mostly came from fixing a bug in the findif binary (it returns a
  wrong broadcast) or from merging the differences between the old
  IPaddr2 and IPv6addr. See the ocft test cases for details (cases
  No.6, No.9, No.10, No.12, No.15 in the IPaddr2v4 test cases). Other
  notable changes are described below.

* broadcast parameter for IPv4

  The broadcast parameter may be required along with cidr_netmask when
  you want to use a different subnet mask from the static IP address.
  That is because doing such a calculation is difficult in the shell
  script version of findif. See the ocft test cases for details (cases
  No.11, No.14, No.16, No.17 in the IPaddr2v4 test cases). This
  limitation may be eliminated if we removed the brd options from the
  /bin/ip command line.

* loopback (lo) now requires cidr_netmask or broadcast.

  See the ocft test case in the IPaddr2 ocft script. The reason is
  similar to the previous one.

* Loose error check of nic for an IPv6 link-local address.

  IPv6addr was able to check this, but in the shell script it is hard
  to determine a link-local address (it requires bitmask calculation).
  I do not think it's worth implementing in shell.

* send_ua: a new binary

  We need one new binary as a replacement of send_arp for IPv6 support.
  IPv6addr.c is reused to make this command. Note that the IPv6addr RA
  is still there and you can continue to use it for backward
  compatibility.

## Acknowledgement

Thanks to Tomo Nozawa-san for his hard work writing and testing this
patch. Thanks to Lars Ellenberg for the first findif.sh implementation.
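On the "such calculation is difficult in shell" point about the
broadcast parameter: deriving the IPv4 broadcast from an address and a
prefix length is in fact expressible in POSIX shell arithmetic. A
standalone sketch, not findif.sh's actual code, and ignoring the
separate-netmask corner case the notes describe:

```shell
# Compute the IPv4 broadcast address for ADDRESS PREFIXLEN using only
# POSIX shell arithmetic: pack the dotted quad into a 32-bit integer,
# OR in the host bits, then unpack it again.
ipv4_broadcast() {
    addr=$1 prefix=$2
    oldIFS=$IFS; IFS=.
    set -- $addr              # split "a.b.c.d" into $1..$4
    IFS=$oldIFS
    n=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - prefix)) - 1 ))
    b=$(( n | hostmask ))
    echo "$(( (b >> 24) & 255 )).$(( (b >> 16) & 255 )).$(( (b >> 8) & 255 )).$(( b & 255 ))"
}
```

Whether this is worth carrying in the agent is another matter: as noted
in the review thread, omitting brd from the /bin/ip command line lets
ip compute the default broadcast itself.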
Best Regards,
--
Keisuke MORI
Re: [Linux-ha-dev] Modified patch for RA
Hi Yves, Thank you for revising the patch. I've confirmed that this patch restores the log level for mysql_status to its previous behavior. 2012/5/5 Yves Trudeau y.trud...@videotron.ca: Hi Dejan, here's another modified patch for the mysql agent at commit version 4c18035 (g...@github.com:y-trudeau/resource-agents.git, branch mysql-repl). Following a comment from Keisuke, I put back the log level for mysql_status in probe mode. Regards, Yves ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Patch to mysql RA for replication
Hi Yves, 2012/4/19 Yves Trudeau y.trud...@videotron.ca: - cleanup loglevel Why did you remove all the loglevel code? Was there anything wrong with it? After your patch, the RA will generate inappropriate ERROR logs whenever it starts/stops/probes, even though these are all _expected_ results and nothing to worry about. That is confusing for users, and we have been trying to eliminate such confusing ERROR logs as much as possible. The loglevel code is intended to use the INFO level when the result is expected, and the ERROR level only when it is considered a failure. https://github.com/ClusterLabs/resource-agents/commit/72952904b67b85e1809f90255a55ce39eb2a8922 I would like to revert these changes. Thanks, Hi Dejan, here's my patch to the mysql agent at commit version 4c18035. Sorry for being inept with git. Included here: - attribute for replication_info - put error code 1040 in a variable - put the long crm_attribute call for replication_info in a variable - cleanup loglevel - defined a value for DEBUG_LOG As I wrote before, I haven't yet found a solution for removing the IP attribute for each node. Using a replication_VIP breaks the operation of the agent, as it removes the easy way to add (or rejoin) nodes. Regards, Yves ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
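The logging convention described above — expected outcomes at INFO, real failures at ERROR — can be sketched in a few lines of shell. This is an illustration of the convention, not the actual mysql RA code; `severity_for` is a hypothetical helper:

```shell
# Illustrative sketch of the loglevel convention: during a probe,
# "not running" is an expected result and should be logged at info;
# outside a probe the same condition is a real failure and merits err.
severity_for() {
    is_probe=$1    # 1 if the current operation is a probe, 0 otherwise
    if [ "$is_probe" -eq 1 ]; then
        echo info
    else
        echo err
    fi
}

severity_for 1    # prints info
```

A resource agent would then log via the usual helper, e.g. `ocf_log "$(severity_for "$probe")" "MySQL is not running"`.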
[Linux-ha-dev] [PATCH] cluster-glue: correct a build dependency on CentOS6/RHEL6
Hi, Please consider the attached patch for cluster-glue. Thanks, -- Keisuke MORI export-libuuid.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Fwd: [lvs-users] defunct checkcommand processes w/ ldirectord
Hi, A bug report and a proposed patch for ldirectord were posted to the lvs-users ML a little while ago. I think it's worth including. http://lists.graemef.net/pipermail/lvs-users/2012-February/024430.html -- Forwarded message -- From: Sohgo Takeuchi so...@sohgo.dyndns.org Date: 2012/2/11 Subject: Re: [lvs-users] defunct checkcommand processes w/ ldirectord To: lvs-us...@linuxvirtualserver.org, da...@davidcoulson.net Hello, David From: David Coulson da...@davidcoulson.net | I'm running ldirectord with a few external checkcommands, but I end up with numerous defunct processes on the system. It seems like this issue: http://archive.linuxvirtualserver.org/html/lvs-users/2010-08/msg00040.html I realize I can set an alarm on my command and cause it to exit, but is there a fix for ldirectord to correctly clean up processes which are killed due to the internal timeout? Please try the following patch.

diff --git a/ldirectord/ldirectord.in b/ldirectord/ldirectord.in
index 5d26114..c28eb40 100644
--- a/ldirectord/ldirectord.in
+++ b/ldirectord/ldirectord.in
@@ -2671,19 +2671,21 @@ sub run_child
 	my $real = $$v{real};
 	my $virtual_id = get_virtual_id_str($v);
 	my $checkinterval = $$v{checkinterval} || $CHECKINTERVAL;
 	$0 = "ldirectord $virtual_id";
 	while (1) {
 		foreach my $r (@$real) {
 			$0 = "ldirectord $virtual_id checking $$r{server}";
 			_check_real($v, $r);
+			check_signal();
 		}
 		$0 = "ldirectord $virtual_id";
 		sleep $checkinterval;
+		check_signal();
 		ld_emailalert_resend();
 	}
 }

 sub _check_real
 {
 	my $v = shift;
 	my $r = shift;

___ Please read the documentation before posting - it's available at: http://www.linuxvirtualserver.org/ LinuxVirtualServer.org mailing list - lvs-us...@linuxvirtualserver.org Send requests to lvs-users-requ...@linuxvirtualserver.org or go to http://lists.graemef.net/mailman/listinfo/lvs-users Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi, Any update on this? 2012/2/1 Keisuke MORI keisuke.mori...@gmail.com: Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Tue, Jan 31, 2012 at 09:52:24PM +0900, Keisuke MORI wrote: Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? Agreed. I thought that the intention was to always use the status page, but obviously people figured out that they could skip that. Just as well. Thank you for your productive comments and discussions! As a result of the discussion on this topic, I would suggest two patches in the pull request below: https://github.com/ClusterLabs/resource-agents/pull/54 Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] hb_report fails to sudo
Hi, hb_report in cluster-glue-1.0.8 or later fails with an error even when it runs as root, at least on RHEL: --- # id -u 0 # hb_report -f 16:00 report1 sudo: sorry, you must have a tty to run sudo (...) --- It seems to have been introduced by this changeset: http://hg.linux-ha.org/glue/rev/f55d68c37426 Apparently two issues are involved: 1) it tries to use sudo even when invoked as root. 2) sudo may be prohibited without a tty on some distros, such as RHEL, for security's sake. The attached patch fixes 1). You can work around it by specifying '-u root' explicitly until it gets fixed. As for 2), it seems that the current hb_report needs to _disable_ tty allocation on ssh, so you would need additional configuration in /etc/sudoers on such distros if you want to ssh as a regular user. Regards, -- Keisuke MORI hb_report-sudo-root.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
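The fix for issue 1) amounts to skipping sudo when the effective user is already root. A minimal sketch of that logic (`sudo_prefix` is a hypothetical helper for illustration, not the actual hb_report code; the uid argument exists only so the decision can be exercised deterministically):

```shell
# Hypothetical sketch of the idea behind the fix: only prepend sudo
# when the invoking user is not root.
sudo_prefix() {
    uid=${1:-$(id -u)}      # default to the real caller's uid
    if [ "$uid" -eq 0 ]; then
        echo ""             # already root: run commands directly
    else
        echo "sudo"         # non-root: escalate via sudo
    fi
}

# A collector could then run, e.g.:  $(sudo_prefix) crm_report ...
```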
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Mon, Jan 30, 2012 at 08:38:35PM +0900, Keisuke MORI wrote: Hi, 2012/1/28 Dejan Muhamedagic de...@suse.de: Hi, On Fri, Jan 20, 2012 at 02:09:13PM +0900, nozawat wrote: Hi Dejan I agree with that opinion. I'm sending the revised patch. I'll apply this one. BTW, can you share your use case? Without the -z option, HTML files like the following return an error. - example ---
<html>
<body>
test
</body>
</html>
--- I placed a page there for checks and was going to monitor it. Even though I said I'd apply this one, I'm now rather reluctant, because it may break some existing configurations, for instance if there are anchors in the regular expression (^ or $). Why is it important to match multiple lines? Just curious: how do you put this string into statusurl? The problem is that the default value of testregex assumes that the </body> and </html> tags are on a single line, although it is very common for HTML content to return them on multiple lines. TESTREGEX=${OCF_RESKEY_testregex:-'</ *body *>[[:space:]]*</ *html *>'} I think it will not be a problem when you are using apache with the 'server-status' handler enabled, because in that case apache seems to return those tags on a single line, but it is also a common use case for the RA to monitor, say, the index.html on the top page. Ah, but in that case, i.e. if another page is specified to be monitored, testregex should be adjusted accordingly. The defaults are guaranteed to work only with the apache status page. Though I'm not really happy with the default regular expression, we cannot change that. The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) I agree that a user should change testregex accordingly when they specify the page to be monitored, but I just wanted to make it work with a default configuration.
As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? For instance, what I see here in the status page is also a PID at the beginning of a line: xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep ^PID PID Key: <br /> xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep -z ^PID xen-d:~ # echo $? 1 Hmm, OK, you are right. My testing was not thorough enough. (Thanks to Lars for the comprehensive tests in the other mail!) Now I understand that we should not support multi-line matching, so that ^ and $ remain usable in testregex on various platforms. That is reasonable. If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'. (We should also mention this in the meta-data documentation.) Hmm, I'd really expect that when a different page is checked, a different test string is specified as well. After all, shouldn't that be part of the content, rather than HTML code which can occur in any HTTP reply? But we could just as well reduce the default regular expression to '</ *html *>'. If nobody objects :) Yes. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
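The behavior debated in this thread is easy to reproduce: with the closing tags on separate lines, a plain grep of a default-style pattern finds nothing, while GNU grep's -z option treats the whole response as one NUL-terminated record, so [[:space:]] can match the newline. This sketch assumes GNU grep; the HTML and the pattern are illustrative:

```shell
# Illustrative reproduction of the multi-line matching question
# (assumes GNU grep for the -z option).
html='<html>
<body>
test
</body>
</html>'

regex='</ *body *>[[:space:]]*</ *html *>'

printf '%s\n' "$html" | grep -q  "$regex" && r1=match || r1=no-match
printf '%s\n' "$html" | grep -qz "$regex" && r2=match || r2=no-match

echo "plain grep: $r1"   # no-match: the two tags never share a line
echo "grep -z:    $r2"   # match: the whole input is one record
```

This also shows why -z interferes with ^ and $ anchors: once the input is a single record, "beginning of line" no longer means what a configured testregex expects.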
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? I agree that a user should change testregex accordingly when they specify the page to be monitored, but I just wanted to make it work with a default configuration. Of course. As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? For instance, what I see here in the status page is also a PID at the beginning of a line: xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep ^PID PID Key: <br /> xen-d:~ # wget -q -O- -L --no-proxy --bind-address ::1 http://[::1]/server-status | grep -z ^PID xen-d:~ # echo $? 1 Hmm, OK, you are right. My testing was not thorough enough. (Thanks to Lars for the comprehensive tests in the other mail!) Now I understand that we should not support multi-line matching, so that ^ and $ remain usable in testregex on various platforms. That is reasonable. If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'.
(We should also mention this in the meta-data documentation.) Hmm, I'd really expect that when a different page is checked, a different test string is specified as well. After all, shouldn't that be part of the content, rather than HTML code which can occur in any HTTP reply? But we could just as well reduce the default regular expression to '</ *html *>'. If nobody objects :) Yes. So, with this the RA will always match any HTML. That should be fine for the default. Cheers, Dejan Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi Dejan, 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, On Tue, Jan 31, 2012 at 09:52:24PM +0900, Keisuke MORI wrote: Hi Dejan 2012/1/31 Dejan Muhamedagic de...@suse.de: Hi Keisuke-san, (...) On Tue, Jan 31, 2012 at 08:46:35PM +0900, Keisuke MORI wrote: The current RA will try to check the top page (http://localhost:80) as the default behavior if you have not enabled server-status in httpd.conf, and it would fail to start even for apache's default test page :) Hmm, the current RA would produce an error for that URL:

488 case $STATUSURL in
489 http://*/*) ;;
490 *)
491 ocf_log err "Invalid STATUSURL $STATUSURL"
492 exit $OCF_ERR_ARGS ;;
493 esac

Strange. That URL is generated by the RA itself. apache-conf.sh:

119 buildlocalurl() {
120 	[ x"$Listen" != x ] &&
121 		echo "http://${Listen}" ||
122 		echo "${LOCALHOST}:${PORT}"

Probably we should relax the validation pattern to just 'http://*'? Agreed. I thought that the intention was to always use the status page, but obviously people figured out that they could skip that. Just as well. Thank you for your productive comments and discussions! As a result of the discussion on this topic, I would suggest two patches in the pull request below: https://github.com/ClusterLabs/resource-agents/pull/54 Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
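The relaxation agreed on above is easy to see with the two case patterns side by side. This is an illustration with hypothetical function names, not the apache RA code: the strict pattern rejects the URL buildlocalurl() can generate when there is no path component, while the relaxed pattern accepts it.

```shell
# Illustrative comparison of the strict STATUSURL pattern in the RA
# and the relaxed pattern proposed in the thread (function names are
# made up for this sketch).
strict_check() {
    case $1 in
        http://*/*) echo ok ;;       # requires a path after the host
        *)          echo invalid ;;
    esac
}
relaxed_check() {
    case $1 in
        http://*) echo ok ;;         # any http URL passes
        *)        echo invalid ;;
    esac
}

strict_check  http://localhost:80    # prints invalid: no path component
relaxed_check http://localhost:80    # prints ok
```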
Re: [Linux-ha-dev] new resource agents release
Hi Dejan, 2012/1/27 Dejan Muhamedagic de...@suse.de: 2. apache RA: testregex matching fix http://www.gossamer-threads.com/lists/linuxha/dev/77619#77619 This looks like a regression since heartbeat-2.1.4 from the user's point of view; one of our customers reported that they had been using 2.1.4 and apache without problems, and when they tried to upgrade to the recent Pacemaker without any changes to apache, it failed because of this issue. You're referring to apache-002.patch? Well, that's unfortunate, as the two are incompatible, i.e. if the configuration has 'whatever-string$' in testregex and we reintroduce tr '\012' ' ', that would break such configurations. Oops, sorry, I was only referring to apache-001.patch. This changed a bit more than three years ago and so far nobody has complained. So, perhaps it is better to leave it as it is, and whoever wants to upgrade from a 3+ year old installation should do some good testing anyway. What do you think? OK, let's move the discussion to the relevant thread on this topic. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH]Monitor failure and IPv6 support of apache-ra
Hi, 2012/1/28 Dejan Muhamedagic de...@suse.de: Hi, On Fri, Jan 20, 2012 at 02:09:13PM +0900, nozawat wrote: Hi Dejan I agree with that opinion. I'm sending the revised patch. I'll apply this one. BTW, can you share your use case? Without the -z option, HTML files like the following return an error. - example ---
<html>
<body>
test
</body>
</html>
--- I placed a page there for checks and was going to monitor it. Even though I said I'd apply this one, I'm now rather reluctant, because it may break some existing configurations, for instance if there are anchors in the regular expression (^ or $). Why is it important to match multiple lines? Just curious: how do you put this string into statusurl? The problem is that the default value of testregex assumes that the </body> and </html> tags are on a single line, although it is very common for HTML content to return them on multiple lines. TESTREGEX=${OCF_RESKEY_testregex:-'</ *body *>[[:space:]]*</ *html *>'} I think it will not be a problem when you are using apache with the 'server-status' handler enabled, because in that case apache seems to return those tags on a single line, but it is also a common use case for the RA to monitor, say, the index.html on the top page. As for regular expressions with ^ or $, they looked like they worked as expected with the -z option in my quick tests. Do you have any examples where it may break a configuration? If we really should not support multi-line matching, then that is fine for us too, but in that case it would be preferable for the default value of testregex to be something better suited to single-line matching, like just '</ *html *>'. (We should also mention this in the meta-data documentation.) Regards, Keisuke MORI Cheers, Dejan Regards, Tomo January 20, 2012, 4:20, Dejan Muhamedagic de...@suse.de: Hi, On Thu, Jan 19, 2012 at 11:42:07AM +0900, nozawat wrote: Hi Dejan and Lars I'm sending the patch that settles our earlier discussion. 1) apache-001.patch - It is the same as the patch I sent last time.
- It is the version to which I added the grep option. I'll apply this one. BTW, can you share your use case? 2) apache-002.patch - It is the processing method using tr from the HB 2.1.4 era. I can't recall or see from the history why tr(1) was dropped (and it was me who removed it :( But I guess there was a reason for that. 3) http-mon.sh.patch - It is the patch which couples my suggestion with A. After trying to rework the patch a bit, I now think that we need a different user interface, i.e. we should introduce a boolean parameter, say use_ipv6, and then fix the interface bind addresses depending on that. For instance, if the user wants to use curl, then we'd need to add the -g option to make it work with IPv6. We could also try to figure out from the statusurl content whether it contains an IPv6 address (echo $statusurl | grep -qs ::) and then make the http client use IPv6 automatically. Would that work for you? Opinions? Cheers, Dejan 1) and 2) fix the malfunction in monitor processing. 3) adds IPv6 support. The malfunction is not fixed unless at least 1) or 2) is applied. I think plan 2) is good, but I leave the final judgment to Dejan. Regards, Tomo January 19, 2012, 1:12, Dejan Muhamedagic de...@suse.de: Hi, On Wed, Jan 18, 2012 at 11:19:58AM +0900, nozawat wrote: Hi Dejan and Lars If, for example, we use the logic of Lars's example that tries both, doesn't the IPv4 check run every time in the IPv6 case? Isn't that useless processing on every check? In that case, I think I should add a parameter such as OCF_RESKEY_bindaddress. --
bind_address=127.0.0.1
if [ -n "$OCF_RESKEY_bindaddress" ]; then
	bind_address=$OCF_RESKEY_bindaddress
fi
WGETOPTS="-O- -q -L --no-proxy --bind-address=$bind_address"
-- That's fine too. We can combine yours and Lars' proposals, i.e. in case bindaddress is not set, it tries both. Do you think you could prepare such a patch? BTW, the extra processing is minimal, particularly compared to the rest of what this RA does.
Thanks, Dejan Regards, Tomo January 17, 2012, 23:28, Dejan Muhamedagic de...@suse.de: On Tue, Jan 17, 2012 at 11:41:41AM +0100, Lars Ellenberg wrote: On Tue, Jan 17, 2012 at 11:07:09AM +0900, nozawat wrote: Hi Dejan and Lars, I'm sending the patch revised according to Lars's comments. OK. I guess that this won't introduce a regression. And I guess that sometimes one may need a newline in the test string. I believe we did take such a step in the past; however, I thought the tr processing was deleted because it made the load higher. Therefore I used the -z option. Thinking about it, maybe to reduce chance
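Dejan's auto-detection idea in the thread above — inferring IPv6 from the statusurl content via `grep -qs ::` — sketches out to something like the following. This is a rough illustration under the assumption that a `::` in the URL marks IPv6; the helper name and the assembled option string are examples, not the final http-mon.sh code:

```shell
# Rough sketch: pick a bind address for the monitoring client based on
# whether the status URL looks like IPv6, as suggested in the thread.
monitor_bind_address() {
    url=$1
    if echo "$url" | grep -qs ::; then
        echo "::1"        # IPv6 loopback
    else
        echo "127.0.0.1"  # IPv4 loopback
    fi
}

# WGETOPTS could then be assembled as, e.g.:
#   WGETOPTS="-O- -q -L --no-proxy --bind-address=$(monitor_bind_address "$statusurl")"
```

A curl-based client would additionally need the -g option for bracketed IPv6 URLs, as noted in the thread.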
[Linux-ha-dev] [GIT PULL] Medium: IPv6addr: handle a link-local address properly in send_ua
Dejan, Would you consider pulling this patch to resolve issue #29 on github? https://github.com/ClusterLabs/resource-agents/pull/34 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] LVS support for IPv6
Hello all, I would like to use an LVS Direct Routing configuration on IPv6, but I encountered the problem described below. I'm going to fix it, but I've found that there are several arguments about how it should be fixed, so I would like to ask for everybody's opinion before I proceed. Please give me your thoughts and comments about how we should fix it. Symptom: On IPv4, I have been using the IPaddr2 RA for LVS DR and it works like a charm. On IPv6, I tried to use the IPv6addr RA for the virtual IPv6 address with a similar configuration to IPv4, but the address would not become reachable from another node. The ip command shows that the duplicate address has been assigned to both lo and ethX, and one has the 'dadfailed' flag (Duplicate Address Detection, defined in RFC 4862).

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet6 2004::210/128 scope global
       valid_lft forever preferred_lft forever
(...)
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet6 2004::210/64 scope global tentative dadfailed
       valid_lft forever preferred_lft forever

Arguments: 1) Which RA should be improved, IPaddr2 or IPv6addr? Obviously we have two approaches to fix this, and each has pros and cons. a) improve IPaddr2 to support the IPv4/IPv6 dual stack b) improve IPv6addr to remove the duplicate IPv6 address on the loopback. As for a), pros: easy to maintain as a single code base; uniform behavior between IPv4 and IPv6, since the ip command already supports the dual stack. cons: it changes the policy of which RA is recommended for IPv6 on Linux; we need a new binary as a replacement for send_arp for IPv6. As for b), pros: no changes to the existing IPaddr2. cons: we need to implement the equivalent of the lvs_support=true feature in C, which may make the code harder to maintain. 2) Is the lvs_support=true functionality really necessary? When I use IPaddr2 for LVS on IPv4, it has been working perfectly *without* lvs_support=true.
In this case the same IP address is assigned to both lo and ethX, and everything still works fine. In addition, the latest IPaddr2 has a bug: it does not remove the IP address on lo even if lvs_support=true. This was reported once before: http://www.gossamer-threads.com/lists/linuxha/pacemaker/71106#71106 On IPv6, I also tried assigning an IPv6 address to both lo and ethX with the ip command manually, and it seems to work fine, the same as on IPv4. The weird 'dadfailed' flag was not seen when I used the ip command. Proposed solution: Considering all the arguments above, I would like to suggest the following modifications: - improve IPaddr2 to support the IPv4/IPv6 dual stack. - recommend using IPaddr2 for both IPv4 and IPv6 on Linux in the future; IPaddr/IPv6addr would be left only for legacy and cross-platform support. - the lvs_support=true option would be deprecated and no longer necessary. Any opinions or suggestions are appreciated. I will work on it after we all agree on how we should fix it. Regards, Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
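For reference, the manual experiment described above — the same VIP on both lo and ethX — boils down to commands like these. This is a hedged sketch with example addresses taken from the report; it requires root, so it is shown only as a configuration illustration:

```shell
# Config sketch only (root required); addresses are the examples from
# the report above, not a recommendation.
# On an LVS-DR real server, the VIP sits on lo with a /128 (host)
# prefix so the node accepts traffic for it without advertising it:
ip -6 addr add 2004::210/128 dev lo
# On the director, the same VIP lives on the outward-facing NIC:
ip -6 addr add 2004::210/64 dev eth3
# iproute2 also offers 'nodad' to skip Duplicate Address Detection,
# which avoids the 'dadfailed' state seen in the symptom:
#   ip -6 addr add 2004::210/64 dev eth3 nodad
ip -6 addr show dev lo
```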
Re: [Linux-ha-dev] [Linux-HA] [ha-wg] CFP: HA Mini-Conference in Prague on Oct 25th
Hi, 2011/10/10 Dejan Muhamedagic de...@suse.de: On Sun, Oct 09, 2011 at 11:28:41PM +1100, Andrew Beekhof wrote: On Sat, Oct 8, 2011 at 6:03 AM, Digimer li...@alteeve.com wrote: On 10/07/2011 02:58 PM, Florian Haas wrote: Vienna before the early afternoon of Saturday the 29th, so if anyone has plans to do something interesting that Saturday morning I'd be more than happy to join. Cheers, Florian I'm going to be in the city all day Saturday as well. Knowing there will be at least a few who will have trouble making the unofficial meeting on the 26th, The 26th is just the meeting start. It's not the 26th, but the 25th. It also says so in the subject line. I'll be in Prague only on the 25th. I'm trying to arrange my schedule to be in Prague from the 25th to the 28th. See you all over there. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] regressions in resource-agents 3.9.1
Hi, Are there any backlogs for the 3.9.2 release? I'm really looking forward to seeing it soon, since 3.9.1 was not really usable for me... Thanks, 2011/6/22 Dejan Muhamedagic deja...@fastmail.fm: Hi all, On Wed, Jun 22, 2011 at 11:22:48PM +0900, Keisuke MORI wrote: 2011/6/22 Florian Haas florian.h...@linbit.com: On 2011-06-22 11:48, Dejan Muhamedagic wrote: Hello all, Unfortunately, it turned out that there were two regressions in the 3.9.1 release: - iscsi on platforms which run open-iscsi 2.0-872 (see http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562) - pgsql probes with shared storage (iirc), see http://marc.info/?l=linux-ha&m=130858569405820&w=2 Thanks to Vadym Chepkov for finding and reporting them. I'd suggest making a quick fix release, 3.9.2. Opinions? Agree. +1 OK. Let's do that on Friday morning. Tomorrow is a holiday here. Cheers, Dejan -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] regressions in resource-agents 3.9.1
2011/6/22 Florian Haas florian.h...@linbit.com: On 2011-06-22 11:48, Dejan Muhamedagic wrote: Hello all, Unfortunately, it turned out that there were two regressions in the 3.9.1 release: - iscsi on platforms which run open-iscsi 2.0-872 (see http://developerbugs.linux-foundation.org/show_bug.cgi?id=2562) - pgsql probes with shared storage (iirc), see http://marc.info/?l=linux-ha&m=130858569405820&w=2 Thanks to Vadym Chepkov for finding and reporting them. I'd suggest making a quick fix release, 3.9.2. Opinions? Agree. +1 -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] using the pacemaker logo for the xing group
Hi Erkan, As I said in a personal email to you, and as Ikeda-san already replied, anybody may use the logo in conjunction with any Pacemaker / Linux-HA related projects. The logo is a contribution from the Japanese Pacemaker / Linux-HA community, so asking for permission on the Japanese mailing list as you did is right, but here is also fine. You can obtain the logo from here (sorry, it's in Japanese): http://linux-ha.sourceforge.jp/wp/archives/369 Regards, Keisuke MORI Linux-HA Japan Project. 2011/6/21 Junko IKEDA tsukishima...@gmail.com: Hi Erkan, The pacemaker logos were created by the NTT group. I asked for the boss's permission; I think I can send them to you directly soon :) Did you post a similar mail to the Japanese mailing list before this? Sorry for the inconvenience. Thanks, Junko IKEDA NTT DATA INTELLILINK CORPORATION 2011/6/20 erkan yanar erkan.ya...@linsenraum.de: Moin, I would like to use the (red/rabbit) pacemaker logo for the linux cluster group in xing. Who do I have to ask for permission to use it? Regards Erkan -- über den grenzen muß die freiheit wohl wolkenlos sein ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
Hi, Thank you for all your efforts on the new release. 2011/6/7 Fabio M. Di Nitto fdini...@redhat.com: Several changes have been made to the build system and the spec file to accommodate both projects' needs. The most noticeable change is the option to select all, linux-ha, or rgmanager resource agents at configuration time, which will also set the default for the spec file. Why is the ldirectord package disabled in the RHEL environment? I would expect it to be built the same as in (linux-ha) resource-agents-1.0.4, so that we can use the upcoming 3.9.1 as an upgrade. We still use resource-agents/ldirectord on many RHEL systems, and if it is missing we cannot upgrade them anymore. from resource-agents.spec.in:

---------
%if %{with linuxha}
%if 0%{?rhel} == 0
%package -n ldirectord
---------

NOTE: About the 3.9.x version (particularly for linux-ha folks): This version was chosen simply because the rgmanager set was already at 3.1.x. In order to make it easier for distribution, and to keep package upgrades linear, we decided to bump the number higher than both projects. There is no other special meaning associated with it. The final 3.9.1 release will take place soon. BTW, why not 4.0? :) Just curious. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
Hi, 2011/6/8 Fabio M. Di Nitto fdini...@redhat.com: Why is the ldirectord package disabled in the RHEL environment? I would expect it to be built the same as in (linux-ha) resource-agents-1.0.4, so that we can use the upcoming 3.9.1 as an upgrade. Because ldirectord requires libnet to build, and libnet is not available on default RHEL (unless you explicitly enable EPEL). ldirectord requires no extra packages to build on RHEL; it is just a perl script. You may be thinking of the runtime environment: it requires at least perl-MailTools, which can be obtained only from EPEL or CentOS extras, but ldirectord users have already been collecting such packages when they want to use it. I can provide a patch to the spec file if it is OK to build it. Note that the (linux-ha) resource-agents have been completely independent of libnet as of 1.0.4. Before that, the IPv6addr RA was the only dependency on libnet. Whoops, yes, you are absolutely right. I got confused between IPaddr and ldirectord. Yes, you can either send me a patch, or I can do it. It's really a piece of cake. OK, I would suggest the attached patch for resolving this particular issue, but I think there are still some issues left: 1) I'm wondering why this condition is needed; I think we can always use %{_var}/run/resource-agents in the current version.

%if 0%{?fedora} >= 11 || 0%{?centos_version} > 5 || 0%{?rhel} > 5
%dir %{_var}/run/heartbeat/rsctmp
%else
%dir %attr (1755, root, root) %{_var}/run/resource-agents
%endif

2) A duplicated man8/ldirectord.8.gz is included in both the resource-agents and ldirectord packages. It should not be a big problem, though.

%{_mandir}/man8/*.8*
(...)
%{_mandir}/man8/ldirectord.8*

3) It cannot be built on RHEL 5, with this error. I'd be glad if there were some kind of backward compatibility.

%if 0%{?suse_version} == 0 && 0%{?fedora} == 0 && 0%{?centos_version} == 0 && 0%{?rhel} == 0
%{error:Unable to determine the distribution/version. This is generally caused by missing /etc/rpm/macros.dist.
Please install the correct build packages or define the required macros manually.} Regards, -- Keisuke MORI diff --git a/resource-agents.spec.in b/resource-agents.spec.in index 8b39b3f..7dc6670 100644 --- a/resource-agents.spec.in +++ b/resource-agents.spec.in @@ -106,7 +106,6 @@ High Availability environment for both Pacemaker and rgmanager service managers. %if %{with linuxha} -%if 0%{?rhel} == 0 %package -n ldirectord License: GPLv2+ Summary: A Monitoring Daemon for Maintaining High Availability Resources @@ -136,7 +135,6 @@ lditrecord is simple to install and works with the heartbeat code See 'ldirectord -h' and linux-ha/doc/ldirectord for more information. %endif -%endif %prep %if 0%{?suse_version} == 0 0%{?fedora} == 0 0%{?centos_version} == 0 0%{?rhel} == 0 @@ -194,11 +192,6 @@ make install DESTDIR=%{buildroot} rm -rf %{buildroot}/usr/share/doc/resource-agents %if %{with linuxha} -%if 0%{?rhel} != 0 -# ldirectord isn't included on RHEL -find %{buildroot} -name 'ldirectord.*' -exec rm -f {} \; -find %{buildroot} -name 'ldirectord' -exec rm -f {} \; -%endif %if 0%{?suse_version} test -d %{buildroot}/sbin || mkdir %{buildroot}/sbin @@ -270,7 +263,6 @@ rm -rf %{buildroot} %{_libdir}/heartbeat/findif %{_libdir}/heartbeat/tickle_tcp -%if 0%{?rhel} == 0 %if 0%{?suse_version} %preun -n ldirectord %stop_on_removal ldirectord @@ -303,7 +295,6 @@ rm -rf %{buildroot} /usr/lib/ocf/resource.d/heartbeat/ldirectord %endif %endif -%endif %changelog * @date@ Autotools generated version nob...@nowhere.org - @version@-@specver@-@numcomm@.@alphatag@.@dirty@ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [ha-wg-technical] resource agents 3.9.1rc1 release
2011/6/8 Digimer li...@alteeve.com: On 06/08/2011 09:48 AM, Florian Haas wrote: I realize I'm bikeshedding, but my preference would be for 3.9 for this one, and 4.0 to implement the new standard. Like Fabio originally suggested. Cheers, Florian Given that x.0 has long meant new stuff, I'd like to stick with the 3.9.x. About the bikeshed's color :) I don't mind either one; I just wanted to know the reasoning behind it, and now it's all clear to me. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] nginx resource agent
Hi Alan, 2011/1/2 Alan Robertson al...@unix.sh: On 12/14/2010 02:42 AM, Dejan Muhamedagic wrote:
#
# I'm not convinced this is a wonderful idea (AlanR)
#
for sig in SIGTERM SIGHUP SIGKILL
do
  if pgrep -f "$NGINXD.*$CONFIGFILE" >/dev/null
  then
    pkill -$sig -f "$NGINXD.*$CONFIGFILE" >/dev/null
    ocf_log info "nginxd children were signalled ($sig)"
    sleep 1
  else
    break
  fi
done
Can't recall the details anymore; there was a bit of discussion on the matter a few years ago, but NTT insisted on killing httpd children. Or do you mind the implementation? Hi Dejan, I know it's been a long time. Sorry about that. If I _hated_ the idea, I would have left it out. It definitely leaves me feeling a bit unsettled. If it causes a problem, it will no doubt eventually show up. It looks like it's just masking a bug in Apache - that is, that giving it a shutdown request doesn't really work... The relevant discussion is this: http://www.gossamer-threads.com/lists/linuxha/dev/44395#44395 http://developerbugs.linux-foundation.org//show_bug.cgi?id=1800 The intention of the code is to allow the service to be restarted if the Apache main process has failed for some reason (maybe a bug in Apache, maybe the OOM killer, or whatever). It's not for masking a bug in Apache - it's just trying to clean up and continue the service with as little manual intervention as possible. Perhaps I shouldn't have kept it in the nginx code - since it does seem to be a bit specific to some circumstance in Apache... On the other hand, it shouldn't hurt anything either... You may want to see what happens if the nginx process is accidentally killed. I'm not familiar with nginx at all, but in the case of Apache, the children would keep running and they prevent another Apache instance from restarting until you kill all the orphaned processes manually. If nginx is a single-process application, then I think that the code should not be necessary.
-- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
2010/11/14 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Nov 09, 2010 at 06:06:30PM +0900, Keisuke MORI wrote: Ok, then let's just drop the changeset. I agree that srand should not be called many times, but I would rather prefer to just keep the existing behavior since there have been no problems with that so far. Ok, I'll revert it for now. Thanks, I confirmed that the problem went away. But I'd rather have it working there. Would this patch to the cib do the right thing? The patch actually didn't work. I've looked into the code more and now I realize that the existence of the mainloop is not the issue here; g_main_loop_is_running() _always_ fails when NULL is passed. glue/lib/clplumbing/cl_random.c:
static void
get_more_random(void)
{
	if (randgen_scheduled || IS_QUEUEFULL) {
		return;
	}
	if (g_main_loop_is_running(NULL)) {
		randgen_scheduled = TRUE;
		Gmain_timeout_add_full(G_PRIORITY_LOW+1, 10, add_a_random, NULL, NULL);
	}
}
By looking at the source code of glib, it looks like this: http://git.gnome.org/browse/glib/tree/glib/gmain.c#n3157
gboolean
g_main_loop_is_running (GMainLoop *loop)
{
  g_return_val_if_fail (loop != NULL, FALSE);
  g_return_val_if_fail (g_atomic_int_get (&loop->ref_count) > 0, FALSE);
  return loop->is_running;
}
I'm wondering whether the get_more_random() logic had ever worked before. So the proper fix here would be, in my opinion, to just remove the get_more_random() logic from the cluster-glue code. It does not make sense to me that a g_mainloop is required just to get a random value :) The Heartbeat code should still support the current version of cluster-glue, so I think that the current code in the repository is just good for the coming 3.0.4. Any other backlogs to release the heartbeat package? I look forward to it being released soon! Me too. Alas ... We'll try to get it out by next Friday (19th November) Great! Thank you for all your effort for the release!
-- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
Lars, 2010/10/27 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Oct 25, 2010 at 08:21:26PM +0900, Keisuke MORI wrote: Hi, The recent heartbeat on the tip would cause an assertion failure in pacemaker-1.0 and generate a core: (snip) I don't care for the get_more_random() stuff and keeping 100 random values prepared for get_next_random; that is probably just academic sugar anyway. If it does not work, we throw it all out, or fix it. Ok, then let's just drop the changeset. I agree that srand should not be called many times, but I would rather prefer to just keep the existing behavior since there have been no problems with that so far. I object to calling srand many times. Actually we should only call it once; we still call it in too many places. I found the get_next_random() function to apparently properly wrap a static int inityet and do the srand only once, so I just used it. Would it help to call g_main_loop_new() earlier? Can we more cleanly catch the "no GMainLoop there yet" case in get_more_random()? Should we just drop get_next_random() from cl_rand_from_interval? Or drop it altogether along with get_more_random and its static array -- it's not as if generating random numbers were performance critical in any way, is it. It could possibly help, but I don't think it's worth doing right now. Any other backlogs to release the heartbeat package? I look forward to it being released soon! Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] Heartbeat: remove an assertion fail in pacemaker
Hi, The recent heartbeat on the tip would cause an assertion failure in pacemaker-1.0 and generate a core:
{{{
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort: crm_glib_handler: Forked child 31338 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:08 srv02 cib: [31333]: ERROR: crm_abort: crm_glib_handler: Forked child 31339 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort: crm_glib_handler: Forked child 31341 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
Oct 25 17:15:11 srv02 crmd: [31337]: ERROR: crm_abort: crm_glib_handler: Forked child 31342 to record non-fatal assert at utils.c:449 : g_main_loop_is_running: assertion `loop != NULL' failed
}}}
This seems to have been introduced by the following changeset: http://hg.linux-ha.org/dev/rev/231b0b8555be The stack trace and my suggested patch are attached. The changeset in question changed this code path to use get_next_random(), which eventually calls g_main_loop_is_running(), but that call may fail because the g_main_loop is not yet initialized in cib/crmd. My suggested patch just reverts to the old behavior, only changing the delay to 50ms.
Thanks, -- Keisuke MORI (gdb) where #0 0x00669410 in __kernel_vsyscall () #1 0x00692df0 in raise () from /lib/libc.so.6 #2 0x00694701 in abort () from /lib/libc.so.6 #3 0x00c0d82f in crm_abort (file=0xc26955 utils.c, function=0xc26dda crm_glib_handler, line=449, assert_condition=0x8933d58 g_main_loop_is_running: assertion `loop != NULL' failed, do_core=1, do_fork=1) at utils.c:1382 #4 0x00c09f05 in crm_glib_handler (log_domain=0x167686 GLib, flags=G_LOG_LEVEL_CRITICAL, message=0x8933d58 g_main_loop_is_running: assertion `loop != NULL' failed, user_data=0x0) at utils.c:449 #5 0x00143b67 in g_logv () from /lib/libglib-2.0.so.0 #6 0x00143d39 in g_log () from /lib/libglib-2.0.so.0 #7 0x00143e1b in g_return_if_fail_warning () from /lib/libglib-2.0.so.0 #8 0x0013981b in g_main_loop_is_running () from /lib/libglib-2.0.so.0 #9 0x00880811 in get_more_random () at cl_random.c:95 #10 0x00880945 in cl_init_random () at cl_random.c:128 #11 0x00880644 in gen_a_random () at cl_random.c:68 #12 0x00880896 in get_next_random () at cl_random.c:106 #13 0x00fdbabb in get_clientstatus (lcl=0x8931bd8, host=0x0, clientid=0x805b779 cib, timeout=-1) at client_lib.c:974 #14 0x080557ee in cib_init () at main.c:461 #15 0x08054c4b in main (argc=1, argv=0xbfcd6124) at main.c:218 (gdb) # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1288003477 -32400 # Node ID 96b67422b12814f64dc7dd61c670801c7ba213b6 # Parent 82fc843fbcf9733e50bbc169c95e51b6c7f97c54 Medium: reduce max delay in get_client_status (revised 231b0b8555be) revert the old code to avoid calling g_main_loop_is_running() which may fail when used in Pacemaker cib/crmd. diff -r 82fc843fbcf9 -r 96b67422b128 lib/hbclient/client_lib.c --- a/lib/hbclient/client_lib.c Mon Oct 04 22:12:37 2010 +0200 +++ b/lib/hbclient/client_lib.c Mon Oct 25 19:44:37 2010 +0900 @@ -966,16 +966,6 @@ get_nodesite(ll_cluster_t* lcl, const ch * Return the status of the given client. 
*/ -#ifndef HAVE_CL_RAND_FROM_INTERVAL -/* you should grab latest glue headers! */ -static inline int cl_rand_from_interval(const int a, const int b) -{ - /* RAND_MAX may be INT_MAX, or (b-a) may be huge. */ - long long r = get_next_random(); - return a + (r * (b-a) + RAND_MAX/2)/RAND_MAX; -} -#endif - static const char * get_clientstatus(ll_cluster_t* lcl, const char *host , const char *clientid, int timeout) @@ -1027,8 +1017,9 @@ get_clientstatus(ll_cluster_t* lcl, cons * in a 100-node cluster, the max delay is 5 seconds */ num_nodes = get_num_nodes(lcl); - max_delay = num_nodes * 5; - delay = cl_rand_from_interval(0, max_delay); + max_delay = num_nodes * 5; /* in microsecond*/ + srand(cl_randseed()); + delay = (1.0* rand()/RAND_MAX)*max_delay; if (ANYDEBUG){ cl_log(LOG_DEBUG, Delaying cstatus request for %d ms, delay/1000); } ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Next release from Linux-HA? (was: [PATCH] IPv6addr: removing libnet dependency)
Hi Lars, We talked about the next release of the heartbeat/resource-agents packages a while ago. As Pacemaker-1.0.10 is about to be released soon, I think it's a good time to release those packages too, for the best use of Pacemaker. I think that heartbeat-3.0.4 / resource-agents-1.0.4 should be released at least, because it has already been 6 months since the last release. What do you think about it, and when can we release the packages? Regards, Keisuke MORI 2010/7/27 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Jul 27, 2010 at 04:12:34PM +0900, Keisuke MORI wrote: 2010/7/27 Keisuke MORI keisuke.mori...@gmail.com: 2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: Heartbeat does not have many changes (apart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. (...) For heartbeat, I personally like "pacemaker on" in ha.cf :) I should have mentioned this too: the version number in the log file from heartbeat 3.0.3 seems incorrect. I want to fix this soon to avoid confusion. Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration validated. Starting heartbeat 3.0.2 Yes, I know. Not a problem. Needs to be changed in configure.ac before the 3.0.4 release. -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/8/11 Simon Horman ho...@verge.net.au: http://hg.linux-ha.org/agents/rev/612e2966f372 I've had to commit a small revision, because on IA64, the memory on the stack is not aligned properly for the cast to struct nd_neighbor_advert * - http://hg.linux-ha.org/agents/rev/d206bc8f1303 I apologize for the ugliness; it was the only way I could make gcc shut up and get the alignment right. If someone can get the alignment right on the stack, I'm all ears ... You are right, that is a bit ugly. But I have no better ideas at this time :-( How about this patch, or something along this line? It assumes GCC, but ICC should have a similar feature if you want to support it. Alternatively, having a union of a u_int8_t array and the struct should get the alignment correct, I think. -- Keisuke MORI # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1281491442 -32400 # Node ID b12ca86af66197498cbf537ccc7ad4ff56cdf63b # Parent d206bc8f13039b332e76a93a86e8e550b67781da [mq]: ipv6addr-alignment.patch diff -r d206bc8f1303 -r b12ca86af661 heartbeat/IPv6addr.c --- a/heartbeat/IPv6addr.c Mon Aug 09 21:51:19 2010 +0200 +++ b/heartbeat/IPv6addr.c Wed Aug 11 10:50:42 2010 +0900 @@ -89,7 +89,6 @@ #include <stdio.h> #include <stdlib.h> -#include <malloc.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> @@ -424,10 +423,17 @@ int ifindex; int hop; struct ifreq ifr; - u_int8_t *payload; - int payload_size; - struct nd_neighbor_advert *na; - struct nd_opt_hdr *opt; + + /* GCC is assumed. + * If you want to port to other than GCC, make sure that + * the packet is packed correctly.
+ */ + struct neighbor_advert { + struct nd_neighbor_advert na; + struct nd_opt_hdr opt; + u_int8_t hwaddr[HWADDR_LEN]; + } __attribute__ ((packed)) payload; + struct sockaddr_in6 src_sin6; struct sockaddr_in6 dst_sin6; @@ -473,39 +479,27 @@ } /* build a neighbor advertisement message */ - payload_size = sizeof(struct nd_neighbor_advert) - + sizeof(struct nd_opt_hdr) + HWADDR_LEN; - payload = memalign(sysconf(_SC_PAGESIZE), payload_size); - if (!payload) { - cl_log(LOG_ERR, malloc for payload failed); - goto err; - } - memset(payload, 0, payload_size); + memset((void *)payload, 0, sizeof(payload)); - /* Ugly typecast from ia64 hell! */ - na = (struct nd_neighbor_advert *)((void *)payload); - na-nd_na_type = ND_NEIGHBOR_ADVERT; - na-nd_na_code = 0; - na-nd_na_cksum = 0; /* calculated by kernel */ - na-nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE; - na-nd_na_target = *src_ip; + payload.na.nd_na_type = ND_NEIGHBOR_ADVERT; + payload.na.nd_na_code = 0; + payload.na.nd_na_cksum = 0; /* calculated by kernel */ + payload.na.nd_na_flags_reserved = ND_NA_FLAG_OVERRIDE; + payload.na.nd_na_target = *src_ip; /* options field; set the target link-layer address */ - opt = (struct nd_opt_hdr *)(payload + sizeof(struct nd_neighbor_advert)); - opt-nd_opt_type = ND_OPT_TARGET_LINKADDR; - opt-nd_opt_len = 1; /* The length of the option in units of 8 octets */ - memcpy(payload + sizeof(struct nd_neighbor_advert) - + sizeof(struct nd_opt_hdr), - ifr.ifr_hwaddr.sa_data, HWADDR_LEN); + payload.opt.nd_opt_type = ND_OPT_TARGET_LINKADDR; + payload.opt.nd_opt_len = 1; /* The length of the option in units of 8 octets */ + memcpy(payload.hwaddr, ifr.ifr_hwaddr.sa_data, HWADDR_LEN); /* sending an unsolicited neighbor advertisement to all */ memset(dst_sin6, 0, sizeof(dst_sin6)); dst_sin6.sin6_family = AF_INET6; inet_pton(AF_INET6, BCAST_ADDR, dst_sin6.sin6_addr); /* should not fail */ - if (sendto(fd, payload, payload_size, 0, + if (sendto(fd, (void *)payload, sizeof(payload), 0, (struct sockaddr 
*)dst_sin6, sizeof(dst_sin6)) - != payload_size) { + != sizeof(payload)) { cl_log(LOG_ERR, sendto(%s) failed: %s, if_name, strerror(errno)); goto err; @@ -515,7 +509,6 @@ err: close(fd); - free(payload); return status; } ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/27 Andrew Beekhof and...@beekhof.net: On Tue, Jul 27, 2010 at 8:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote: For heartbeat, I personally like "pacemaker on" in ha.cf :) One thing that's coming in 1.1.3 is an mcp (master control process) and associated init script for pacemaker. This means that Pacemaker is started/stopped independently of the messaging layer. Currently this is only written for corosync[1], but I've been toying with the idea of extending it to Heartbeat. In which case, if you're already changing the option, you might want to make it: legacy on/off. Where off would be the equivalent of starting with -M (no resource management) but wouldn't spawn any daemons. Thoughts? I have several concerns with that change: 1) Is it possible to recover or cause a fail-over correctly when any of the Pacemaker/Heartbeat processes fails? (In particular, for a failure of pacemaker's new mcp process, and for a failure of the current heartbeat MCP process.) 2) Would daemons started with the respawn directive, such as hbagent (the SNMP daemon) or pingd, keep working compatibly? 3) After all, what would be the benefit of the change for end users? I feel like it only adds complexity to operations and diagnostics for end users. I guess that I would only use "legacy on" on the heartbeat stack... -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: By the way, do we have any plan to release the next agents/glue/heartbeat packages from the Linux-HA project? I think it's a good time to consider them for the best use of pacemaker-1.0.9. I think glue was released by dejan just before he went on vacation, though the release announcement is missing (1.0.6). Heartbeat does not have many changes (apart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. Agents has a few fixes, but also has some big changes. I have to take another close look, but yes, I think we should release an agents 1.0.4 within the next few weeks. Great! Then let's go for the next release for agents/heartbeat along with glue. My main concern about agents is LF#2378: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378 It is a change, but it's a necessary change to make the maintenance mode work fine. For heartbeat, I personally like "pacemaker on" in ha.cf :) find_if for IPv6 is also missing if you want to write a script-based one. I'm sure that can be scripted itself around ip -o -f inet6 a s | grep ... but we already sort of agreed that this would not be development time well spent. find_if does more than just grepping. It has to match against the network address calculated from the given address and prefix, to find out which interface would be appropriate to be assigned the virtual address. The current IPaddr2 also relies on find_if to do this. But anyway, I would also agree that we are not going to develop such a thing. Just off topic. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
2010/7/27 Keisuke MORI keisuke.mori...@gmail.com: 2010/7/26 Lars Ellenberg lars.ellenb...@linbit.com: On Mon, Jul 26, 2010 at 06:39:50PM +0900, Keisuke MORI wrote: Heartbeat does not have many changes (appart from some cleanup in the build dependencies), so there is no urge to release a 3.0.4, but we could do so any time. (...) For heartbeat, I personally like pacemaker on in ha.cf :) I should have mentioned this too, the version number in the log file from heartbeat 3.0.3 seems incorrect. I want to fix this soon to avoid confusion. Jul 20 14:08:50 srv01 heartbeat: [6299]: info: Configuration validated. Starting heartbeat 3.0.2 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
Hi, 2010/7/23 Lars Ellenberg lars.ellenb...@linbit.com: On Fri, Jul 23, 2010 at 03:04:20PM +0200, Andrew Beekhof wrote: On Fri, Jul 23, 2010 at 5:09 AM, Simon Horman ho...@verge.net.au wrote: Hi Mori-san, I will add that libnet seems to be more or less unmaintained. Someone recently picked it up again, but I'm in favor of the patch for the reasons Mori-san already stated. You seem to make using libnet optional; is there a reason not to just remove it? Portability? Agreed, let's just drop it. Ack. Thanks to Simon, Andrew and Lars for all of your constructive comments. I've revised the patch so that it drops the old libnet code completely. Please apply this to the repository. By the way, do we have any plan to release the next agents/glue/heartbeat packages from the Linux-HA project? I think it's a good time to consider them for the best use of pacemaker-1.0.9. BTW, is it correct that most of it could be done by ip, similar to how IPaddr2 does it? The only thing missing would be a send_arp for v6. Anyone want to write an IPv6addr2? ;-) find_if for IPv6 is also missing if you want to write a script-based one.
Thanks, -- Keisuke MORI # HG changeset patch # User Keisuke MORI kskm...@intellilink.co.jp # Date 1280134509 -32400 # Branch ipv6 # Node ID 275089e31232b870e4218f7dd930538daa438cbf # Parent b3142fd9cc672f2217e632608bc986b46265b193 IPv6addr: remove libnet dependency diff -r b3142fd9cc67 -r 275089e31232 configure.in --- a/configure.in Fri Jul 16 09:46:38 2010 +0200 +++ b/configure.in Mon Jul 26 17:55:09 2010 +0900 @@ -634,7 +634,7 @@ dnl dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include sys/types.h]) -AM_CONDITIONAL(USE_IPV6ADDR, test $ac_cv_header_netinet_icmp6_h = yes -a $new_libnet = yes ) +AM_CONDITIONAL(USE_IPV6ADDR, test $ac_cv_header_netinet_icmp6_h = yes ) dnl dnl Compiler flags diff -r b3142fd9cc67 -r 275089e31232 heartbeat/IPv6addr.c --- a/heartbeat/IPv6addr.c Fri Jul 16 09:46:38 2010 +0200 +++ b/heartbeat/IPv6addr.c Mon Jul 26 17:55:09 2010 +0900 @@ -87,13 +87,22 @@ #include config.h +#include stdio.h #include stdlib.h +#include unistd.h #include sys/types.h +#include sys/socket.h #include netinet/icmp6.h +#include arpa/inet.h /* for inet_pton */ +#include net/if.h /* for if_nametoindex */ +#include sys/ioctl.h +#include sys/stat.h +#include fcntl.h #include libgen.h #include syslog.h +#include signal.h +#include errno.h #include clplumbing/cl_log.h -#include libnet.h #define PIDFILE_BASE HA_RSCTMPDIR /IPv6addr- @@ -141,6 +150,8 @@ const int UA_REPEAT_COUNT = 5; const int QUERY_COUNT = 5; +#define HWADDR_LEN 6 /* mac address length */ + struct in6_ifreq { struct in6_addr ifr6_addr; uint32_t ifr6_prefixlen; @@ -401,69 +412,100 @@ } /* Send an unsolicited advertisement packet - * Please refer to rfc2461 + * Please refer to rfc4861 / rfc3542 */ int send_ua(struct in6_addr* src_ip, char* if_name) { int status = -1; - libnet_t *l; - char errbuf[LIBNET_ERRBUF_SIZE]; + int fd; - struct libnet_in6_addr dst_ip; - struct libnet_ether_addr *mac_address; - char payload[24]; int ifindex; + int hop; + 
struct ifreq ifr; + u_int8_t payload[sizeof(struct nd_neighbor_advert) + + sizeof(struct nd_opt_hdr) + HWADDR_LEN]; + struct nd_neighbor_advert *na; + struct nd_opt_hdr *opt; + struct sockaddr_in6 src_sin6; + struct sockaddr_in6 dst_sin6; - - if ((l=libnet_init(LIBNET_RAW6, if_name, errbuf)) == NULL) { - cl_log(LOG_ERR, libnet_init failure on %s, if_name); + if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) == 0) { + cl_log(LOG_ERR, socket(IPPROTO_ICMPV6) failed: %s, + strerror(errno)); goto err; } /* set the outgoing interface */ ifindex = if_nametoindex(if_name); - if (setsockopt(libnet_getfd(l), IPPROTO_IPV6, IPV6_MULTICAST_IF, + if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF, ifindex, sizeof(ifindex)) 0) { - cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_IF): %s, + cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_IF) failed: %s, strerror(errno)); goto err; } - - mac_address = libnet_get_hwaddr(l); - if (!mac_address) { - cl_log(LOG_ERR, libnet_get_hwaddr: %s, errbuf); + /* set the hop limit */ + hop = 255; /* 255 is required. see rfc4861 7.1.2 */ + if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS, + hop, sizeof(hop)) 0) { + cl_log(LOG_ERR, setsockopt(IPV6_MULTICAST_HOPS) failed: %s, + strerror(errno)); + goto err; + } + + /* set the source address */ + memset(src_sin6, 0, sizeof(src_sin6)); + src_sin6.sin6_family = AF_INET6; + src_sin6.sin6_addr = *src_ip; + src_sin6.sin6_port = 0; + if (bind(fd, (struct sockaddr *)src_sin6, sizeof(src_sin6)) 0) { + cl_log(LOG_ERR, bind() failed: %s, strerror(errno)); goto err; } - dst_ip
Re: [Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
Hi, 2010/7/23 Simon Horman ho...@verge.net.au: I will add that libnet seems to be more or less unmaintained. You seem to make using libnet optional; is there a reason not to just remove it? Portability? I just thought that some people might want to preserve the existing behavior. OpenSUSE ships libnet, for example, and I'm not sure whether they would agree to change the implementation or would rather keep using libnet. But ok, if no one has objections I'll revise the patch so that it removes all libnet code from IPv6addr.c and makes it a single code path. Any other opinions? As for portability, I believe that the new implementation is more portable than using libnet. (cf. http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034#c10) +#define HWADDR_LEN 6 /* mac address length */ Personally I'd prefer the define outside of the function. Ok, I just wanted to place them closely, but I have no strong preference. I'll move it to somewhere around the other macro definitions. + na->nd_na_target = (*src_ip); There is no need to enclose *src_ip in brackets. Right, removing the parens. + if (sendto(fd, payload, sizeof(payload), 0, + (struct sockaddr *)&dst_sin6, sizeof(dst_sin6)) + != sizeof(payload)) { Is it valid to assume that there will never be a partial write? I think that reporting an error is just enough when a partial write occurs here. The packet is very small (32 bytes) and a partial write should rarely happen; it will be retried 5 times if it occurs, and if it still fails then it should be considered that something really bad has happened :-) Also, the current libnet code does exactly the same as the above internally, so no behavior would change with this code. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] IPv6addr: removing libnet dependency
The attached patch removes the libnet dependency from the IPv6addr RA by reimplementing the same functionality with the standard socket API. Currently the resource-agents package has the following problems: - The IPv6addr RA requires an extra libnet package in the run-time environment, which is quite inconvenient, particularly for RHEL users, because libnet is not included in the standard distribution. - The pre-built RPMs from ClusterLabs do not include the IPv6addr RA. This was once reported on the pacemaker list: http://www.gossamer-threads.com/lists/linuxha/pacemaker/64295#64295 The patch resolves both issues. I believe that once it is applied, none of the Pacemaker/Heartbeat related packages will depend on the libnet library any more. Regards, -- Keisuke MORI

# HG changeset patch
# User Keisuke MORI kskm...@intellilink.co.jp
# Date 1279802861 -32400
# Branch ipv6
# Node ID 40d5dbdca9cc089b6514c7525cd2dbd678299711
# Parent b3142fd9cc672f2217e632608bc986b46265b193
IPv6addr: remove libnet dependency

diff -r b3142fd9cc67 -r 40d5dbdca9cc configure.in
--- a/configure.in	Fri Jul 16 09:46:38 2010 +0200
+++ b/configure.in	Thu Jul 22 21:47:41 2010 +0900
@@ -607,6 +607,7 @@
 		[new_libnet=yes; AC_DEFINE(HAVE_LIBNET_1_1_API, 1, Libnet 1.1 API)],
 		[new_libnet=no; AC_DEFINE(HAVE_LIBNET_1_0_API, 1, Libnet 1.0 API)],$LIBNETLIBS)
 	AC_SUBST(LIBNETLIBS)
+	AC_DEFINE(HAVE_LIBNET_API, 1, Libnet API)
 fi
 
 if test $new_libnet = yes; then
@@ -634,7 +635,7 @@
 dnl
 dnl * Check for netinet/icmp6.h to enable the IPv6addr resource agent
 AC_CHECK_HEADERS(netinet/icmp6.h,[],[],[#include <sys/types.h>])
-AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes -a "$new_libnet" = yes )
+AM_CONDITIONAL(USE_IPV6ADDR, test "$ac_cv_header_netinet_icmp6_h" = yes )
 
 dnl
 dnl Compiler flags
diff -r b3142fd9cc67 -r 40d5dbdca9cc heartbeat/IPv6addr.c
--- a/heartbeat/IPv6addr.c	Fri Jul 16 09:46:38 2010 +0200
+++ b/heartbeat/IPv6addr.c	Thu Jul 22 21:47:41 2010 +0900
@@ -87,13 +87,25 @@
 
 #include <config.h>
 
+#include <stdio.h>
 #include <stdlib.h>
+#include <unistd.h>
 #include <sys/types.h>
+#include <sys/socket.h>
 #include <netinet/icmp6.h>
+#include <arpa/inet.h>	/* for inet_pton */
+#include <net/if.h>	/* for if_nametoindex */
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 #include <libgen.h>
 #include <syslog.h>
+#include <signal.h>
+#include <errno.h>
 #include <clplumbing/cl_log.h>
+#ifdef HAVE_LIBNET_API
 #include <libnet.h>
+#endif
 
 #define PIDFILE_BASE HA_RSCTMPDIR "/IPv6addr-"
@@ -400,8 +412,11 @@
 	return OCF_NOT_RUNNING;
 }
 
+#ifdef HAVE_LIBNET_API
 /* Send an unsolicited advertisement packet
  * Please refer to rfc2461
+ *
+ * Libnet based implementation.
  */
 int
 send_ua(struct in6_addr* src_ip, char* if_name)
@@ -466,6 +481,108 @@
 	libnet_destroy(l);
 	return status;
 }
+#else /* HAVE_LIBNET_API */
+/* Send an unsolicited advertisement packet
+ * Please refer to rfc4861 / rfc3542
+ *
+ * Libnet independent implementation.
+ */
+int
+send_ua(struct in6_addr* src_ip, char* if_name)
+{
+	int status = -1;
+	int fd;
+
+	int ifindex;
+	int hop;
+	struct ifreq ifr;
+#define HWADDR_LEN 6	/* mac address length */
+	u_int8_t payload[sizeof(struct nd_neighbor_advert)
+			 + sizeof(struct nd_opt_hdr) + HWADDR_LEN];
+	struct nd_neighbor_advert *na;
+	struct nd_opt_hdr *opt;
+	struct sockaddr_in6 src_sin6;
+	struct sockaddr_in6 dst_sin6;
+
+	if ((fd = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6)) < 0) {
+		cl_log(LOG_ERR, "socket(IPPROTO_ICMPV6) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+	/* set the outgoing interface */
+	ifindex = if_nametoindex(if_name);
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_IF,
+		       &ifindex, sizeof(ifindex)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_IF) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+	/* set the hop limit */
+	hop = 255;	/* 255 is required. see rfc4861 7.1.2 */
+	if (setsockopt(fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
+		       &hop, sizeof(hop)) < 0) {
+		cl_log(LOG_ERR, "setsockopt(IPV6_MULTICAST_HOPS) failed: %s",
+		       strerror(errno));
+		goto err;
+	}
+
+	/* set the source address */
+	memset(&src_sin6, 0, sizeof(src_sin6));
+	src_sin6.sin6_family = AF_INET6;
+	src_sin6.sin6_addr = *src_ip;
+	src_sin6.sin6_port = 0;
+	if (bind(fd, (struct sockaddr *)&src_sin6, sizeof(src_sin6)) < 0) {
+		cl_log(LOG_ERR, "bind() failed: %s", strerror(errno));
+		goto err;
+	}
+
+	/* get the hardware address */
+	memset(&ifr, 0, sizeof(ifr));
+	strncpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name) - 1);
+	if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) {
+		cl_log(LOG_ERR, "ioctl(SIOCGIFHWADDR) failed: %s", strerror(errno));
+		goto err;
+	}
+
+	/* build a neighbor advertisement message */
+	memset(payload, 0, sizeof(payload));
+
+	na = (struct nd_neighbor_advert *)payload;
+	na->nd_na_type = ND_NEIGHBOR_ADVERT;
Re: [Linux-HA] linux-ha 3.0.3 + SNMP
Hi, The SNMP subagent has been moved to the Pacemaker GUI package: http://hg.clusterlabs.org/pacemaker/pygui/ (I haven't used it with the recent versions of heartbeat-3.*, though.) Also, the correct configure option is --enable-snmp-subagent. The --enable-snmp option only affects some stonith agents, which are in the glue package now, so I think any SNMP related options are meaningless to the current heartbeat-3.* package. 2010/5/19 Patrice Laramee patrice.lara...@imetrik.com: Hi Florian, I did it with the two hyphens. It was a copy/paste error in the email. If I want to do SNMP monitoring for the 3.0.0 branch, do I have to use Pacemaker? Thanks, -Pat -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas Sent: May-18-10 11:19 AM To: General Linux-HA mailing list Subject: Re: [Linux-HA] linux-ha 3.0.3 + SNMP On 2010-05-17 22:48, Patrice Laramee wrote: Hi, I've been trying to compile heartbeat with SNMP support. It did compile fine, but I cannot find the binary 'hbagent'. Was this binary removed from this version? o ./ConfigureMe configure -enable-snmp Are you aware that this should be --enable-snmp (two hyphens, not one)? -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [Pacemaker] Known problem with IPaddr(2)
Hi, Regarding the discussion in the pacemaker ML below, I would suggest a patch as attached. The patch includes: 1) Fix IPaddr to return the correct OCF value (it returned 255 when delete_interface failed). 2) Add a description of the assumption to the IPaddr / IPaddr2 meta-data. Regards, Keisuke MORI 2010/4/14 Lars Ellenberg lars.ellenb...@linbit.com: On Tue, Apr 13, 2010 at 08:28:09PM +0200, Lars Ellenberg wrote: On Tue, Apr 13, 2010 at 12:10:18PM +0200, Dejan Muhamedagic wrote: Hi, On Mon, Apr 12, 2010 at 05:26:19PM +0200, Markus M. wrote: Markus M. wrote: is there a known problem with IPaddr(2) when defining many (in my case: 11) ip resources which are started/stopped concurrently? Don't remember any problems. Well... some further investigation revealed that it seems to be a problem with the way the ip addresses are assigned. When looking at the output of ip addr, the first ip address added to the interface gets the scope global, and all further aliases get the scope global secondary. If the first ip address is afterwards removed before the secondaries (due to the scripts running concurrently), ALL secondaries are removed at the same time by the ip command, leading to an error for all subsequent attempts to remove the other ip addresses because they are already gone. I am not sure how ip decides on the secondary scope, maybe because the other ip addresses are in the same subnet as the first one. That sounds bad. Instances should be independent of each other. Can you please open a bugzilla and attach a hb_report. Oh, that is perfectly expected the way he describes it. The assumption has always been that there is at least one normal, not managed by crm, address on the interface, so no one will have noticed before. I suggest the following patch, basically doing one retry. For the described scenario, the second try will find the IP already nonexistent, and exit $OCF_SUCCESS. Though that obviously won't make instances independent.
The typical way to achieve that is to have them all as secondary IPs. Which implies that for successful use of independent IPaddr2 resources on the same device, you need at least one system IP (as opposed to one managed by the cluster) on that device. The first IP assigned will get primary status. Usually, if you delete a primary IP, the kernel will also delete all secondary IP addresses. If using a system IP is not an option, here is the alternative: Recent kernels (a quick check revealed that this setting has been around since at least 2.6.12) can do alias promotion, which can be enabled using sysctl -w net.ipv4.conf.all.promote_secondaries=1 (or per device) In both cases the previous retry-on-ip_stop patch is unnecessary. But it won't do any harm, either. Most likely ;-) Glad that helped ;-) Somebody please add that to the man page and/or the agent meta-data... -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Pacemaker mailing list: pacema...@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf -- Keisuke MORI agents-ipaddr-retval.patch Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Memory leaks in lrmd/cl_msg
Hi, lrmd in glue-1.0.3 has a memory leak. To be exact, the leak is in the cl_msg library. Please find the details in the bugzilla item: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2389 Note that the leak must have existed since the old heartbeat-2.1.4, because the code around here has not been changed for quite a while. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack
Hi, 2010/3/24 Andrew Beekhof and...@beekhof.net: We'd need to coordinate this with all projects (corosync, pacemaker, heartbeat, glue, agents). That would probably be the most difficult part. Currently the ais plugin has: mkdir(HA_STATE_DIR/heartbeat, 0755); /* Used by RAs - Leave owned by root */ mkdir(HA_STATE_DIR/heartbeat/rsctmp, 0755); /* Used by RAs - Leave owned by root */ When you make the change, please also put it in a #define that pacemaker can look for during configure. That way I can default to the above if I can't find it. If you do that then upgrading should be pretty trivial. OK, I will look into it when making changes. I filed a bugzilla item for this issue: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2378 Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Pseudo RAs do not work properly on Corosync stack
Hi, Sorry for a bit of a long mail. I'm going to describe the issue in the Subject: and would like to suggest some changes to the agents package (and possibly Pacemaker, too). I would be glad if you could give me your thoughts and comments. Pseudo RAs which create a stat file under HA_RSCTMP (/var/run/heartbeat/rsctmp), such as Dummy, MailTo, etc., do not work properly on the Pacemaker+Corosync stack. When a node crashes and is rebooted, a stale stat file survives the reboot and hence the RA misbehaves as if the resource were already started when the cluster is launched again for the recovery. This problem does not occur on the Heartbeat stack because Heartbeat removes HA_RSCTMP at its startup, while on the Pacemaker stack neither Pacemaker nor Corosync removes it. But removing the files from Pacemaker does not seem to be correct - if they were removed at cluster startup time then the maintenance mode would no longer work properly. In my understanding, the correct behavior is: - They should NOT be removed at cluster startup time. - They should be removed at OS boot time. My suggestion to address this issue is to fix it as follows: - 1) change the HA_RSCTMP location to /var/run/resource-agents, or some other subdirectory right under /var/run. - 2) set the directory permissions to 01777 (with the sticky bit) - 3) change the IPaddr/SendArp RAs not to use their own subdirectory but instead add a prefix to the filename. - 4) make /var/run/heartbeat/rsctmp obsolete; Heartbeat/Pacemaker could preserve the current behavior for a while for compatibility. The basic idea of the changes is that we are now going to follow the file removal procedure defined by the FHS (Filesystem Hierarchy Standard). http://www.pathname.com/fhs/pub/fhs-2.3.html#VARRUNRUNTIMEVARIABLEDATA The FHS defines that any files under a subdirectory of /var/run should be removed at OS boot time.
Unfortunately second-level subdirectories are out of scope and you cannot rely on their removal (and that is the case for /var/run/heartbeat/rsctmp). I believe that the impact on existing RAs is minimal. If your RA is implemented correctly then you need to do nothing - just note that the location of the stat file has changed. If your RA has hardcoded /var/run/heartbeat/rsctmp, or creates its own subdirectory, you are encouraged to fix it because it may not work well with the maintenance mode, but you can continue to use the old rsctmp if you would like. I would like to hear your thoughts and comments. Regards, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Linux-HA site down?
Hi, http://www.linux-ha.org/ seems to be down today. Maintenance? Or some kind of trouble? Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] An OCF RA for syslog-ng
Hi Dejan, Do you have any chance to take a look at the syslog-ng OCF RA which was posted by Takenaka-san before? http://www.gossamer-threads.com/lists/linuxha/dev/54425 If you are OK, I will commit this to the -dev repository. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] OCF Script for Jboss
Hi, I'm posting an OCF RA for JBoss, which was originally posted by Stefan to the users list, and includes some modifications as suggested by Takenaka-san: http://www.gossamer-threads.com/lists/linuxha/users/53969 Stefan, Do you have any comment on this modification? Dejan, Would you please review this RA if you have any chance? If you are all OK, I will commit the RA to the -dev repository. Thanks, -- Keisuke MORI jboss Description: Binary data ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] An OCF RA for syslog-ng
Hi Dejan, Thank you for your comments. I will repost the RA after I revise it with your comments. Thanks, 2009/6/11 Dejan Muhamedagic deja...@fastmail.fm: Hi Keisuke-san, On Thu, Jun 11, 2009 at 06:16:26PM +0900, Keisuke MORI wrote: Hi Dejan, Do you have any chance to take a look at the syslog-ng OCF RA which was posted by Takenaka-san before? http://www.gossamer-threads.com/lists/linuxha/dev/54425 Attaching the script with comments. Please use diff. Cheers, Dejan If you are OK, I will commit this to the -dev repository. Thanks, -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] OCF Script for Jboss
Hi, I'm posting an OCF RA for JBoss, which was originally posted by Stefan to the users list, and includes some modifications as suggested by Takenaka-san: http://www.gossamer-threads.com/lists/linuxha/users/53969 Stefan, Do you have any comment on this modification? Dejan, Would you please review this RA if you have any chance? If you are all OK, I will commit the RA to the -dev repository. Thanks, -- Keisuke MORI jboss Description: Binary data ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IPv6addr with prefixes longer than 64
Hi, 2009/6/4 Rob Gallagher robert.gallag...@heanet.ie: Running the resource manually gives: r...@charlene:/etc/ha.d# /etc/ha.d/resource.d/IPv6addr 2001:770:18:2:0:0:c101:db4a/128/eth0 start 2009/06/04_10:15:50 ERROR: Generic error ERROR: Generic error Didn't you get an error log something like this (in ha-log or syslog)? I saw this in my reproduction test when I specified a /128 prefix. Jun 9 12:49:15 pacifica IPv6addr: [8640]: ERROR: no valid mecahnisms It means that the RA could not find a proper network interface, and it is most likely a configuration error. However if I change the prefix to /64 it is added without error: If your network is set up with a /64 prefix, then you should specify /64 in the parameter of IPv6addr. By the way, specifying an interface (i.e. /eth0) is not supported yet in IPv6addr (it would be ignored). Hope it helps. Thanks, -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] Checksum not computed in ICMPv6 neighbor advertisement
Hi, 2009/6/5 Dejan Muhamedagic deja...@fastmail.fm: Hi Andre, On Fri, Jun 05, 2009 at 09:34:37AM +, Andre, Pascal wrote: Hi, On an Active/Standby platform (using Linux-HA 2.1.4 RHEL5, in my case), when a fail-over/switch-over is initiated and the standby machine takes over the virtual IP (IPv6), IPv6addr broadcasts an ICMPv6 neighbor advertisement message. Unfortunately, this ICMPv6 message has its checksum field set to 0 (i.e. not computed). The message is thus discarded by recipients. Maybe this computation should be done by libnet itself. Unfortunately, without much time to investigate libnet, I've added code in resources/OCF/IPv6addr.c in order to compute the checksum and provide the result to libnet (as a parameter). Applied. Many thanks for the patch. That problem was already fixed in: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034 so the patch should not be necessary. -- Keisuke MORI ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] re:A patch of tomcat.
Hi Dejan, Thank you for reviewing it. Committed the revised patch by Yamauchi-san (tomcat.patch-0225) as: http://hg.linux-ha.org/dev/rev/6cbdca48bf88 Thanks, Dejan Muhamedagic deja...@fastmail.fm writes: Hi, On Tue, Feb 24, 2009 at 12:20:22PM +0900, Keisuke MORI wrote: Hi, Will anybody review this patch? I was just reviewing it. I can commit it to the -dev if there're no comments. The patch was well tested and is used with tomcat 5.5 in our environment. Great. I'm attaching a patch which contains just a few minor optimizations and some meta-data updates. Please apply it after checking it with your tomcat (no tomcats here :). Cheers, Dejan Thanks, renayama19661...@ybb.ne.jp writes: Hi All, The patch which solved a new problem was completed. The change is the following point. 1. Addition of the comment. 2. Deletion of the garbage in the log. 3. Optional addition. * catalina_opts - CATALINA_OPTS environment variable. Default is None * catalina_rotate_log - Control catalina.out logrotation flag. Default is NO. * catalina_rotatetime - catalina.out logrotation time span(seconds). Default is 86400. 4. I summarized redundant pgrep processing in one function. 5. Revised it so that pgrep was handled in a version of new tomcat definitely. * The new version of tomcat confirmed that there was not a problem with 5.5.27 and version 6.0.28. 6. For unity, I revised it to use $WGET of ocf_shellfunc. I attached a patch. Please reflect it in a development version. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi, Sorry There was a problem to the patch which I attached. When used latest tomcat, RA seem not to be able to handle it well. I will send the patch which I revised later. Best Regards, Hideo Yamauchi.
-- Keisuke MORI Open Source Business Unit Software Services Integration Business Division NTT DATA Intellilink Corporation Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] re:A patch of tomcat.
Hi, Will anybody review this patch? I can commit it to the -dev if there're no comments. The patch was well tested and is used with tomcat 5.5 in our environment. Thanks, renayama19661...@ybb.ne.jp writes: Hi All, The patch which solved a new problem was completed. The change is the following point. 1. Addition of the comment. 2. Deletion of the garbage in the log. 3. Optional addition. * catalina_opts - CATALINA_OPTS environment variable. Default is None * catalina_rotate_log - Control catalina.out logrotation flag. Default is NO. * catalina_rotatetime - catalina.out logrotation time span(seconds). Default is 86400. 4. I summarized redundant pgrep processing in one function. 5. Revised it so that pgrep was handled in a version of new tomcat definitely. * The new version of tomcat confirmed that there was not a problem with 5.5.27 and version 6.0.28. 6. For unity, I revised it to use $WGET of ocf_shellfunc. I attached a patch. Please reflect it in a development version. Best Regards, Hideo Yamauchi. --- renayama19661...@ybb.ne.jp wrote: Hi, Sorry There was a problem to the patch which I attached. When used latest tomcat, RA seem not to be able to handle it well. I will send the patch which I revised later. Best Regards, Hideo Yamauchi. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Sincerely, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] IPv6addr heartbeat
Mariusz Blank mariuszbl...@alcatel-lucent.com writes: hto-mapfuncs: line 52: 20709 Aborted $__SCRIPT_NAME start 2009/02/09_16:25:16 ERROR: Unknown error: 134 ERROR: Unknown error: 134 # ./resource.d/IPv6addr 2000:0:0:0:0:0:0:C/122/bond1 status 2009/02/09_16:25:24 INFO: Running OK INFO: Running OK Do you know what is wrong with it? It is probably the same problem as: http://developerbugs.linux-foundation.org/show_bug.cgi?id=2034 Could you try this patch? http://hg.linux-ha.org/dev/rev/673f32858223 -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] A STONITH plugin for checking whether the target node is kdumping or not.
Hi Lars, When we discussed this feature at the Cluster Summit, you mentioned that there are some issues in stonithd regarding the STONITH escalation. Could you summarise the issues again, please? And if you have in mind any particular test cases that may not work well, we will add the test cases and try to fix them. As far as we have tested so far, it seems to work as expected, though. Regards, Keisuke MORI Satomi TANIGUCHI [EMAIL PROTECTED] writes: Hi lists, I'm posting a STONITH plugin which checks whether the target node is kdumping or not. There are some steps to use this, but I believe this plugin is helpful for failure analysis. See the attached README for details about how to use it. There are 2 patches. The patch named kdumpcheck.patch is for Linux-HA-dev(1eae6aaf1af8). And the patch named mkdumprd_for_kdumpcheck.patch is for mkdumprd version 5.0.39. If you're interested, please give me your comments. Any comments and suggestions are really appreciated. Best Regards, Satomi TANIGUCHI -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Andrew, Thanks for fixing it! In my quick test, it seems to work fine. I (and a colleague of mine) will now continue testing to make sure that everything works fine. Thanks, Andrew Beekhof [EMAIL PROTECTED] writes: Fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c 2008/8/15 Keisuke MORI [EMAIL PROTECTED]: But I've got a PE crash now when I used it with clone resources... I think the following is the correct fix, but I need to do some more testing I've pushed that fix for the fatal assert to both the lha-2.1 tree and the openSUSE build service. I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Keisuke MORI [EMAIL PROTECTED] writes: Andrew, Thanks for fixing it! In my quick test, it seems to work fine. I (and a colleague of mine) will now continue testing to make sure that everything works fine. Just to make sure... Our tests regarding clone groups have passed without any problem. Thank you again for the fix! And I would also like to say thank you to _everybody_ who helped with the release in various ways. Thank you very much! Thanks, Andrew Beekhof [EMAIL PROTECTED] writes: Fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/2d516888d27c 2008/8/15 Keisuke MORI [EMAIL PROTECTED]: But I've got a PE crash now when I used it with clone resources... I think the following is the correct fix, but I need to do some more testing I've pushed that fix for the fatal assert to both the lha-2.1 tree and the openSUSE build service. I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI Open Source Business Unit Software Services Integration Business Division NTT DATA Intellilink Corporation Tel: +81-3-3534-4810 / Fax: +81-3-3534-4814 ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-08-15T17:52:42, Keisuke MORI [EMAIL PROTECTED] wrote: More precisely, we once tried to use clones with 2.1.3 in production but had to suspend their use because there were some problems. Now we want to upgrade to the coming 2.1.4 using clones. _Clones_ by themselves work fine, but cloned groups are the issue. You can work around this by not using them ;-) We assume that we would use cloned groups as well, and therefore we've been doing our tests with a configuration using cloned groups. (and we didn't expect that those would behave differently ;-) -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] duplicate resource active in 2.1.4-RC
Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-08-15T11:55:35, Keisuke MORI [EMAIL PROTECTED] wrote: I look forward to hearing from Keisuke-san whether this works for them now! It does not seem to be fixed right. It does not cause an assertion failure any more (nor a crash ;-), but an invalid clone resource appears. Ah, well. Then we'll have to wait for Andrew to fix it completely. Otherwise, the code looks fine here. Are you using cloned groups in production, btw? Yes. More precisely, we once tried to use clones with 2.1.3 in production but had to suspend their use because there were some problems. Now we want to upgrade to the coming 2.1.4 using clones. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crm_mon doesn't exit immediately
Andrew, If there's no objection I would like to push this patch into the lha-2.1 repository - is there any problem with that? It seems that the latest pacemaker also exhibits the same behavior, so I think both need to be fixed as well. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: Hi, I found that crm_mon included in Pacemaker-dev(2f2343008186) can be quit with Ctrl+C. If a back-port from Pacemaker to Heartbeat 2.1.4 is better than applying the patch, we don't care about how it is fixed. Thanks, Junko Can somebody handle this issue? She said that she couldn't quit the crm_mon command with Ctrl+C. I usually use crm_mon with the -i option, so I hadn't noticed this behavior, but it is certain that crm_mon running with no options won't be stopped by SIGINT. It's odd, right? I think almost all people would expect Ctrl+C to stop this command. See her attached patch. Thanks, Junko I noticed that crm_mon doesn't exit immediately when it receives SIGINT in the mainloop. It seems that SIGINT only kills the sleep() function... (Is this caused by something in G_main_add_SignalHandler()? Or anything else?) So, I modified it to exit the wait function when it is interrupted by a signal. This patch is for Heartbeat STABLE 2.1 (aae8d51d84ec). I hope it isn't too late for Heartbeat 2.1.4... Regards, Satomi Taniguchi ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] Re: [Linux-HA] rsc_order constraints behavior changed?
Andrew, I'm also going to backport this fix into lha-2.1. If there's any problem could you please let me know. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: If you don't want non_clone_group1 to be restarted when this happens, make the ordering constraint advisory-only by setting adding score=0 to the constraint. I tried this configuration, but non_clone_group1 was restarted when clone1 resources fail-count was cleared. you're right - this appears to be broken :( fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e4b49e9f957b Thanks a lot! We are planning to offer this function soon, so could you push this change into Heartbeat 2.1.4(Stable 2.1)? Thanks, Junko ___ Linux-HA mailing list [EMAIL PROTECTED] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] rsc_order constraints behavior changed?
Andrew, I'm also going to backport this fix into lha-2.1. If there's any problem could you please let me know. Thanks, Junko IKEDA [EMAIL PROTECTED] writes: If you don't want non_clone_group1 to be restarted when this happens, make the ordering constraint advisory-only by setting adding score=0 to the constraint. I tried this configuration, but non_clone_group1 was restarted when clone1 resources fail-count was cleared. you're right - this appears to be broken :( fixed in: http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/e4b49e9f957b Thanks a lot! We are planning to offer this function soon, so could you push this change into Heartbeat 2.1.4(Stable 2.1)? Thanks, Junko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-ha-dev] BasicSanityCheck fails in lha-2.1
Dejan, BasicSanityCheck fails in the RA permission test because ocf-tester returns an error at the line below (line 175) if the nobody user is not allowed to log in. su nobody $agent $action > /dev/null [EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data This account is currently not available. [EMAIL PROTECTED] grep nobody /etc/passwd nobody:x:99:99:Nobody:/:/sbin/nologin How about using the hacluster user instead, as attached? Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation

diff -r a8b2fc037b29 tools/ocf-tester.in
--- a/tools/ocf-tester.in	Thu Jul 17 17:01:29 2008 +0900
+++ b/tools/ocf-tester.in	Tue Jul 29 19:58:04 2008 +0900
@@ -168,11 +168,11 @@ lrm_test_command() {
 
 test_permissions() {
 action=meta-data
-msg=${1:-Testing permissions with uid nobody}
+msg=${1:-Testing permissions with uid @HA_CCMUSER@}
 if [ $verbose -ne 0 ]; then
 echo $msg
 fi
-su nobody $agent $action > /dev/null
+su @HA_CCMUSER@ $agent $action > /dev/null
 }
 
 test_metadata() {

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] BasicSanityCheck fails in lha-2.1
Hi Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi Keisuke-san, On Tue, Jul 29, 2008 at 08:03:18PM +0900, Keisuke MORI wrote: Dejan, BasicSanityCheck fails in the RA permission test because ocf-tester returns an error at the line below (line 175) if the nobody user is not allowed to log in. su nobody $agent $action > /dev/null [EMAIL PROTECTED] su nobody /usr/lib/ocf/resource.d/heartbeat/Dummy meta-data This account is currently not available. [EMAIL PROTECTED] grep nobody /etc/passwd nobody:x:99:99:Nobody:/:/sbin/nologin How about using the hacluster user instead, as attached? That won't help. nobody was chosen because lrmd runs the meta-data action as nobody. The problem here is that su(1) requires a shell whereas lrmd doesn't. It looks like the -s option could help. Just pushed a patch. Could you please test it too? That works perfectly! Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] sfex
Hi Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi Keisuke-san, On Tue, Jun 17, 2008 at 05:33:52PM +0900, Keisuke MORI wrote: Dejan, Thank you for taking care of it. Yes, NTT is very glad and agrees to include sfex into the heartbeat repository! Dejan Muhamedagic [EMAIL PROTECTED] writes: Hello, Since last year NTT designed and implemented sfex, a suite of programs to improve shared disk usage (see linux-ha.org/sfex) which unfortunately didn't attract the attention it deserves. I reviewed the code and attached you'll find some comments and some simple changes. One general remark: all programs (sfex_*) are monolithic and, though they are not that big, it would be beneficial to code readers if they were split into more units/functions. That sounds reasonable. Where can I find your comments and modifications? A reasonable question :) Forgot to attach the file with comments. Sorry about that. It is in the form of a patch against version 1.3. Thanks, I will look into it. A couple of suggestions on making sfex useful in other contexts were making a quorum plugin and an HBcomm plugin. Did you investigate these options further? Yes we did, but we think those would be a totally different approach from sfex. - a quorum plugin A quorum plugin is executed only on 'the cluster leader node' in CCM, I don't think so. CCM delivers connectivity and quorum information on each node. However, that's probably not relevant. and it does not care where the resource is running, whereas sfex should run on the same node on which the resource in question is running, because it is for the protection of the data which resides in the resource. In other words, sfex controls at resource granularity, whereas a quorum plugin controls at 'partition' granularity. Right. The point was however to use parts of sfex for the quorum functionality. I'll see if I can get back to you with a more detailed and specific proposal. I still don't understand you very well, sorry.
I'd appreciate it if you could explain in more detail. - HBcomm plugin I remember that somebody posted this before, called 'dskcm'. Somehow missed that one. This is also an interesting idea, but the approach is very different. This approach is: - having yet another redundant communication path through the shared medium. whereas sfex's approach is: - providing a protection method for when ALL of the communication paths have failed. Even though they have a similar goal, the functionality is very different. Yes. Though again sfex would need to be twisted a bit to provide heartbeats over shared storage. I'll take a look at dskcm. It was this: http://www.gossamer-threads.com/lists/linuxha/dev/39716#39716 Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] sfex
Dejan, Thank you for taking care of it. Yes, NTT is very glad and agrees to include sfex into the heartbeat repository! Dejan Muhamedagic [EMAIL PROTECTED] writes: Hello, Since last year NTT designed and implemented sfex, a suite of programs to improve shared disk usage (see linux-ha.org/sfex) which unfortunately didn't attract the attention it deserves. I reviewed the code and attached you'll find some comments and some simple changes. One general remark: all programs (sfex_*) are monolithic and, though they are not that big, it would be beneficial to code readers if they were split into more units/functions. That sounds reasonable. Where can I find your comments and modifications? A couple of suggestions on making sfex useful in other contexts were making a quorum plugin and an HBcomm plugin. Did you investigate these options further? Yes we did, but we think those would be a totally different approach from sfex. - a quorum plugin A quorum plugin is executed only on 'the cluster leader node' in CCM, and it does not care where the resource is running, whereas sfex should run on the same node on which the resource in question is running, because it is for the protection of the data which resides in the resource. In other words, sfex controls at resource granularity, whereas a quorum plugin controls at 'partition' granularity. - HBcomm plugin I remember that somebody posted this before, called 'dskcm'. This is also an interesting idea, but the approach is very different. This approach is: - having yet another redundant communication path through the shared medium. whereas sfex's approach is: - providing a protection method for when ALL of the communication paths have failed. Even though they have a similar goal, the functionality is very different. Of course, if you agree, we could include sfex into the heartbeat repository.
Cheers, Dejan Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] IPv6 HBcomm plugin
Andrew Beekhof [EMAIL PROTECTED] writes: On Tue, Jun 17, 2008 at 09:48, Keisuke MORI [EMAIL PROTECTED] wrote: Andrew Beekhof [EMAIL PROTECTED] writes: and in case anyone cares... the new pingd tool (the stand-alone version that supports both stacks) also supports IPv6 It's something I'm interested in... Do you have any plan for when it will be available? It's already in pacemaker-dev (which I think you're testing already). It will also be part of 0.7 (unstable), which will be out this month. OK, I didn't realize that it's already in there. I will take a look at it. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [PATCH] IPv6 HBcomm plugin
Hi, I've been implementing an HBcomm plugin to enable IPv6 communication among the cluster nodes and the ping nodes. It is still an experimental implementation and I would appreciate any feedback. Thanks,

The IPv6 HBcomm plugin usage

1. Building
Apply the attached patch and do './ConfigureMe configure' and make. The patch is made against the dev branch at: changeset: 11945:5c915f1d5b7b It has been built and tested on RHEL 5.1.

2. Configuration
The following two directives are available in ha.cf:

1) mcast6
Use IPv6 multicast for the heartbeat communication between the nodes. The syntax is the same as 'mcast'. Eg. mcast6 eth1 ff02::694 694 1 0
Note: Please choose a multicast address that is available on your subnet. The address in the example is not officially registered with IANA.

2) ping6
Use an IPv6 address as a ping node. This is equivalent to the 'ping' directive in IPv4. The syntax is also the same as 'ping', except that you can specify an interface name for the address by concatenating it with '%'. Eg. ping6 fe80::1:1%eth0
Note: the interface name (%eth0 above) is mandatory if you want to ping a link-local address (by the design of IPv6). You can omit this part if you're pinging a global address.

3. TODO / known issues
- Still experimental and not completely tested yet. Please test it yourself and give me your feedback :-).
- The 'ping_group' equivalent is not implemented. (Is it possible to use an anycast address instead of this?)
- ping6: the ioctl() to set an ICMPv6 filter fails. It can be ignored, but fixing it would be preferable for optimization.
- mcast6: the allocated memory for the private area is never freed. It would not be a big problem, but it is preferable to fix. The same applies to 'mcast'.
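Taken together, a node's ha.cf using both new directives might look like the following sketch (interface names and addresses are illustrative placeholders only; as the notes above say, choose a multicast address valid on your own subnet):

```
# /etc/ha.d/ha.cf -- hypothetical fragment for the IPv6 plugin
mcast6 eth1 ff02::694 694 1 0   # IPv6 multicast heartbeat on eth1
ping6  fe80::1:1%eth0           # link-local ping node; %iface is mandatory
```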
-- Keisuke MORI NTT DATA Intellilink Corporation

diff -r 5c915f1d5b7b lib/plugins/HBcomm/Makefile.am
--- a/lib/plugins/HBcomm/Makefile.am	Wed May 28 09:14:21 2008 +1000
+++ b/lib/plugins/HBcomm/Makefile.am	Wed Jun 04 13:02:19 2008 +0900
@@ -46,6 +46,7 @@ halibdir = $(libdir)/@HB_PKG@
 halibdir = $(libdir)/@HB_PKG@
 plugindir = $(halibdir)/plugins/HBcomm
 plugin_LTLIBRARIES = bcast.la mcast.la ping.la serial.la ucast.la \
+	mcast6.la ping6.la \
 	ping_group.la $(HBAPING) $(OPENAIS) $(TIPC)
 bcast_la_SOURCES = bcast.c
@@ -80,3 +81,12 @@ tipc_la_SOURCES = tipc.c
 tipc_la_SOURCES = tipc.c
 tipc_la_LDFLAGS = -export-dynamic -module -avoid-version
 tipc_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
+mcast6_la_SOURCES = mcast6.c
+mcast6_la_LDFLAGS = -export-dynamic -module -avoid-version
+mcast6_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
+ping6_la_SOURCES = ping6.c
+ping6_la_LDFLAGS = -export-dynamic -module -avoid-version
+ping6_la_LIBADD = $(top_builddir)/replace/libreplace.la
+
diff -r 5c915f1d5b7b lib/plugins/HBcomm/mcast6.c
--- /dev/null	Thu Jan 01 00:00:00 1970 +
+++ b/lib/plugins/HBcomm/mcast6.c	Thu Jun 05 17:00:20 2008 +0900
@@ -0,0 +1,788 @@
+/*
+ * mcast6.c: implements heartbeat API for UDP/IPv6 multicast communication
+ *
+ * Author: Keisuke MORI [EMAIL PROTECTED]
+ *
+ * based on mcast.c written by the following authors.
+ * Copyright (C) 2000 Alan Robertson [EMAIL PROTECTED]
+ * Copyright (C) 2000 Chris Wright [EMAIL PROTECTED]
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ */
+
+#include <lha_internal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <errno.h>
+#include <string.h>
+#include <ctype.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+#include <sys/ioctl.h>
+
+#ifdef HAVE_SYS_SOCKIO_H
+# include <sys/sockio.h>
+#endif
+
+#include <HBcomm.h>
+
+#define PIL_PLUGINTYPE		HB_COMM_TYPE
+#define PIL_PLUGINTYPE_S	HB_COMM_TYPE_S
+#define PIL_PLUGIN		mcast6
+#define PIL_PLUGIN_S		"mcast6"
+#define PIL_PLUGINLICENSE	LICENSE_LGPL
+#define PIL_PLUGINLICENSEURL	URL_LGPL
+#include <pils/plugin.h>
+#include <heartbeat.h>
+
+struct mcast6_private {
+	char *	interface;	/* Interface name */
+	char *	mcastaddr;	/* multicast address for IPv6
Re: [Linux-ha-dev] [RFC] heartbeat-2.1.4
Hi, How is the 2.1.4 release going? Will it be released soon, or is there any trouble with it? I look forward to seeing it! Thanks, Lars Marowsky-Bree [EMAIL PROTECTED] writes: Hi all, the Linux-HA project is undergoing some changes, as you've noticed. Not all of them have gone as well as expected, and it hasn't stabilized yet. Under guidance from Alan, the project members have met and decided to change the governance of the project in the future. This will be announced in more detail soon, stay tuned. We are also likely to make some further changes to the package layout, and understand that users, admins and distro maintainers dislike it when we do that, so we don't want to make it a habit. We recognize the needs of our users (I hope!) to receive timely updates, and thus have decided to go ahead and propose releasing one more 2.1.4 (following the 2.1.x package layout) as the last release of that branch before the restructuring kicks in completely. (When we decided to split off pacemaker, we didn't expect that this would cause the upstream Linux-HA project to cease releasing completely, and unfortunately there's been little discussion on the lists regarding this since.) For SLE10 SP2, it was already too late to change the package layout, so I've been backporting changes (which is quite easy with Mercurial) from the Pacemaker project, the GUI, and heartbeat-dev into the 2.1.x codebase, and done a fair amount of testing on x86, x86-64, s390x. However, I've been mostly focused on cherry-picking what we (as in, Novell) needed, so in particular the packaging for non-SUSE dists is somewhat neglected in this version. If other distro maintainers would please help me with fixing up the packaging, and more community members would pound on it, I would really appreciate it. My proposal would be to release 2.1.4 by the end of next week (2008-04-18).
(Mostly because after that I go on vacation ;-) I know this is a highly condensed schedule and doesn't follow any proper release methodology. The reasons for this in bullet points: - It's been too long since the last official gasp from the heartbeat project. The code we have is clearly better than 2.1.3, and we should get it to our users ASAP. - Novell has done a fair amount of testing on it already. The code is good (as in much better than 2.1.3), except the packaging. - The new governance will eventually decide on a new release methodology for the Linux-HA project, I expect, but this will take some more weeks, and I don't want to delay releasing even further. So, with the above reasoning, I'm volunteering myself - and hijacking the vacuum, I acknowledge - to do the 2.1.4 release, as the current split hasn't been adopted everywhere yet, 2.1.x is defunct, and our user community appears to need it now and not in several months. I'd plan on building the packages for all dists via OBS, if nobody holds any strong objections, and update the DownloadSoftware page after we agree that the 2.1.4 release is good. And of course I would much approve of distro maintainers pulling it into their official distro repositories too! So, that said, I've pushed my proposed code to http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely doesn't build yet (because the in-tree packaging is broken), but I wanted to share the scope of changes with you. As a further point of reference, I'm attaching the SLES changes section to this mail. (bnc# refers to bugzilla.novell.com.) Let me emphasize strongly that I really don't want to step on anyone's toes, or rush the new governance board, but only fill the current void until that is actually operational and has settled down, as I suggest our users need it. Please comment.
Regards, Lars -- Team lead Kernel, SuSE Labs, Research and Development SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg) Experience is the name everyone gives to their mistakes. -- Oscar Wilde ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/ -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4
Hi, Andrew Beekhof [EMAIL PROTECTED] writes: On Wed, Apr 16, 2008 at 1:31 PM, HIDEO YAMAUCHI [EMAIL PROTECTED] wrote: Hi Andrew, I asked for the right function but the wrong frame number - I should have asked for frame 2. Sorry :(

(gdb) frame 2
#2 0x00416c74 in stop_recurring_action_by_rsc (key=0x755f60, value=0x755f40, user_data=0x545a10) at lrm.c:1442
1442		if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
(gdb) print *rsc
Variable rsc is not available.
(gdb) print *op
No symbol op in current context.

Is something wrong with my operation? Looks like gcc is being too clever for its own good (by optimizing away some of the variables) :-( Can you try the following patch please?

diff -r be12cb83cd2d crmd/lrm.c
--- a/crmd/lrm.c	Wed Apr 16 10:46:59 2008 +0200
+++ b/crmd/lrm.c	Wed Apr 16 15:02:16 2008 +0200
@@ -1451,7 +1451,7 @@ stop_recurring_action_by_rsc(gpointer ke
 {
 	lrm_rsc_t *rsc = user_data;
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
-
+	crm_info("op->rsc = %s (%p), rsc = %s (%p)", crm_str(op->rsc_id), op->rsc_id, crm_str(rsc->id), rsc->id);
 	if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
 		cancel_op(rsc, key, op->call_id, FALSE);
 	}

I think I found the cause of this issue. I attached the additional log with your patch (a bit different though) and the stack trace. Here's my observation:
- An element of pending_ops is removed at lrm.c:L497
- It is called from inside g_hash_table_foreach() at L1475
- This violates the usage of g_hash_table_foreach() according to the glib manual.
- Therefore the iteration cannot proceed correctly and may try to refer to a removed element.

http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c
(...)
946	/* not doing this will block the node from shutting down */
947	g_hash_table_remove(pending_ops, key);
(...)
1475	g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);

http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach
(...)
The hash table may not be modified while iterating over it (you can't add/remove items). I also attached my suggested patch, although I cannot guarantee its correctness; it is just to show you the idea. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation

Attachment: ms-additional-log-20080422.tar.gz

diff -r 333aef5bd4ed -r 36c0fd90691d crm/crmd/lrm.c
--- a/crm/crmd/lrm.c	Thu Apr 17 18:55:57 2008 +0200
+++ b/crm/crmd/lrm.c	Tue Apr 22 17:48:47 2008 +0900
@@ -943,8 +943,9 @@ cancel_op(lrm_rsc_t *rsc, const char *ke
 	if(key && remove) {
 		delete_op_entry(NULL, rsc->id, key, op);
 	}
+	/* return FALSE to be removed from pending_ops */
 	/* not doing this will block the node from shutting down */
-	g_hash_table_remove(pending_ops, key);
+	return FALSE;
 	}
 	return TRUE;
@@ -954,15 +955,20 @@ gboolean cancel_done = FALSE;
 gboolean cancel_done = FALSE;
 lrm_rsc_t *cancel_rsc = NULL;
-static void
+static gboolean
 cancel_action_by_key(gpointer key, gpointer value, gpointer user_data)
 {
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	if(safe_str_eq(op->op_key, cancel_key)) {
 		cancel_done = TRUE;
-		cancel_op(cancel_rsc, key, op->call_id, TRUE);
-	}
+		if (!cancel_op(cancel_rsc, key, op->call_id, TRUE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+	return FALSE;
 }
 
 static gboolean
@@ -976,7 +982,7 @@ cancel_op_key(lrm_rsc_t *rsc, const char
 	CRM_CHECK(key != NULL, return FALSE);
-	g_hash_table_foreach(pending_ops, cancel_action_by_key, NULL);
+	g_hash_table_foreach_remove(pending_ops, cancel_action_by_key, NULL);
 	if(cancel_done == FALSE && remove) {
 		crm_err("No known %s operation to cancel", key);
@@ -1433,15 +1439,21 @@ send_direct_ack(const char *to_host, con
 	free_xml(update);
 }
-static void
+static gboolean
 stop_recurring_action_by_rsc(gpointer key, gpointer value, gpointer user_data)
 {
 	lrm_rsc_t *rsc = user_data;
 	struct recurring_op_s *op = (struct recurring_op_s*)value;
 	if(op->interval != 0 && safe_str_eq(op->rsc_id, rsc->id)) {
-		cancel_op(rsc, key, op->call_id, FALSE);
-	}
+		if (!cancel_op(rsc, key, op->call_id, FALSE)) {
+			/* return TRUE to be removed from pending_ops */
+			/* when the cancellation failed */
+			return TRUE;
+		}
+	}
+
+	return FALSE;
 }
 
 void
@@ -1472,7 +1484,7 @@ do_lrm_rsc_op(lrm_rsc_t *rsc, const char
 	|| crm_str_eq(operation, CRMD_ACTION_DEMOTE, TRUE)
 	|| crm_str_eq(operation, CRMD_ACTION_PROMOTE, TRUE)
 	|| crm_str_eq(operation, CRMD_ACTION_MIGRATE, TRUE)) {
-	g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc);
+	g_hash_table_foreach_remove(pending_ops, stop_recurring_action_by_rsc, rsc);
 	}
 	/* now do the op
Re: [Linux-ha-dev] Re: [RFC] heartbeat-2.1.4
Andrew Beekhof [EMAIL PROTECTED] writes: (snip) Here's my observation: - An element of pending_ops is removed at lrm.c:L497 - It is called from inside g_hash_table_foreach() at L1475 - This violates the usage of g_hash_table_foreach() according to the glib manual. - Therefore the iteration cannot proceed correctly and may try to refer to a removed element. Turns out that the Stateful resource in CTS was never getting promoted. Once I fixed this, I was able to trigger the bug too (in the last few minutes). A weird thing is that it is not reproducible in every environment. As far as we've tested: - it _always_ happens in a RedHat 4 environment. - it has _never_ happened in a RedHat 5 environment. I'm not sure if it's the only difference, but possibly the difference in glib versions is related to the behavior. Thanks for your diagnosis and the patch, you've certainly saved me some time :-) http://hg.linux-ha.org/lha-2.1/annotate/333aef5bd4ed/crm/crmd/lrm.c (...) 946 /* not doing this will block the node from shutting down */ 947 g_hash_table_remove(pending_ops, key); (...) 1475 g_hash_table_foreach(pending_ops, stop_recurring_action_by_rsc, rsc); http://library.gnome.org/devel/glib/stable/glib-Hash-Tables.html#g-hash-table-foreach (...) The hash table may not be modified while iterating over it (you can't add/remove items). I also attached my suggested patch, although I cannot guarantee its correctness; it is just to show you the idea. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] Process monitor daemon (revised)
Hi Lars, Thank you all for reviewing and making suggestions. I think I understand your point about the Heartbeat architecture, but it would require rewriting almost all of the code ;-) I will discuss with my colleagues what we can do for procd as the next step. Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2008-02-27T20:39:13, Keisuke MORI [EMAIL PROTECTED] wrote: Hi Keisuke-san, thanks for your patch and contribution. I have to apologize in the name of everyone for the late feedback. I really appreciate the idea of monitoring processes directly, and receiving async failure notifications to reduce fail-over times. I have just discussed this with Dejan and Andrew, and we think that the best path forward, alas necessary before inclusion, is to: - Make procd independent of Pacemaker. It should talk only to the RAs and the LRM. - RAs should sign in with it for the processes they want monitored, instead of listing the processes in the procd configuration section (which means it gets decoupled from the CIB further). The RAs could write a record to /var/run/heartbeat/procd/resource-id, for example. The RAs would add/remove the required processes on start/promote or demote/stop. (So procd itself would not need to be master-slave.) I'm afraid that having users manually specify process lists in the CIB really is not workable - the users will not be able to get this right. - Instead of respawning procd, there should be a resource agent which starts/stops (and monitors!) procd. You already have one, but why doesn't it go into resources/OCF/ ? We had only thought of using procd via respawn so far, and we didn't have such an RA yet. - procd should talk to the LRM to insert a fake failed resource action, which would then cause the CRM/PE to handle the resource as failed and initiate recovery. (This is not currently possible with the LRM client library; you could exec crm_resource -F, which would mean you no longer have a build-time dependency on the CRM.)
- This would have the advantage of decoupling procd from pacemaker as well as heartbeat. It could be included with the LRM/RA package build, and possibly be useful with other cluster managers too. I think all that would help simplify the code.

+#define RSCID_LEN 128 /* ref. include/lrm/lrm_api.h */
+#define MAX_PID_LEN 256 /* ref. lrm/lrmd/lrmd.h */
+#define MAX_LISTEN_NUM 10 /* ref. lib/clplumbing/ipcsocket.c */

If you're referencing other include files, please do include them, so as to avoid diverging header definitions. Right. Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] brocade fencing anyone?
Hi, Johan Hoeke [EMAIL PROTECTED] writes: Dejan Muhamedagic wrote: Hi, On Fri, Feb 22, 2008 at 05:59:23PM +0100, Johan Hoeke wrote: LS, Is anybody here using some kind of Brocade fencing with heartbeat, like RedHat offers in its cluster software? I found this reference: http://linux.die.net/man/8/fence_brocade It turns out to be the attached perl script. Got it from http://mirror.centos.org/centos/4/csgfs/i386/RPMS/fence-1.32.50-2.el4.centos.1.i686.rpm Would it be possible to use this as an external stonith script? No, because stonith is about fencing nodes and this would be fencing resources. Point taken! Fencing nodes by isolating I/O is a very interesting idea though. I think that right now the only way would be to implement an RA which would fence the resource. That's what Junko Ikeda and the NTT people did: http://lists.linux-ha.org/pipermail/linux-ha/2007-October/028388.html Good stuff, thank you for pointing it out. And mr Ikeda and NTT for sharing! I don't know why their code was not included in Heartbeat. This is an important issue, so it should get more attention. Agreed! What can I do to get it included in Heartbeat? (We call it SF-EX.) My colleagues and I would be really happy if it were included as a standard component in Heartbeat and available for everyone. BTW, she is Ms. Ikeda. ;-) -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Basic SNMP/Linux HA question
Hi, Mike Toler [EMAIL PROTECTED] writes: I don't know if I'm an idiot, have failed to compile the load correctly, or just don't have the secret handshake down correctly, but either way, I am unable to query any statistics from Linux HA using snmp. I've read the README file in the snmp-subagent directory in the source, and I *THINK* I've followed the directions:
1. Install Net-SNMP
2. Verify snmp queries to the server work. (I have set up 'pass' scripts for DRBD and NFS to query counters, so net-snmp is working on my system.)
3. Download the Linux-HA source
4. Run ./ConfigureMe configure --enable-snmp-subagent
5. make and make install
6. Start Linux-HA using service heartbeat start
What am I missing? Do I need to specifically start 'hbagent' somewhere? Am I missing something in the /etc/ha.d/ha.cf file? Does anyone have any debugging tips that I can use to try and isolate where my disconnect is? Have you added a line like this in your ha.cf? respawn root /usr/lib/heartbeat/hbagent -r 5 (I realize now that this is not mentioned clearly in the README.) Providing your ha.cf and the logs would be more helpful. -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] 2.1.3 RPM names
Hi, Thank you very much for the great Christmas present for us ;-) I've noticed that the RPM names in 2.1.3 for i386 have been changed on the official download web site. http://linux-ha.org/download/index.html#2.1.3 pils-2.1.3-1.fc7.i386.rpm stonith-2.1.3-1.fc7.i386.rpm Is there any reason why the change was made? Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] IPaddr: netmask or cidr_netmask?
Lars, Thank you for your answer. I will use cidr_netmask from now on. Lars Marowsky-Bree [EMAIL PROTECTED] writes: On 2007-12-14T14:54:38, Keisuke MORI [EMAIL PROTECTED] wrote: IPaddr RA has two kinds of parameter to specify the netmask: netmask and cidr_netmask. Which one is officially supported and recommended to use? The fact that only the cidr_netmask is in the metadata is a pretty big clue. ;-) Yes. I should have believed documents. ;-) Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] IPaddr: netmask or cidr_netmask?
Hi, The IPaddr RA has two parameters for specifying the netmask: netmask and cidr_netmask. Which one is officially supported and recommended? From the mail archive below, I thought that cidr_netmask was wrong, http://www.gossamer-threads.com/lists/linuxha/dev/36035#36035 but on the other hand, the GUI can handle only cidr_netmask (because the IPaddr meta-data contains only cidr_netmask). As far as I tried, both work the same; both can take either the CIDR form (e.g. 24) or the dotted notation (255.255.255.0), but I want to always use the official one. Thanks, -- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-ha-dev] [PATCH] SNMP subagent syslog fix
Hi, The attached patch fixes the SNMP subagent so that it obeys the syslog policy of heartbeat: 1) use logd if it's enabled; 2) the default syslog facility is taken from the configure option, as with lrmd, mgmtd, etc. The current SNMP subagent always logs to LOG_USER, which is hard-coded. This is not good. This patch can be applied on its own (i.e., independently of the SNMP extension for V2), so please consider including it in 2.1.3. Regards, -- Keisuke MORI NTT DATA Intellilink Corporation

diff -r 0890907b816f snmp_subagent/hbagent.c
--- a/snmp_subagent/hbagent.c	Tue Dec 11 01:10:53 2007 +0100
+++ b/snmp_subagent/hbagent.c	Tue Dec 11 17:08:47 2007 +0900
@@ -562,7 +562,10 @@ init_heartbeat(void)
 	hb = NULL;
 	cl_log_set_entity("lha-snmpagent");
-	cl_log_set_facility(LOG_USER);
+	cl_log_set_facility(HA_LOG_FACILITY);
+
+	/* Use logd if it's enabled by heartbeat */
+	cl_inherit_logging_environment(0);
 	hb = ll_cluster_new("heartbeat");

___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [PATCH] SNMP subagent syslog fix
Hi, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Tue, Dec 11, 2007 at 08:26:52PM +0900, Keisuke MORI wrote: Hi, The attached patch fixes the SNMP subagent so that it obeys the syslog policy of heartbeat: 1) use logd if it's enabled; 2) the default syslog facility is taken from the configure option, as with lrmd, mgmtd, etc. The current SNMP subagent always logs to LOG_USER, which is hard-coded. This is not good. This patch can be applied on its own (i.e., independently of the SNMP extension for V2), so please consider including it in 2.1.3. Thanks for the patch. I can recall vaguely seeing the problem, perhaps I even filed a bugzilla for it. Or something. My memory isn't in the best shape today. By grep'ing the source, there are still some hard-coded uses of LOG_USER. Do they also need to be fixed? In particular, send_arp.c, cl_status.c, xml_diff.c, and lrmadmin.c are visible to end users, I think.

$ hg id
885e02e00632 tip
$ grep -R cl_log_set_facility * | grep LOG_USER
crm/pengine/ptest.c:cl_log_set_facility(LOG_USER);
crm/admin/xml_diff.c:	cl_log_set_facility(LOG_USER);
fencing/test/apitest.c:	cl_log_set_facility(LOG_USER);
heartbeat/libnet_util/send_arp.c:cl_log_set_facility(LOG_USER);
lib/hbclient/api_test.c:cl_log_set_facility(LOG_USER);
lib/clplumbing/netstring_test.c:cl_log_set_facility(LOG_USER);
lrm/admin/lrmadmin.c:	cl_log_set_facility(LOG_USER);
lrm/test/apitest.c:	cl_log_set_facility(LOG_USER);
membership/ccm/ccm_testclient.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbd.c:cl_log_set_facility(LOG_USER);
telecom/apphbd/apphbtest.c:	cl_log_set_facility(LOG_USER);
telecom/recoverymgrd/recoverymgrd.c:cl_log_set_facility(LOG_USER);
tools/cl_status.c:	cl_log_set_facility(LOG_USER);

-- Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
[Linux-ha-dev] [Q] group resources and unmanaged status
Andrew, Can I ask a question about the internal status of the PE? My SNMP subagent code uses cluster_status(pe_working_set_t) to analyze the current status of resources, like crm_mon. When a parent resource (group/clone/master) is unmanaged, the 'running_on' and 'allowed_nodes' members of resource_t are NULL. Is this an expected value, or is there some intention behind it? If the parent resource is managed, those members have node values according to its children. In the case of a child resource (primitive), those members always contain node values regardless of whether it is managed or unmanaged. My SNMP subagent has a minor problem displaying the status of an unmanaged group resource and I'm now looking into how I should fix it. Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: AW: [Linux-ha-dev] Call for testers: 2.1.3
Spindler Michael [EMAIL PROTECTED] writes: Hi, This problem has been solved. My packaging box didn't have all the necessary packages for building the GUI rpm. When I added them it was able to build haclient (GUI) and that find-lang.sh tool worked fine. I didn't find the problem with pegasus on my CentOS 5.0, but I have the 32 bit version, and the problem was reported for 64 bit. OK. So, this step should only be included if --enable-mgmt, I guess? Right. It establishes language settings for the GUI, so it's not needed if the GUI isn't needed. We are trying to build it on RedHat (Red Hat Enterprise Linux ES release 4 (Nahant Update 4)), and a problem remains before us. Please check Mori-san's patch again. http://developerbugs.linux-foundation.org//attachment.cgi?id=1109

-if test x${CIMOM} = x; then
-if test x${CIMOM} = x; then
-AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
+if test x${enable_cim_provider} = xyes; then # maybe, here #
+if test x${CIMOM} = x; then
+if test x${CIMOM} = x; then

I attached the configure.log fyi: I was able to build the rpms on RedHat AS 4 without any problems. The error above should occur only when the tog-pegasus package has been installed on your RedHat. I thought that tog-pegasus is installed by default on RedHat ES 4... -- Keisuke MORI NTT DATA Intellilink Corporation
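For reference, the guarded probe that Mori-san's patch seems to be aiming at might look like this in configure.ac. This is a sketch only: the variable and macro names are taken from the quoted patch, but the exact nesting is an assumption since the diff context is incomplete.

```
if test "x${enable_cim_provider}" = "xyes"; then
    if test "x${CIMOM}" = "x"; then
        AC_CHECK_PROG([CIMOM], [cimserver], [pegasus])
    fi
fi
```

The point is that AC_CHECK_PROG only runs when the CIM provider was actually requested, so builds on boxes without pegasus don't pick up the dependency by accident.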
Re: [Linux-ha-dev] [Q] group resources and unmanaged status
Andrew Beekhof [EMAIL PROTECTED] writes: On Dec 7, 2007, at 11:56 AM, Keisuke MORI wrote: Andrew, Can I ask a question about the internal status of the PE? My SNMP subagent code uses cluster_status(pe_working_set_t) to analyze the current status of resources, like crm_mon. When a parent resource (group/clone/master) is unmanaged, the 'running_on' and 'allowed_nodes' members of resource_t are NULL. Is this an expected value? I thought that group/clone/master always had NULL... since they can be running on more than one node (especially clone and m/s resources) Judging from the output of the SNMP agent, two pairs of a clone and a primitive are observed, and each parent clone has the node on which its child primitive is running. Maybe my code is doing something wrong, I'll check it again. I recall also doing something special for unmanaged resources but I can probably change that behavior for you. that said, it would be better to use the recently added API call: node_t *(*location)(resource_t *, GListPtr*, gboolean); eg. node_t *native_location(resource_t *rsc, GListPtr *list, gboolean current) Thanks, I will look into this. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] Call for testers: 2.1.3
Hi, The problem is reported in bugzilla #1662. Please see my comment and a patch at comments #6 and #8. http://developerbugs.linux-foundation.org/show_bug.cgi?id=1662#c6 Thanks, Keisuke MORI

Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Thu, Dec 06, 2007 at 10:54:36AM +1100, Amos Shapira wrote: On 06/12/2007, Alan Robertson [EMAIL PROTECTED] wrote: We are in the final weeks of testing for release 2.1.3 - which has been delayed to the week of Dec 19. Trying to do make rpm on CentOS 5 I get the following error:

gmake[1]: Leaving directory `/root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/contrib'
find heartbeat-2.1.3 -type d ! -perm -777 -exec chmod a+rwx {} \; -o \
  ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -400 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -444 -exec /bin/sh /root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/install-sh -c -m a+r {} {} \; \
  || chmod -R a+r heartbeat-2.1.3
tardir=heartbeat-2.1.3 && /bin/sh /root/Downloads/heartbeat/Heartbeat-Testing-d8d7ce11fbad/missing --run tar chof - "$tardir" | GZIP=--best gzip -c > heartbeat-2.1.3.tar.gz
{ test ! -d heartbeat-2.1.3 || { find heartbeat-2.1.3 -type d ! -perm -200 -exec chmod u+w {} ';' && rm -fr heartbeat-2.1.3; }; }
/usr/bin/rpmbuild -ta heartbeat-2.1.3.tar.gz /dev/null;
error: Macro %CMPI_PROVIDER_DIR has empty body
sh: line 0: fg: no job control
error: Failed build dependencies: pegasus is needed by heartbeat-2.1.3-2.x86_64
make: *** [rpm] Error 1

Funny, can't find pegasus on either suse or debian. It's got to do with CIM though. There is no such package pegasus in CentOS 5. I tried installing tog-pegasus but I'm not sure it's even related and it didn't help. http://rpmfind.net/linux/rpm2html/search.php?query=tog-pegasus-cimserver That one? You got the same error? Care to post your config.log? Thanks, Dejan Please help us test this upcoming new release! 
Would love to if I can sneak in tests in my schedule - especially if it'll help me get heartbeat 2 running on my CentOS 5 Xen guests. Thanks for all your work. --Amos ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [patch] Fix potential memory leaks in the HB client library
Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Tue, Oct 30, 2007 at 08:53:54PM +0900, Keisuke MORI wrote: Hi, I've been testing heartbeat with valgrind enabled, and found that it reported a couple of leaks which are in the heartbeat API client library. I'm submitting my proposed patch to fix them, so could somebody please review it for correctness? In my understanding, these leaks are not so serious because they only happen when heartbeat exits, but it may be a problem if an HB client does signon()/signoff()/delete() repeatedly in a single process. (omit) Your patch is in this changeset: http://hg.linux-ha.org/dev/rev/84e6520764bf Thank you for taking care of it. BTW, do you have hg write access? No, I don't. Is there any authorization procedure to gain the access? Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extension for CRM resources
Dejan, Dejan Muhamedagic [EMAIL PROTECTED] writes: Hi, On Fri, Nov 09, 2007 at 03:12:29PM +0900, Keisuke MORI wrote: Hello all, I would like to propose an extension for the SNMP hbagent so that it can handle the CRM resource information provided by Heartbeat Version 2. The attached patch is my proposed implementation. The patch has already been tested and debugged by our team using valgrind. I would appreciate any comments and suggestions to make it more usable for everybody in the community. I'll take a look at the code. Thanks a lot for the contribution. Thank you for taking a look at it. Please advise me if there's any suspicious code. We'll correct it. Thanks, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [PATCH] Proposal SNMP subagent extension for CRM resources
Andrew, Thank you for your comments. Andrew Beekhof [EMAIL PROTECTED] writes: I would appreciate any comments and suggestions to make it more usable for everybody in the community. you might want to include some other data (such as failed, is_managed, etc) in the trap also, it might be an idea to include a back-link for resources that are in groups/clones/etc quote NOTE : This trap is sent only when the resource operation succeeds. Concretely, the extended hbagent gets the cib information when it changes, and parses it. And if the rc_code of the operation (like CRMD_ACTION_START) is 0, then the hbagent sends a trap. /quote it worries me a little that you only send the trap when rc=0... you don't want to know about failed actions? The intention of the trap is to let you know the current status of resources (such as running/stopped/etc.), not the result of each operation. This is similar to the LHAResourceGroupStatus object, which is the resource status in V1. (The note above is just an explanation of how it's implemented.) But, yeah, your point is right and it might also be useful. Does anybody want to use this information? We're considering extending it further, but before we proceed I would like to design the new MIB definition first. Does anyone have comments on this? I would like to hear more, particularly about what kind of information is needed, and by whom, among those who really want to use the SNMP agent. Regards, Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] snmp notification v2 cluster
Hi, I posted my proposed patch of the SNMP hbagent extension for V2 resources to the development mailing list. http://www.gossamer-threads.com/lists/linuxha/dev/43676 Please take a look; I would appreciate it if you could give me any comments. Thanks, Keisuke MORI [EMAIL PROTECTED] writes: Abraham, and everyone in the list, Our company is now working on adding a feature to the hbagent so that it notifies you when the V2 resource status changes. Our first implementation is almost done actually, so I will post it here in a week, as soon as we're ready. I hope it helps the community. Thanks, Abraham Iglesias [EMAIL PROTECTED] writes: It sounds good. I am really surprised because i didn't expect so many replies :D . Thanks to everyone. I will start trying the hbagent to see which capabilities are implemented at this moment. Any suggestion will be welcome :) -bram Michael Schwartzkopff escribió: Am Mittwoch, 31. Oktober 2007 13:23 schrieb Peter Clapham: Linux-HA comes with a full SNMP subagent with its own MIB and the capability of sending traps. nagios is only an add-on for proper alerting and visualization of the alerts. Look for SNMP or hbagent in the documentation of the sources. Regrettably not resource aware, hence an update to full v2 would be rather nice tm :-) Yes, but at least it counts online nodes. So you get a hint if there is a problem. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-HA] snmp notification v2 cluster
Abraham, and everyone in the list, Our company is now working on adding a feature to the hbagent so that it notifies you when the V2 resource status changes. Our first implementation is almost done actually, so I will post it here in a week, as soon as we're ready. I hope it helps the community. Thanks, Abraham Iglesias [EMAIL PROTECTED] writes: It sounds good. I am really surprised because i didn't expect so many replies :D . Thanks to everyone. I will start trying the hbagent to see which capabilities are implemented at this moment. Any suggestion will be welcome :) -bram Michael Schwartzkopff escribió: Am Mittwoch, 31. Oktober 2007 13:23 schrieb Peter Clapham: Linux-HA comes with a full SNMP subagent with its own MIB and the capability of sending traps. nagios is only an add-on for proper alerting and visualization of the alerts. Look for SNMP or hbagent in the documentation of the sources. Regrettably not resource aware, hence an update to full v2 would be rather nice tm :-) Yes, but at least it counts online nodes. So you get a hint if there is a problem. -- Keisuke MORI NTT DATA Intellilink Corporation
Re: [Linux-ha-dev] [Bug 1722] First item in a group is not stopped when the second fails (and can't be migrated)
Andrew Beekhof [EMAIL PROTECTED] writes: I think that the old behavior is preferable because running a part of the group is pointless from the service availability's point of view and confusing to users. no. just because items later in the group fail doesn't mean the rest of the group should be stopped. In the HA database cluster, the database service is typically provided by a group like: Filesystem + MySQL + IP If any of the resources fails then the database service is no longer available. Running only Filesystem does not mean anything for the service availability. consider: IP + Filesystem + Apache + MySQL Just because MySQL fails doesn't mean Apache, the Filesystem nor the IP should be stopped. I can understand that, but in that case, I think it would be more straightforward to have two separate groups; one for the database server and the other for the web server, because they can run independently, right? We usually group resources because they need to run together to provide the service (database, web server, or whatever) as a whole, therefore running a part of the group does not make sense. 2) If it would not be possible, then would you tell me what is the correct configuration to achieve the same result as 2.1.2 in the new version? (with correct I mean by design and unlikely to change in the near future) I'm also wondering how anybody else configures this behavior. Let me instead ask what you believe you gain by stopping the first resource. Because it is just simple and intuitive for users. And I believe that most commercial HA software would also behave like this (at least in the typical usage). Our customers are considering migrating from a commercial HA software to heartbeat, and all of them expect it to behave like this so far. At least it would be nice if this behavior could be customized, I would think. 
Regards, Keisuke MORI NTT DATA Intellilink Corporation ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
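The two-separate-groups alternative Mori-san describes could be sketched in heartbeat v2 CIB XML roughly as follows. The ids and agent choices are hypothetical, and any ordering/colocation constraints between the groups are omitted for brevity.

```
<resources>
  <group id="grp_database">
    <primitive id="db_fs"     class="ocf" provider="heartbeat" type="Filesystem"/>
    <primitive id="db_mysql"  class="ocf" provider="heartbeat" type="mysql"/>
    <primitive id="db_ip"     class="ocf" provider="heartbeat" type="IPaddr"/>
  </group>
  <group id="grp_web">
    <primitive id="web_ip"     class="ocf" provider="heartbeat" type="IPaddr"/>
    <primitive id="web_apache" class="ocf" provider="heartbeat" type="apache"/>
  </group>
</resources>
```

With this split, a MySQL failure affects only grp_database, while Apache and its IP stay up in grp_web, which is the behaviour Andrew's IP + Filesystem + Apache + MySQL example argues for.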