Re: [Linux-HA] Doubt in HA configuration
Hi Rajesh, the Pacemaker mailing list moved to 'users' in February: http://oss.clusterlabs.org/mailman/listinfo/users - so you'll need to subscribe to the 'users' list and repost it there! Regards, Nikita

On 07.05.2015 17:19, Rajesh S wrote: Hi Team, sorry for the inconvenience. I want to set up HA between two different machines: one is an HP workstation and the other is a VMware virtual machine, but the two devices are in different IP ranges, 192.168.70.102 and 192.168.90.130. Is HA possible or not? Please advise me. With Regards, RAJESH S.

___ Linux-HA mailing list is closing down. Please subscribe to us...@clusterlabs.org instead. http://clusterlabs.org/mailman/listinfo/users Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha
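For what it's worth, Heartbeat does not require both nodes to share a subnet - unicast communication only needs IP reachability between the nodes. A minimal ha.cf sketch (interface name and node names are assumptions, the addresses are from the question):

```
# ha.cf on the 192.168.70.102 node; mirror it with the peer's
# address on the other node. bcast/mcast will NOT cross subnets,
# so use ucast, and make sure firewalls pass UDP port 694.
udpport 694
ucast eth0 192.168.90.130
node nodeA nodeB
```

The same reachability argument applies to Corosync's unicast (udpu) transport, which is what the 'users' list will likely recommend for a new setup.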
Re: [Linux-HA] Announcing the Heartbeat 3.0.6 Release
On 10.02.2015 22:24, Lars Ellenberg wrote:

TL;DR: If you intend to set up a new High Availability cluster using the Pacemaker cluster manager, you typically should not care about Heartbeat, but use recent releases (2.3.x) of Corosync. If you don't care about Heartbeat, don't read further. Unless you are beekhof... there's a question below ;-)

After 3½ years since the last officially tagged release of Heartbeat, I have seen the need to do a new maintenance release. The Heartbeat 3.0.6 release tag: 3d59540cf28d and the change set it points to: cceeb47a7d8f

GREAT!!! Thank you very much, Lars! Heartbeat is still running on some of our production clusters ...

The main reason for this release was that Pacemaker more recent than somewhere between 1.1.6 and 1.1.7 would no longer work properly on the Heartbeat cluster stack, because some of the daemons had moved from glue to Pacemaker proper and changed their paths. This has been fixed in Heartbeat. Also, during that time stonith-ng was refactored: it would still reliably fence, but no longer understood its own confirmation message, so it was effectively broken. That I fixed in Pacemaker.

If you choose to run a new Pacemaker with the Heartbeat communication stack, it should be at least 1.1.12 with a few patches; see my December 2014 commits at the top of https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12 - I'm not sure if they got into Pacemaker upstream yet. beekhof? Do I need to rebase? Or did I miss you merging these?

---

If you have those patches, consider setting this new ha.cf configuration parameter:

# If the pacemaker crmd spawns the pengine itself,
# it sometimes forgets to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off

Here is the shortened Heartbeat changelog; the longer version is available in Mercurial: http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop openais HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned fixes were created two years ago already, and may have been released in packages for a long time, where vendors have chosen to package them.

As to future plans for Heartbeat: Heartbeat is still useful for non-pacemaker, haresources-mode clusters. We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is stable and, after long years of field exposure, all bugs are known ;-)

The most notable shortcoming when using Heartbeat with Pacemaker clusters is the limited message size; there are currently no plans to remove that limitation. With its wide choice of communication paths, even exotic communication plugins, and the ability to run arbitrarily many paths, some deployments may still favor it over Corosync. But typically, for new deployments involving Pacemaker, in most cases you should choose Corosync 2.3.x as your membership and communication layer. For existing deployments using Heartbeat, upgrading to this Heartbeat version is strongly recommended.

Thanks, Lars Ellenberg

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
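To tie the advice above together, here is a sketch of the ha.cf fragment for a node running Pacemaker (>= 1.1.12 with the mentioned patches) on the Heartbeat stack. Interface, address, and node names are placeholders; only the last directive is the new parameter from this release:

```
# communication path and membership
ucast eth0 192.168.1.2
node alice bob
# hand cluster resource management to Pacemaker
crm respawn
# new in 3.0.6: let Heartbeat control the pengine directly,
# so a leftover pengine cannot confuse a cluster restart
crmd_spawns_pengine off
```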
Re: [Linux-HA] crmsh 2.0 released, and moving to Github
Hi Kristoffer, nice work - thank you very much! Best wishes, Nikita Michalko

On Thursday, 3 April 2014 18:03:33, Kristoffer Grönlund wrote: Hello everyone, Today, I have two major announcements to make: crmsh is moving to a new location, and I'm releasing the next major version of the crm shell!

== Find us at crmsh.github.io

Since the rest of the High-Availability stack is being developed over at Github, we thought it would make things easier to move crmsh over there as well. This means we're not only moving the website and issue tracker, we're also switching from Mercurial to git. From this release forward, you will find everything crmsh-related at http://crmsh.github.io, and the source code at https://github.com/crmsh/crmsh. Here are the new URLs related to crmsh:

* Website: http://crmsh.github.io/
* Documentation: http://crmsh.github.io/documentation.html
* Source repository: https://github.com/crmsh/crmsh/
* Issue tracker: https://github.com/crmsh/crmsh/issues/

Not everything has moved quite yet, but the source code and web site are in place.

== New stable release: crmsh 2.0

Secondly, we are proud to finally release crmsh 2.0! This is the version of crmsh I have been developing since I became maintainer last year, and there are a lot of new and improved features in this release. For a more complete list of changes since the previous version, please refer to the changelog:

* https://github.com/crmsh/crmsh/blob/2.0.0/ChangeLog

Packages for several popular Linux distributions (updated soon): http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Zip archive of the tagged release:

* https://github.com/crmsh/crmsh/archive/2.0.0.zip

Here is a short list of some of the biggest changes and features in crmsh 2.0:

* *More stable than ever before!* Many bugs and issues have been fixed, with plenty of help from the community. At the same time, this is a major release with many new features. Testing and pull requests are more than welcome!
* *Cluster management commands.* We've added a couple of new sub-levels that help with the installation and management of the cluster, as well as maintaining and synchronizing the corosync configuration across nodes. There are now commands for starting and stopping the cluster services, as well as cluster scripts that make the installation and configuration of cluster-controlled resources a one-line command.

* *Cleaner CLI syntax.* The parser for the configure syntax of crmsh has been rewritten, allowing for cleaner syntax, better error detection and improved error messages.

* *Tab completion everywhere.* Now tab completion works not only in the interactive mode, but directly from bash. In addition, the completion back end has been completely rewritten and many more commands now have full completion. It's not quite every single command yet, but we're getting there.

* *New and improved configuration.* The new configuration file is installed in /etc/crm/crm.conf by default or per user if desired, and allows for a much more flexible configuration of crmsh.

* *Cluster health evaluation.* As part of the cluster script functionality, there is now a cluster health command which analyses and reports on low disk space, problems with network configuration, firewall configuration issues and more. The best part of the cluster health command is that it can work without a configured cluster, providing a checklist of issues to amend before setting up a new cluster.

* *And wait, there's more!* There is now not only an extensive regression test suite but a growing set of unit tests as well, support for many new features in Pacemaker 1.1.11 such as resource sets in location constraints, anonymous shadow CIBs makes it easier to avoid race conditions in scripts, full syntax highlighting for the built-in help, the assist sub-command helps with more advanced configurations... the list goes on.
Big thanks to everyone who has helped with bug fixes, comments and contributions for this release! ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
On 28.11.2013 19:29, robert.koe...@knapp.com wrote: I will be out of the office from 28.11.2013 and will return on 03.12.2013.

Dear Mr. Koeppl, please FINALLY stop sending such mails to linux-ha@lists.linux-ha.org - I do not want to receive any more mails from you. TIA! Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Q: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms)
Also seen in logs with pacemaker 1.1.5: the cause was unequal MTU settings between the servers ;-) After setting the same MTU on both NICs, everything worked like a charm! Have a nice day! Nikita Michalko

On Wednesday, 5 June 2013 14:25:19, Ulrich Windl wrote: Hi! When the cluster (SLES11 SP2, current) was put in maintenance mode, two messages repeat periodically, filling the log: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms) WARN: do_log: FSA: Input I_ELECTION_DC from crm_timer_popped() received in state S_RELEASE_DC crmd: [7285]: info: handle_request: Current ping state: S_RELEASE_DC Is this expected, or is it yet another bug (pacemaker-1.1.7-0.13.9)? Regards, Ulrich

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
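A quick way to compare MTUs on the two nodes, as a sketch. Run it on both machines and make sure the values match; 'lo' is used below only because it exists everywhere - substitute the real interconnect NIC (e.g. eth0):

```shell
# Print the interface MTU. Mismatched values between cluster nodes can
# break large cluster messages while small pings still succeed.
cat /sys/class/net/lo/mtu
# To align them, something like (interface and value are assumptions):
# ip link set dev eth0 mtu 1500
```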
Re: [Linux-HA] Virtual interface (eth0:0) disappeared
On Tuesday, 21 May 2013 00:00:03, DaveW wrote: We are running heartbeat 2.1.3 on CentOS 5.4.

- Man, so OLD! Any chance to update to the latest version? Nikita Michalko

Last Monday AM, I received a call while getting ready for work. Our high availability server was not responding. The previous Saturday, our I.T. admins had re-configured the network to expand IP address ranges on some subnets. For whatever reason, this action caused our main server (in a two-node HA configuration) to lose its virtual interface, rendering our high-availability server unavailable. The network worked fine; the nodes could ping each other based on their normal IPs and they could ping the ping node, but the virtual IP (the one we REALLY care about) was ignored. Nothing in the logs, no errors, nothing. Just an unresponsive virtual server. A manual fail-over brought it back quickly as the backup took over. I.T. had done their work on Saturday and, had I checked our server on Sunday, I would have found it unreachable with a normal ping. When my colleague called me, I asked him what ifconfig looked like. He described three interfaces: eth0, eth1 and lo; no eth0:0. I had him initiate the manual fail-over. After poring over the logs, unable to find anything that indicated a problem, I tried to simulate the problem with ifconfig eth0:0 down. Sure enough, no fail-over, no errors, nothing; just (once again) an unresponsive server. ifconfig eth0:0 IP_ADDRESS up brought it right back (I tried this last Saturday, BTW, when no one was working). It seems that heartbeat (ipfail?) creates this virtual interface when it starts, then forgets about it. I presume that the assumption is that if eth0 remains intact, eth0:0 will remain intact as well. Am I missing something in the configuration settings or docs? I find nothing about configuring the backup node to monitor the virtual address, just the other node (which has a different IP and kept working after the network changes).
I am about to set up a service to monitor the virtual IP, but I wanted to check with the list first, to see if there's already something built in that I have not configured correctly. I have used main.company.com and backup.company.com as the two hostnames of the nodes. Both systems have these names in an /etc/hosts file, along with the hostname and IP of the virtual server and the ping node. My configuration:

/etc/ha.d/ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
warntime 3
initdead 120
udpport 694
baud 9600
serial /dev/ttyS0
ucast eth1 10.0.0.1
ucast eth1 10.0.0.2
auto_failback off
node main.company.com backup.company.com
ping 129.196.140.130
respawn hacluster /usr/lib/heartbeat/ipfail
deadping 10

/etc/ha.d/haresources:
main.company.com drbddisk::drbd_resource_0 Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd smb MailTo::root

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
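Since heartbeat in haresources mode does not re-check the IPaddr resource once it is started, a cron-driven watchdog along these lines can at least detect the condition. This is only a sketch: the address is taken from the haresources above, the alerting action is left as a stub, and it assumes the iproute2 `ip` tool is available:

```shell
#!/bin/sh
# Alert when the cluster's virtual IP is no longer configured locally.
VIP="${1:-129.196.140.14}"
if ip -o -4 addr show | grep -qF "$VIP"; then
    echo "VIP $VIP present"
else
    echo "VIP $VIP MISSING"
    # e.g. mail root here, or trigger a manual failover
fi
```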
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
Mr. Koeppl: this is absolutely of no interest to me - why am I getting this message?! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
Thanks for the info, Mr. Koeppl, but these mails are not wanted here! Please do not send them any more! Nikita Michalko

On Monday, 13 May 2013 01:13:20, robert.koe...@knapp.com wrote: I will be out of the office from 08.05.2013 and will return on 21.05.2013.

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: crmd: [31942]: WARN: decode_transition_key: Bad UUID (crm-resource-25438) in sscanf result (3) for 0:0:crm-resource-25438
On Thursday, 16 August 2012 17:54:06, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) wrote: Hi, what I am doing now is an evaluation of SLES11 SP2 (we run SP1). So testing anything that's not part of SP2 (plus updates) is not planned right now. I also think that reporting problems here early might get you mentally prepared for when the problem is eventually reported via official support. Also, in times of Google, other people may be interested to see what others have found out. From my experience with SLES11 SP2 (with all current updates) I conclude that actually nobody is seriously running SP2 without local bugfixes.

I am also testing SP2 - and yes, it's true: not yet ready for production ;-(

E.g. even the simplest examples from the official SuSE documentation don't work as expected. A trivial example: ocf:heartbeat:exportfs as distributed by SuSE with SP2 causes unlimited growth of .rmtab files (which quickly run into the gigabytes for serious NFS servers). I could work around this issue using some shell scripting. There are other issues which are more than annoying and actually make the SLES SP2 HA Extension unusable for production systems. E.g. clvmd cannot be made less verbose from the cluster configuration. (No, daemon_options="-d0" does not help!) Not funny either is the fact that the official SLES 11 SP2 kernels crash hard (when a node rejoins the cluster) when using SCTP as recommended in the SLES HA documentation and offered via the wizards. It took me a while to find out what was going on. When setting up a system with many (rather simple) resources, funny things happen due to race conditions all over the place (which can mostly be worked around using arbitrary start-delay options). Oh, did I mention that situations which are actually forbidden by constraints (e.g. using a score of INFINITY) actually do happen... Depending on the environment this can lead to not-so-funny effects. E.g.
I defined the following constraints:

colocation c17 inf: p_lsb_ccslogserver p_fs_daten
order o34 inf: p_fs_daten p_lsb_ccslogserver:start

I can prove from the logs that ccslogserver (an application) got migrated from node A to node B while p_fs_daten (a filesystem on top of drbd) was definitely still running on node A. Reporting bugs is not possible without a direct support contract. (You must enter into a support contract with SuSE before you can even report a bug or provide a patch.) Regards, Martin Konold (who has maintained SuSE clusters since 2001)

Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: crmd: [31942]: WARN: decode_transition_key: Bad UUID (crm-resource-25438) in sscanf result (3) for 0:0:crm-resource-25438
Hi Lars!

On Friday, 17 August 2012 10:36:37, Lars Marowsky-Bree wrote: On 2012-08-17T08:41:15, Nikita Michalko michalko.sys...@a-i-p.com wrote: I am also testing SP2 - and yes, it's true: not yet ready for production ;-( What problems did you find?

- E.g. the SLES 11 SP2 kernel crashes - the same as described by Martin: SP2 kernels crash hard (when a node rejoins the cluster) when using SCTP as recommended in the SLES HA documentation and offered via the wizards. - And some specific problems with the ISP-RAID driver, but those have been solved in the meantime by the reseller. Regards, Nikita

Regards, Lars ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat question about multiple services
On Friday, 20 April 2012 12:42:16, sgm wrote: Hi, I have a question about heartbeat: if I have three services - apache, mysql and sendmail - and apache goes down, heartbeat will switch all the services to the standby server, right?

That depends on the configuration - that is one possibility ...

If so, how do I configure heartbeat to avoid this happening?

You can configure your services (mysql and sendmail, for example) with colocation constraints, or as a group - there are many possibilities. Did you already RTFM (read the f... manuals)?

Very appreciated. gm

HTH Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
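To make the two options above concrete, here is a crm shell sketch (resource names and agent choices are assumptions, not from the original post):

```
# Independent primitives: with no constraints between them, a failed
# apache is recovered on its own and mysql/sendmail stay where they are.
primitive p_apache ocf:heartbeat:apache
primitive p_mysql ocf:heartbeat:mysql
primitive p_sendmail lsb:sendmail

# Alternative: a group makes them start, stop and fail over as one
# unit - exactly the behaviour the question wants to avoid:
# group g_all p_apache p_mysql p_sendmail
```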
Re: [Linux-HA] Heartbeat Failover Configuration Question
Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko

On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi there, I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes over its IP address - so far so good. Now my doubt is whether there is a way to configure it not to perform the failover automatically, and instead have someone run the failover manually. Can you provide a configuration example, please? Is this the stanza that does the magic? auto_failback on Thanks for your time and support. Best regards

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
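If this is an R1-style (haresources) heartbeat setup, the usual combination for manual control is auto_failback off in ha.cf plus the helper scripts heartbeat ships. A sketch (paths as typically installed by the heartbeat package; note auto_failback only controls fail*back*, not the initial failover on node death):

```
# in /etc/ha.d/ha.cf: resources do not migrate back automatically
auto_failback off

# manual failover, run on the node that should give up its resources:
/usr/share/heartbeat/hb_standby
# or on the node that should actively take them over:
/usr/share/heartbeat/hb_takeover
```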
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan/lanplus interface
Glad to hear it - HTH! Nikita Michalko

On 16.04.2012 19:10, Pham, Tom wrote: Thanks Nikita, the ipmitool works fine now. The password and user I was given by the person who configured the iLO were not correct. I created a new one and it works fine now.

On Fri, 13 Apr 2012 09:52:06 +0200, Nikita Michalko michalko.sys...@a-i-p.com wrote: Hi, "When I tried ipmitool -I lan -U root -H ip -a chassis power cycle, it did not work, but it worked with the -I open interface." - It can't work without the IP address (ip=?). You must first configure the ipmi (iLO 3) card with ipmitool, e.g.:

ipmitool lan print
ipmitool lan set 1 ipaddr 10.10.40.48
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.40.1
ipmitool lan set 1 auth ADMIN PASSWORD
ipmitool lan set 1 user ...

etc. - man ipmitool is your friend ;-) "What should I do to enable lan/lanplus on SUSE 11?" Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan or lanplus interface
Hi, "When I tried ipmitool -I lan -U root -H ip -a chassis power cycle, it did not work, but it worked with the -I open interface." - It can't work without the IP address (ip=?). You must first configure the ipmi (iLO 3) card with ipmitool, e.g.:

ipmitool lan print
ipmitool lan set 1 ipaddr 10.10.40.48
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.40.1
ipmitool lan set 1 auth ADMIN PASSWORD
ipmitool lan set 1 user ...

etc. - man ipmitool is your friend ;-) "What should I do to enable lan/lanplus on SUSE 11?" Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan or lanplus interface
Hi, did you properly configure BOTH ipmi cards with ipmitool? And is ipmi started? /etc/init.d/ipmi start What does this command say:

ipmitool -I lan -H IP_OF_OTHER_NODE -U SOMEUSER -A MD5 -P SOMEPASSWORD power status

HTH Nikita Michalko

On Wednesday, 11 April 2012 23:00:41, Pham, Tom wrote: Hi everyone, I am trying to test a 2-node cluster with a stonith resource using external/ipmi (I tried external/riloe first but it did not seem to work). My system: HP ProLiant BL460c G7 with iLO 3 card, firmware 1.15; SUSE 11; Corosync version 1.2.7; Pacemaker 1.0.9. When I use the lan or lanplus interface, it fails to start the stonith resource. I get the error below: external/ipmi[12173]: [12184]: ERROR: error executing ipmitool: Error: Unable to establish IPMI v2 / RMCP+ session Unable to get Chassis Power Status However, when I used interface=open instead of lan/lanplus, it started the stonith resource fine. When I tried kill -9 on corosync on node1, I expected it to reboot node1 and start all resources on node2, but it only rebooted node1. Someone mentioned that the open interface is a local interface and only allows a node to fence itself. Does anyone know why lan/lanplus does not work? Thanks, Tom Pham

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] compiling cluster glue in solaris 10 getting error
Hmmm, it seems you don't know about Christmas - no wonder there was no response ... I don't know anything about Solaris 10, but obviously you are missing some packages for compiling - maybe libbz2 + libbz2-devel? HTH Nikita Michalko

On 26.12.2011 11:57, salim ep wrote: No response, hmmm... it seems to make me nervous about the Linux HA service. :( :((

On Sun, Dec 25, 2011 at 5:54 PM, salim ep eepeesali...@gmail.com wrote: Dear Team, I am facing an error while compiling the Reusable-Cluster-Components-glue--ebb567a5b758 package on Solaris 10 x86 for the Linux HA service. Package: Heartbeat-3-0-STABLE-3.0.2 Package: Reusable-Cluster-Components-glue--ebb567a5b758

checking if libnet is required... yes
checking for libnet-config... no
checking for t_open in -lnsl... yes
checking for socket in -lsocket... (cached) yes
checking for libnet_get_hwaddr in -lnet... yes
checking for libnet... found libnet1.1
checking for libnet_init in -lnet... yes
checking for netinet/icmp6.h... no
checking for ucd-snmp/snmp.h... no
checking net-snmp/net-snmp-config.h usability... yes
checking net-snmp/net-snmp-config.h presence... yes
checking for net-snmp/net-snmp-config.h... yes
checking for net-snmp-config... /usr/sfw/bin//net-snmp-config
checking for special snmp libraries... -R../lib -L/usr/sfw/lib -lnetsnmp -lgen -lpkcs11 -lkstat -lelf -lm -ldl -lnsl -lsocket -ladm
checking For libOpenIPMI version 1.4 or greater... no
checking curl/curl.h usability... no
checking curl/curl.h presence... no
checking for curl/curl.h... no
checking openhpi/SaHpi.h usability... no
checking openhpi/SaHpi.h presence... no
checking for openhpi/SaHpi.h... no
checking vacmclient_api.h usability... no
checking vacmclient_api.h presence... no
checking for vacmclient_api.h... no
checking bzlib.h usability... yes
checking bzlib.h presence... yes
checking for bzlib.h... yes
checking for BZ2_bzBuffToBuffCompress in -lbz2... yes
./configure: line 28817: syntax error near unexpected token `DBUS,'
./configure: line 28817: ` PKG_CHECK_MODULES(DBUS, dbus-1, dbus-glib-1)'
bash-3.00# pwd
/export/home/salim/Reusable-Cluster-Components-glue--ebb567a5b758

Kindly advise me about this error; it is giving me sleepless nights :( -- 'Winners make things happen, losers let things happen' ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
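For the record: a literal `PKG_CHECK_MODULES(DBUS, ...)` surviving into the generated configure script is the classic symptom of autoconf having run without pkg-config's pkg.m4 macro file - the macro is never expanded, and the shell then chokes on it at run time. A sketch of how to confirm and fix (the Solaris package/tool names are assumptions; the demo uses a stand-in file):

```shell
# Symptom check: a correctly generated configure script contains no
# literal PKG_CHECK_MODULES. If grep finds one, pkg.m4 was missing
# when the script was generated.
printf 'PKG_CHECK_MODULES(DBUS, dbus-1, dbus-glib-1)\n' > configure.demo
grep -c 'PKG_CHECK_MODULES' configure.demo
# Fix sketch: install pkg-config (it provides pkg.m4), then regenerate
# the configure script before running it again, e.g.:
#   autoreconf -i        (or the project's ./autogen.sh)
```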
Re: [Linux-HA] UNKNOWN version of resource-agents: 2 questions
That is SLES11/SP1 (64-bit, of course); the `which` utility there is in the package util-linux-2.16-6.8.2. Nikita Michalko

On Friday, 21 October 2011 13:34:06, Dejan Muhamedagic wrote: On Thu, Oct 20, 2011 at 09:17:55AM +0200, Nikita Michalko wrote: Great - thank you, Dejan! Which platform was that? Maybe we need to fix the `which` requirement. Cheers, Dejan

On Wednesday, 19 October 2011 17:58:39, Dejan Muhamedagic wrote: Hi, On Wed, Oct 19, 2011 at 01:56:39PM +0200, Nikita Michalko wrote: Hi all, I've just successfully compiled ClusterLabs-resource-agents-v3.9.2-65-g46c6990(.zip) from https://github.com/ClusterLabs/resource-agents, configured with:

./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable-fatal-warnings=no

You should add: --with-ras-set=linux-ha

After make I tried make rpm and now I'm facing the following errors: ... snip ...

gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/ldirectord'
/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc
gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc'
gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc'
gmake \
  top_distdir=resource-agents-UNKNOWN distdir=resource-agents-UNKNOWN \
  dist-hook
gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
if test -d .git; then \
  LC_ALL=C ./make/gitlog-to-changelog \
    --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \
  rm -f resource-agents-UNKNOWN/ChangeLog.devel; \
  mv resource-agents-UNKNOWN/cl-t resource-agents-UNKNOWN/ChangeLog.devel; \
fi
echo UNKNOWN resource-agents-UNKNOWN/.tarball-version
rm -f resource-agents-UNKNOWN/resource-agents.spec
cp ./resource-agents.spec resource-agents-UNKNOWN/resource-agents.spec
gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \
  ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -400 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/install-sh -c -m a+r {} {} \; \
  || chmod -R a+r resource-agents-UNKNOWN
tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz
tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource-agents-UNKNOWN.tar.bz2
{ test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; }
gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert.
gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec
error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64
make: *** [rpm] Fehler 1

I guess that this distribution doesn't have the which package. So, just remove it from the list of requirements.

Q.1: Why is this version UNKNOWN? Perhaps --with-ras-set is going to fix that. Q.2: What else is needed by resource-agents-UNKNOWN for a successful build of an RPM? Did the above help? Thanks, Dejan TIA!
Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
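Dejan's suggestion - just drop `which` from the list of requirements - can be sketched as a spec-file edit. The snippet below is a self-contained illustration (the two-line spec excerpt is invented; the real file is resource-agents.spec in the unpacked source tree):

```shell
# Work on a scratch copy of an invented spec excerpt:
workdir=$(mktemp -d)
cat > "$workdir/resource-agents.spec" <<'EOF'
Name: resource-agents
BuildRequires: which
BuildRequires: glib2-devel
EOF

# Drop the 'which' build dependency that this SLES11 host cannot
# satisfy as a separate package (the binary lives in util-linux there):
sed -i '/^BuildRequires:[[:space:]]*which$/d' "$workdir/resource-agents.spec"

# Only the other BuildRequires line remains:
grep '^BuildRequires' "$workdir/resource-agents.spec"
```

As for the UNKNOWN version: the quoted build log shows the dist hook writing `echo UNKNOWN > resource-agents-UNKNOWN/.tarball-version`, i.e. the version falls back to UNKNOWN when neither git metadata nor a .tarball-version file supplies one. Writing the release string into .tarball-version in the source tree before `make rpm` should give the tarball and RPM a proper version; the exact string to use is an assumption.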
Re: [Linux-HA] UNKNOWN version of resource-agents: 2 questions
Great - thank you, Dejan! Nikita Michalko Am Mittwoch, 19. Oktober 2011 17:58:39 schrieb Dejan Muhamedagic: Hi, On Wed, Oct 19, 2011 at 01:56:39PM +0200, Nikita Michalko wrote: Hi all, I've just succesfully compiled the ClusterLabs-resource-agents-v3.9.2-65- g46c6990(.zip) from: https://github.com/ClusterLabs/resource-agents, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable- fatal-warnings=no You should add: --with-ras-set=linux-ha After make I've tried make rpm and now I'm facing to the following errors: ... snip ... gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/ldirectord' /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake \ top_distdir=resource-agents-UNKNOWN distdir=resource-agents- UNKNOWN \ dist-hook gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' if test -d .git; then \ LC_ALL=C ./make/gitlog-to-changelog \ --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \ rm -f resource-agents-UNKNOWN/ChangeLog.devel; \ mv resource-agents-UNKNOWN/cl-t resource-agents- UNKNOWN/ChangeLog.devel;\ fi echo UNKNOWN resource-agents-UNKNOWN/.tarball-version rm -f resource-agents-UNKNOWN/resource-agents.spec \ cp ./resource-agents.spec resource-agents-UNKNOWN/resource- agents.spec gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \ ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ ! -type d ! 
-perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs- resource-agents-46c6990/install-sh -c -m a+r {} {} \; \ || chmod -R a+r resource-agents-UNKNOWN tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource- agents-UNKNOWN.tar.bz2 { test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; } gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert. gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64 make: *** [rpm] Fehler 1 I guess that this distribution doesn't have the which package. So, just remove it from the list of requirements. Q.: 1. why is this version UNKNOWN? Perhaps with-ras-set is going to fix that. Q.2.: what is needed yet by resource-agents-UNKNOWN for succesfull build of an RPM? Did the above help? Thanks, Dejan TIA! 
Nikita Michalko
[Linux-HA] UNKNOWN version of resource-agents: 2 questions
Hi all, I've just succesfully compiled the ClusterLabs-resource-agents-v3.9.2-65- g46c6990(.zip) from: https://github.com/ClusterLabs/resource-agents, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable- fatal-warnings=no After make I've tried make rpm and now I'm facing to the following errors: ... snip ... gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/ldirectord' /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake \ top_distdir=resource-agents-UNKNOWN distdir=resource-agents- UNKNOWN \ dist-hook gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' if test -d .git; then \ LC_ALL=C ./make/gitlog-to-changelog \ --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \ rm -f resource-agents-UNKNOWN/ChangeLog.devel; \ mv resource-agents-UNKNOWN/cl-t resource-agents- UNKNOWN/ChangeLog.devel;\ fi echo UNKNOWN resource-agents-UNKNOWN/.tarball-version rm -f resource-agents-UNKNOWN/resource-agents.spec \ cp ./resource-agents.spec resource-agents-UNKNOWN/resource- agents.spec gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \ ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ ! -type d ! 
-perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs- resource-agents-46c6990/install-sh -c -m a+r {} {} \; \ || chmod -R a+r resource-agents-UNKNOWN tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource- agents-UNKNOWN.tar.bz2 { test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; } gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert. gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64 make: *** [rpm] Fehler 1 Q.: 1. why is this version UNKNOWN? Q.2.: what is needed yet by resource-agents-UNKNOWN for succesfull build of an RPM? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Why 'crm resource cleanup' cannot work
Hi Robin, I had a similar problem in the past with the old version. You may want to send your configuration and logs. Regards Nikita Michalko On Wednesday, 31 August 2011 13:22:36, robin wrote: If they are an official stable release, we can consider the upgrade. But can the upgrade resolve my issue? Regards, -Robin At 2011-08-31 17:56:23, Nikita Michalko michalko.sys...@a-i-p.com wrote: On Wednesday, 31 August 2011 11:23:46, robin wrote: Appending the versions: [root@master ~]# rpm -qa|grep heartbeat heartbeat-3.0.3-2.3.el5 heartbeat-libs-3.0.3-2.3.el5 [root@master ~]# rpm -qa|grep pacemaker pacemaker-libs-1.0.9.1-1.15.el5 pacemaker-1.0.9.1-1.15.el5 Regards, -Robin At 2011-08-31 16:21:17, Nikita Michalko michalko.sys...@a-i-p.com wrote: On Wednesday, 31 August 2011 09:09:52, robin wrote: Hi All, I have a small cluster using heartbeat+pacemaker, and now I've encountered the following problem: when some resource fails, the Failed actions will always be shown in 'crm status', even if I fix the issue and run crm resource cleanup linkmon installer-11-00 ++ Failed actions: linkmon_monitor_0 (node=installer-11-00, call=2, rc=1, status=complete): unknown error linkmon_start_0 (node=installer-11-00, call=13, rc=1, status=complete): unknown error +++ The only thing I can do is restart the heartbeat service, but that's not what I want. Does anyone have any idea about it? Regards, -Robin - Versions? Regards Nikita Michalko Any chance to update to the newest versions - at least pacemaker-1.1.5 and heartbeat-3.0.5?
Regards Nikita Michalko
Re: [Linux-HA] Why 'crm resource cleanup' cannot work
Am Mittwoch, 31. August 2011 09:09:52 schrieb robin: Hi All, I've a small cluster using heartbeat+pacemaker, and now I encounter a problem as below: When some resource failed, the Failed actions will always be shown in 'crm status' even if I fixed the issue and run crm resource cleanup linkmon installer-11-00 ++ Failed actions: linkmon_monitor_0 (node=installer-11-00, call=2, rc=1, status=complete): unknown error linkmon_start_0 (node=installer-11-00, call=13, rc=1, status=complete): unknown error +++ The only way I can do is to restart heartbeat service, but it's not the way I wanted. Could anyone have any idea about it? Regards, -Robin ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems - Versions ? Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
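On the versions involved here, a cleanup alone sometimes leaves a stale failcount behind. A hedged sketch of the usual sequence in the crm shell, reusing the resource and node names from the thread (whether it actually helps on heartbeat 3.0.3 / pacemaker 1.0.9 is exactly what the upgrade question above is about):

```
crm resource cleanup linkmon installer-11-00
crm resource failcount linkmon delete installer-11-00
crm resource cleanup linkmon    # without a node name: clean on all nodes
```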
Re: [Linux-HA] help is needed as the stonith_host directive is not happening!
Am Freitag 22 Juli 2011 18:01:03 schrieb Avestan: Hello, Sorry for the lack of information. You guys are so good that sometimes I think you have a crystal-ball. ;o) As the following shows, I am running Heartbeat and STONITH version 2.0.8 release 1 on Fedora 7. - Ooops: V.2.0.8 !? That was very buggy! Any chance to upgrade to V.3.x ? Nikita Michalko [root@shemshak~]# rpm -qa | grep -i heartbeat heartbeat-2.0.8-1.fc7 [root@shemshak~]# rpm -qa | grep -i stonith stonith-2.0.8-1.fc7 This is an old system which I built over 2 years ago and still runs like a clock. I have recently added two STONITH Devices (APC9225 MasterSwithch Plus with APC9617 Network Management card) and here is the heartbeat configuration file /etc/ha.d/ha.cf: # Heartbeat logging configuration debugfile /var/log/ha-debug logfile /var/log/ha-log logfacility local0 # Heartbeat cluster members node shemshak node dizin # Heartbeat communication timing keepalive 2 deadtime 32 initdead 64 # Heartbeat communication paths udpport 694 bcast eth1 #ucast eth1 192.168.1.21 #ucast eth1 192.168.1.22 #ucast eth0 192.168.1.81 #ucast eth0 192.168.1.82 baud 19200 serial /dev/ttyS0 # Don't fail back automatically - on/off auto_failback on # Monitoring of network connection to default gateway ping 192.168.1.1 #respawn hacluster /usr/lib64/heartbeat/ipfail #STONITH stonith_host Testing apcmaster 192.168.1.56 apc apc Here is also my log file /var/log/ha-log after stopping the heartbeat on the primary host by issuing service heartbeat stop command at 2011/07/22_08:30:48 [root@shemshak ~]# tail -f /var/log/ha-log heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: MSG stats: 0/190108 ms age 10 [pid4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: ha_malloc stats: 379/5069800 38076/18447 [pid4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: RealMalloc stats: 50112 total malloc bytes. 
pid [4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: MSG stats: 0/86408 ms age 20 [pid4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: ha_malloc stats: 380/1815007 38160/18491 [pid4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: RealMalloc stats: 39660 total malloc bytes. pid [4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: These are nothing to worry about. heartbeat[4741]: 2011/07/22_08:30:48 info: Heartbeat shutdown in progress. (4741) heartbeat[17136]: 2011/07/22_08:30:48 info: Giving up all HA resources. ResourceManager[17146]: 2011/07/22_08:30:48 info: Releasing resource group: shemshak 192.168.1.8/24/eth0 ResourceManager[17146]: 2011/07/22_08:30:48 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.8/24/eth0 stop IPaddr[17204]: 2011/07/22_08:30:48 INFO: /sbin/ifconfig eth0:0 192.168.1.8 down IPaddr[17183]: 2011/07/22_08:30:48 INFO: Success heartbeat[17136]: 2011/07/22_08:30:48 info: All HA resources relinquished. heartbeat[4741]: 2011/07/22_08:30:49 WARN: 1 lost packet(s) for [dizin] [134127:134129] heartbeat[4741]: 2011/07/22_08:30:49 info: No pkts missing from dizin! heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBFIFO process 4744 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4745 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4746 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4747 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4748 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4749 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4750 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4749 exited. 
7 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4747 exited. 6 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4746 exited. 5 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4745 exited. 4 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4744 exited. 3 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4750 exited. 2 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4748 exited. 1 remaining heartbeat[4741]: 2011/07/22_08:30:51 info: shemshak Heartbeat shutdown complete. When I check the log file I don't see the stonith_host directive taking effect! I know the STONITH daemon and the device are working, as I am able to control the device directly by issuing STONITH commands such as: stonith -t apcmaster -p 192.168.1.56 apc apc -T off Testing stonith -t apcmaster -p 192.168.1.56 apc apc -T on Testing Thank you for your help. Avestan
Re: [Linux-HA] help is needed as the stonith_host directive is not happening!
Hi Avestan, are you really using V1/haresources? What version of HA? What config? We have no crystal ball anymore ;-) Nikita Michalko On Wednesday, 20 July 2011 18:08:56, Avestan wrote: Hello everyone, I am trying to add a STONITH device to my Linux-HA setup. I have added the stonith_host directive to the configuration file ha.cf as follows: #stonith_host lashgarak apcmaster 192.168.1.55 apc apc #stonith_host dizin apcmaster 192.168.1.56 apc apc The format of the command that I am using is: stonith_host {host_name} {stonith_type} {ip_address_stonith} {user} {password} When I shut down heartbeat on the primary host, nothing happens. I have checked the log files, both /etc/log/ha-log and /etc/log/messages, and I don't see anything regarding the stonith directive. I should also mention that the resources which are placed in the haresources file are moved from the primary host lashgarak to the secondary host dizin with no issue. Currently the only resource that I have in the haresources file is the floating IP address. Thanks, Avestan
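Two things stand out in the quoted ha.cf fragment: the stonith_host lines begin with '#', so heartbeat treats them as comments and never acts on them, and the first field of stonith_host names the node allowed to use the device ('*' for any node), not the node to be fenced. A hedged sketch with the addresses from the post, assuming either node may fence its peer:

```
# ha.cf - note: no leading '#', or the directive is ignored
stonith_host * apcmaster 192.168.1.55 apc apc
stonith_host * apcmaster 192.168.1.56 apc apc
```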
[Linux-HA] cluster-glue make error
Hi all, I've downloaded the last tarball from http://hg.linux- ha.org/glue/archive/tip.tar.bz2, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --with- heartbeat --with-stonith --with-pacemaker --with-daemon-user=$CLUSTER_USER -- with-daemon-group=$CLUSTER_GROUP and now by make I've got the following error: ... snip ... libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include - I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl - I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include - I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align - Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat- security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing- declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno- strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi - D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:408: Fehler: kein vorheriger Prototyp für »setup_cl_log« gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster- Components-glue--0ff4e044f1be/lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster- Components-glue--0ff4e044f1be/lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1 cluster-glue version: 1.0.7 (Build: 0ff4e044f1be0138e8273a98c9fbee95b643bcf7) What I'm missing? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
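The failing compile ("kein vorheriger Prototyp für setup_cl_log" is GCC's "no previous prototype for setup_cl_log", fatal here because -Werror is combined with -Wmissing-prototypes) can usually be sidestepped by not treating warnings as errors, the same flag the resource-agents build in this archive uses. A sketch, reusing the configure options from the post:

```
./configure --enable-fatal-warnings=no --prefix=/usr --localstatedir=/var \
    --sysconfdir=/etc --with-heartbeat --with-stonith --with-pacemaker \
    --with-daemon-user=$CLUSTER_USER --with-daemon-group=$CLUSTER_GROUP
make
```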
Re: [Linux-HA] Colocation of VIP and httpd
Hi, any chance to update to version 3? 2.1.3 is really very old buggy! HTH Nikita Michalko Am Donnerstag 19 Mai 2011 19:25:54 schrieb 吴鸿宇: Hi All, I have a 2 node cluster. My intention is ensuring the VIP is always on the node that has httpd running, i.e. if service httpd on the VIP node is stopped and fails to start, the VIP should switch to the other node. With the configuration below, I observed that when httpd stops and fails to start, the VIP is stopped also but is not switched to the other node that has healthy httpd. I appreciate any ideas. cib generated=false admin_epoch=0 have_quorum=true ignore_dtd=false num_peers=0 cib_feature_revision=2.0 epoch=28 num_updates=1 cib-last-written=Thu May 19 08:48:49 2011 ccm_transition=1 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: */ nvpair id=cib-bootstrap-options-cluster-delay name=cluster-delay value=60s/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=INFINITY/ /attributes /cluster_property_set /crm_config nodes node id=* uname=node1 type=normal/ node id=* uname=node2 type=normal/ /nodes resources primitive id=vip class=ocf type=IPaddr provider=heartbeat operations op id=vip-check name=monitor interval=3s/ /operations instance_attributes id=* attributes nvpair id=IP1_attr_0 name=ip value=*/ nvpair id=IP1_attr_1 name=netmask value=19/ nvpair id=IP1_attr_2 name=nic value=eth0/ /attributes /instance_attributes /primitive clone id=httpd_clone instance_attributes id=7f9ba44b-5157-414d-bf12-cb94cd6bb043 attributes nvpair id=httpd-unique name=globally-unique value=false/ /attributes /instance_attributes primitive id=httpd class=lsb type=httpd operations op id=httpd_mon name=status interval=2s timeout=30s/ /operations /primitive /clone /resources constraints rsc_colocation id=httpd_on_vip to=httpd_clone from=vip score=INFINITY/ rsc_order id=order from=vip 
to=httpd_clone/ /constraints /configuration /cib Thanks a lot, Henry ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
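For reference, after an upgrade the stated intent - keep the VIP only on a node with a working httpd - would read roughly as follows in crm shell syntax. This is a hedged sketch: the scores, IDs, and the httpd-before-VIP ordering are the usual choice, not taken from the quoted XML, and the masked IP stays masked:

```
primitive vip ocf:heartbeat:IPaddr \
    params ip="..." netmask="19" nic="eth0" \
    op monitor interval="3s"
primitive httpd lsb:httpd \
    op monitor interval="2s" timeout="30s"
clone httpd_clone httpd meta globally-unique="false"
colocation vip_with_httpd inf: vip httpd_clone
order httpd_before_vip inf: httpd_clone vip
```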
Re: [Linux-HA] serial or bcast
Hi, Am Dienstag 24 Mai 2011 02:05:52 schrieb Hai Tao: can someone answer this: if I use ucast, the following ip should be the ip of its local interface or of the peer's ip? for example, ucast eth0 10.0.0.5 (this IP is the local IP or the peer's IP?) - the peer's IP! HTH Nikita Michalko Thanks. From: taoh...@hotmail.com To: linux-ha@lists.linux-ha.org Date: Mon, 23 May 2011 10:51:06 -0700 Subject: Re: [Linux-HA] serial or bcast if I use ucast, the following ip should be the ip of its local interface or of the peer's ip? Thanks. Hai Tao Date: Sun, 22 May 2011 19:17:08 +0200 From: lars.ellenb...@linbit.com To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] serial or bcast On Sun, May 22, 2011 at 12:54:09AM -0700, Hai Tao wrote: If serial and bcast (or ucast) coexist in the ha.cf file, which device the heartbeat actually use? Heartbeat always sends all communication down all available paths. Communication paths may fail independently, and typically will be recovered as soon as they work again. Cluster communication is lost only if all paths fail, and no working path or communication channel remains. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
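Concretely, the two ha.cf files mirror each other - a sketch with assumed addresses (10.0.0.4 for node A, 10.0.0.5 for node B, as in the question):

```
# ha.cf on node A (10.0.0.4): point ucast at the peer
ucast eth0 10.0.0.5

# ha.cf on node B (10.0.0.5):
ucast eth0 10.0.0.4
```

Heartbeat reportedly ignores a ucast destination matching one of its own addresses, so keeping both lines in a single shared ha.cf is a common convenience.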
[Linux-HA] attribute migration-threshold does not exist
Hi all! I'm trying to configure a new 2-node cluster with crm and am facing the following errors: crm(bblu)configure# verify crm_verify[8202]: 2011/01/20_14:49:07 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ERROR: cib-bootstrap-options: attribute migration-threshold does not exist Configuration: crm(bblu)configure# show
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
	op monitor interval=60s timeout=60s \
	params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_193_xx_xx_xx ocf:heartbeat:IPaddr \
	op monitor interval=60s timeout=60s \
	params ip=193.xx_xx_xx cidr_netmask=26 broadcast=193.xx.xx.63
group group_1 IPaddr_193_xx_xx_xx IPaddr_192_168_150_54
location rsc_location_group_1 group_1 \
	rule $id=prefered_location_group_1 1: #uname eq bluedam
property $id=cib-bootstrap-options \
	stonith-enabled=false \
	symmetric-cluster=true \
	no-quorum-policy=ignore \
	migration-threshold=3 \
	stonith-action=reboot \
	startup-fencing=false \
	stop-orphan-resources=true \
	stop-orphan-actions=true \
	remove-after-stop=false \
	default-action-timeout=110s \
	is-managed-default=true \
	cluster-delay=60s \
	pe-error-series-max=-1 \
	pe-warn-series-max=-1 \
	pe-input-series-max=-1 \
	cluster-infrastructure=Heartbeat
Versions: heartbeat-3.0.3 cluster-glue-1.0.1 resource-agents-1.0.3 pacemaker-1.0.10 SLES11/SP1 Is the attribute migration-threshold no longer available? TIA Nikita Michalko
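The error message is consistent with migration-threshold being a resource meta attribute rather than a cluster property, which is why crm rejects it inside cib-bootstrap-options. A hedged sketch of where it would go instead on pacemaker 1.0, with the value 3 taken from the quoted configuration:

```
# cluster-wide default for all resources:
rsc_defaults migration-threshold="3"

# or per resource, as a meta attribute:
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
    params ip="192.168.150.54" cidr_netmask="24" broadcast="192.168.150.63" \
    meta migration-threshold="3" \
    op monitor interval="60s" timeout="60s"
```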
Re: [Linux-HA] multipath'ing on debian
Hi Oliver, are you sure you are on the right list: [Linux-HA]? Cheers! Nikita Michalko On Monday, 6 December 2010 11:31, Linux Cook wrote: Hi pluggers, I've just configured multipathing on my debian boxes (Server A and Server B) using HP StorageWorks with dual FCs on each server, and I can now mount the path alias I defined in my multipath configuration. But every time I write data on Server A, it is not reflected on Server B. Any help? Oliver Cook
Re: [Linux-HA] Resource appears to be active on two nodes
Hi, On Thursday, 2 December 2010 10:30, bharat khandelwal wrote: Preeti Jain Preeti_8644 at yahoo.com writes: Andrew Beekhof andrew at beekhof.net writes: but still, which version should I move to? Can I do it with heartbeat 2.1.4 without pacemaker? no, 2.1.4 was never supported. Which version should I use now, if 2.1.4 is not supported, and how do I get pacemaker? But that version should support SuSE 10 (x86_64). - We use HA V.2.1.4 on SLES 10 SP3. It's fairly old but still working for us ... As posted by Lars Marowsky-Bree: The 2.1.4 release is for those users who, for some reason, cannot yet upgrade to the Pacemaker version. It's intended to be the last final release in the 2.1.4 branch. Please use Pacemaker, if you can. So google for "Announcing: heartbeat 2.1.4" a little... On SLES11 we use Heartbeat 3.0.2 + Pacemaker - you need to compile the sources from the repository, though: http://clusterlabs.org/rpm/opensuse-11.1/clusterlabs.repo HTH Nikita Michalko thanks bharat
Re: [Linux-HA] Resource appears to be active on two nodes
Hi, On Thursday, 2 December 2010 12:23, Preeti Jain wrote: Nikita Michalko michalko.system at a-i-p.com writes: Which version should I use now, if 2.1.4 is not supported, and how do I get pacemaker? But that version should support SuSE 10 (x86_64). - We use HA V.2.1.4 on SLES 10 SP3. It's fairly old but still working for us ... As posted by Lars Marowsky-Bree: The 2.1.4 release is for those users who, for some reason, cannot yet upgrade to the Pacemaker version. It's intended to be the last final release in the 2.1.4 branch. Please use Pacemaker, if you can. So google for "Announcing: heartbeat 2.1.4" a little... On SLES11 we use Heartbeat 3.0.2 + Pacemaker - you need to compile the sources from the repository, though: http://clusterlabs.org/rpm/opensuse-11.1/clusterlabs.repo HTH Nikita Michalko So that means there is no option on SLES 10? But I saw many versions there - which one should I use? Is there any pacemaker version available with SLES 10? ... - Only HA V.2.0.7, but that was very old and buggy! Why do you stick with that (also very old) SLES10? Any chance to upgrade at least to SP3? Nikita Michalko Preeti Jain
Re: [Linux-HA] How to monitor the nic link status
Hi Pavlos! Am Dienstag, 30. November 2010 22:28 schrieb Pavlos Parissis: Hi Nikita, On 30 November 2010 08:42, Nikita Michalko michalko.sys...@a-i-p.com wrote: Hi Pavlos, Am Dienstag, 30. November 2010 05:59 schrieb Pavlos Parissis: On 29 November 2010 23:43, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Mon, Nov 29, 2010 at 10:24:17PM +0800, Mia Lueng wrote: Hi: I have configured a cluster with two nodes. Lan setting is A eth0: 192.168.10.110 eth1: 172.16.0.1 B eth0:192.168.10.111 eth1: 172.16.0.2 I have configured a resource ip_0 192.168.10.100 on eth0. But when I unplug the eth0 link on A, the resource can not be taken over to B and no any log output. I've checked the /usr/lib/ocf/resource.d/heartbeat/IPaddr2 script and found there are no codes for nic link status checking. How can i monitor the nic link status to protect the virtual ip address? Thanks. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Expl ain ed/ch09s03s03.html I am sorry this is not the expected behaviour, at least to me. I expect from the IPaddr2 to report a failure in a case the interface is not available. What is the point to maintain an IPaddr2 resource on interface which is not up? Furthermore, using the ping resource adds more complexity and basically utilizes Layer 3 protocol in order to monitor a layer 2 device ( the NIC). Using ip link show on the device should a very easy way to check link status on the NIC, ip tool supports also LOWER_UP. - what about configure monitor operation of IP in cib.xml - sth. like this: resources primitive id=IPaddr_194_37_40_42 class=ocf provider=heartbeat type=IPaddr meta_attributes id=primitive-IPaddr_194_37_40_42meta/ operations op name=monitor interval=60s id=IPaddr_194_37_40_42_mon timeout=60s/ /operations - it works for me very well ;-) Mia has reported on this thread that having monitor enabled is not enough for reporting problems on the link. What do you mean works for you? 
Have you tried to pull out the network cable from an interface on which you have an IPaddr2 resource running? - I use the IPaddr RA, not IPaddr2. And of course I tested it thoroughly - successfully ;-) You can also use ifconfig ethX down to simulate a cable pull-out ... Does that action cause a failure on the IPaddr2 resource? - On IPaddr, YES! Cheers, Pavlos -- AIP - Dr. Nikita Michalko | Main office at: | Working hours: 8:00-16:30 Tel: +43 1 408 35 57-14 | A-1160 Wien | except on Fri: 7:30-14:30 Fax: +43 1 408 35 57-26 | Grundsteing. 40 | michalko.sys...@a-i-p.com
Re: [Linux-HA] How to monitor the nic link status
Hi Pavlos, On Tuesday, 30 November 2010 05:59, Pavlos Parissis wrote: On 29 November 2010 23:43, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Mon, Nov 29, 2010 at 10:24:17PM +0800, Mia Lueng wrote: Hi: I have configured a cluster with two nodes. The LAN setting is A eth0: 192.168.10.110 eth1: 172.16.0.1 B eth0: 192.168.10.111 eth1: 172.16.0.2 I have configured a resource ip_0 192.168.10.100 on eth0. But when I unplug the eth0 link on A, the resource cannot be taken over to B, and there is no log output at all. I've checked the /usr/lib/ocf/resource.d/heartbeat/IPaddr2 script and found there is no code for NIC link status checking. How can I monitor the NIC link status to protect the virtual IP address? Thanks. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html I am sorry, this is not the expected behaviour, at least to me. I expect the IPaddr2 RA to report a failure in case the interface is not available. What is the point of maintaining an IPaddr2 resource on an interface which is not up? Furthermore, using the ping resource adds more complexity and basically uses a layer 3 protocol to monitor a layer 2 device (the NIC). Using ip link show on the device should be a very easy way to check link status on the NIC; the ip tool also supports LOWER_UP. - What about configuring a monitor operation for the IP in cib.xml - something like this:
<resources>
  <primitive id="IPaddr_194_37_40_42" class="ocf" provider="heartbeat" type="IPaddr">
    <meta_attributes id="primitive-IPaddr_194_37_40_42meta"/>
    <operations>
      <op name="monitor" interval="60s" id="IPaddr_194_37_40_42_mon" timeout="60s"/>
    </operations>
  </primitive>
</resources>
- It works very well for me ;-) Nikita Michalko Cheers, Pavlos
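For completeness, the ping-based approach from the linked chapter would look roughly like this in crm syntax - a hedged sketch (the gateway address and the scoring are placeholders; the resource name ip_0 is from Mia's setup): a cloned ocf:pacemaker:ping resource maintains a pingd node attribute, and a location rule moves the VIP off any node that loses connectivity.

```
primitive gw-ping ocf:pacemaker:ping \
    params host_list="192.168.10.1" multiplier="1000" \
    op monitor interval="10s"
clone ping-clone gw-ping
location ip_0-on-connected-node ip_0 \
    rule -inf: not_defined pingd or pingd lte 0
```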
Re: [Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
Hi Lars, thank you for your reply! Am Mittwoch, 27. Oktober 2010 11:59 schrieb Lars Ellenberg: On Fri, Oct 22, 2010 at 02:57:48PM +0200, Nikita Michalko wrote: Hi all! I know itt's annoying to do today something on the 32-bit server (SLES11/SP1), but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with: ./autogen.sh ./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var it seems to be an error on make: /usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output make clean ? - didn't help ;-( - only small diference at the end: ... /bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o ipctest ipctest.o libplumb.la ../../replace/libreplace.la ../../lib/pils/libpils.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl libtool: link: gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o .libs/ipctest ipctest.o ./.libs/libplumb.so 
/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pils/.libs/libpils.so ../../replace/.libs/libreplace.a ../../lib/pils/.libs/libpils.so -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl ./.libs/libplumb.so: undefined reference to `uuid_parse' ./.libs/libplumb.so: undefined reference to `uuid_generate' ./.libs/libplumb.so: undefined reference to `uuid_copy' ./.libs/libplumb.so: undefined reference to `uuid_is_null' ./.libs/libplumb.so: undefined reference to `uuid_unparse' ./.libs/libplumb.so: undefined reference to `uuid_clear' ./.libs/libplumb.so: undefined reference to `uuid_compare' collect2: ld returned 1 exit status gmake[2]: *** [ipctest] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/clplumbing' gmake[1]: *** [all-recursive] Fehler 1 Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
On Wednesday, 27 October 2010 at 12:52, Lars Ellenberg wrote:
On Wed, Oct 27, 2010 at 12:15:34PM +0200, Nikita Michalko wrote:

Hi Lars, thank you for your reply!

On Wednesday, 27 October 2010 at 11:59, Lars Ellenberg wrote:
On Fri, Oct 22, 2010 at 02:57:48PM +0200, Nikita Michalko wrote:

Hi all! I know it's annoying to do something on a 32-bit server (SLES11/SP1) these days, but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with:

./autogen.sh
./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var

there seems to be an error on make:

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output

make clean? - didn't help ;-( - only a small difference at the end:

Oh, it did. You now just need to install the build dependencies ;-) You could, of course, just use pre-built packages. ...
/bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o ipctest ipctest.o libplumb.la ../../replace/libreplace.la ../../lib/pils/libpils.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl libtool: link: gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o .libs/ipctest ipctest.o ./.libs/libplumb.so /root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pi ls/.libs/libpils.so ../../replace/.libs/libreplace.a ../../lib/pils/.libs/libpils.so -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl ./.libs/libplumb.so: undefined reference to `uuid_parse' How about you google that, and follow the first hit? - Ahh, very well - I forgot already my first attempt on 64-bit box with all that dependancies ... Thank you! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
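The undefined uuid_* references in the thread point to missing libuuid development files. A quick probe sketch; the function name is my own, and the package names suggested in the message are assumptions that vary by distribution:

```shell
# Check whether libuuid's header is installed; without it, linking libplumb
# fails with undefined uuid_parse/uuid_generate/... references.
uuid_dev_hint() {
    # $1 = include root to probe (parameterized so the check is testable);
    # defaults to /usr/include
    root="${1:-/usr/include}"
    if [ -e "$root/uuid/uuid.h" ]; then
        echo "present"
    else
        echo "missing: install the libuuid development package (e.g. libuuid-devel on SUSE or uuid-dev on Debian - package names are assumptions)"
    fi
}
```

Running `uuid_dev_hint` on the affected build host and installing the indicated package, then re-running `make clean && make`, matches the resolution reached in the thread.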
[Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
Hi all! I know it's annoying to do something on a 32-bit server (SLES11/SP1) these days, but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with:

./autogen.sh
./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var

there seems to be an error on make:

...
/bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -version-info 2:0:0 -o libpils.la -rpath /usr/lib pils.lo ../../replace/libreplace.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl
libtool: link: gcc -std=gnu99 -shared .libs/pils.o -Wl,--whole-archive ../../replace/.libs/libreplace.a -Wl,--no-whole-archive -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl -Wl,-soname -Wl,libpils.so.2 -o .libs/libpils.so.2.0.0
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: final link failed: Invalid operation collect2: ld returned 1 exit status gmake[2]: *** [libpils.la] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pils' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib' make: *** [all-recursive] Fehler 1 Configuring with --build=x86 didn't help Many thanks in advance for the reply! ... Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
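The "i386:x86-64 architecture ... incompatible with i386 output" error typically means objects left over from an earlier 64-bit build are being linked into a 32-bit one. One way to spot such leftovers is to compare the architectures `file` reports for each object; a sketch that works on pre-captured `file` output (the function name and usage are my own, not from the build system):

```shell
# Reads "path: description" lines (as produced by `file *.o`) on stdin and
# prints paths whose architecture differs from the first one seen.
# Exits 1 if the object set is mixed 32/64-bit.
find_arch_mismatch() {
    awk -F': ' '
        {
            arch = ($0 ~ /x86-64/) ? "x86-64" : "i386"
            if (ref == "") ref = arch            # first object sets the reference
            else if (arch != ref) { print $1; mixed = 1 }
        }
        END { exit mixed ? 1 : 0 }
    '
}
```

Typical use after a failed link: `file replace/.libs/*.o lib/pils/.libs/*.o | find_arch_mismatch`. As the thread concludes, `make clean` (or a fresh source tree) plus the correct build dependencies resolves it.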
Re: [Linux-HA] superfluous dependency in heartbeat spec file
Hi Vadym, can I apply this also on SLES10/SLES11?

TIA Nikita Michalko

On Tuesday, 12 October 2010 at 14:29, Vadym Chepkov wrote:

It was brought up on the pacemaker mailing list, but this applies to heartbeat rpm packaging as well. Libraries do not depend on the base package; they are independent. This is how one can install several versions of the same library (compat-packages). Also, it is possible to use the heartbeat libraries without using the heartbeat daemon itself (if one uses pacemaker with corosync, for instance).

Vadym

# HG changeset patch
# User Vadym Chepkov vchep...@gmail.com
# Date 1286886305 14400
# Node ID f1aea427d2c01756e06b4b917787c21ee440f24b
# Parent 82fc843fbcf9733e50bbc169c95e51b6c7f97c54
Fix package inter-dependencies

diff -r 82fc843fbcf9 -r f1aea427d2c0 heartbeat-fedora.spec
--- a/heartbeat-fedora.spec Mon Oct 04 22:12:37 2010 +0200
+++ b/heartbeat-fedora.spec Tue Oct 12 08:25:05 2010 -0400
@@ -40,6 +40,7 @@
 BuildRequires: which
 BuildRequires: cluster-glue-libs-devel
 BuildRequires: libxslt docbook-dtds docbook-style-xsl
+Requires: heartbeat-libs = %{version}-%{release}
 Requires: PyXML
 Requires: resource-agents
 Requires: cluster-glue-libs
@@ -81,7 +82,6 @@
 %package libs
 Summary: Heartbeat libraries
 Group: System Environment/Daemons
-Requires: heartbeat = %{version}-%{release}
 %description libs
 Heartbeat library package
@@ -89,7 +89,7 @@
 %package devel
 Summary: Heartbeat development package
 Group: System Environment/Daemons
-Requires: heartbeat = %{version}-%{release}
+Requires: heartbeat-libs = %{version}-%{release}
 %description devel
 Headers and shared libraries for writing programs for Heartbeat

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
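When rebuilding such packages locally, one can lint the spec for the circular dependency the patch removes (a -libs subpackage that Requires the base package). A small sketch; the function name is my own and the check is deliberately narrow:

```shell
# Print any "Requires: heartbeat ..." line found inside the %package libs
# section of a spec file; exit 1 if one exists (i.e. the libs subpackage
# still depends on the base package).  "Requires: heartbeat-libs" is allowed.
libs_requires_base() {
    awk '
        /^%package[[:space:]]+libs/ { in_libs = 1; next }
        /^%package[[:space:]]/      { in_libs = 0 }
        in_libs && /^Requires:[[:space:]]*heartbeat[[:space:]=]/ { print; bad = 1 }
        END { exit bad ? 1 : 0 }
    ' "$1"
}
```

With the patch above applied, `libs_requires_base heartbeat-fedora.spec` should find nothing and exit 0.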
Re: [Linux-HA] Emergency reboot by stonith-enabled=false
Thank you Dejan - it works now (with crm respawn)!

Cheers! Nikita Michalko

On Friday, 8 October 2010 at 18:38, Dejan Muhamedagic wrote:

Hi,

On Fri, Oct 08, 2010 at 03:35:13PM +0200, Nikita Michalko wrote:

Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions:

cluster-glue: 1.0.6
resource-agents: 1.0.3
Heartbeat STABLE: 3.0.3
pacemaker: 1.1.3 (all from wiki sources)
OS: SLES11/SP1

After successfully starting Heartbeat on the first node opter (the other node was intentionally down for the test) with stonith disabled (see my configuration below), the first node rebooted. Why my own node? Do I need stonith on a symmetric cluster?

Yes.

HB_Report attached ...

stonith-ng failed to connect to the cluster:

Oct 08 13:03:27 opteron heartbeat: [10872]: WARN: Client [stonith-ng] pid 10898 failed authorization [no default client auth]
Oct 08 13:03:27 opteron heartbeat: [10872]: ERROR: api_process_registration_msg: cannot add client(stonith-ng)
...
Oct 08 13:03:27 opteron stonith-ng: [10898]: CRIT: main: Cannot sign in to the cluster... terminating

which made heartbeat reboot. I guess that you can add something like this to ha.cf:

apiauth stonith-ng uid=root

If you want to prevent reboots, use crm respawn.
Thanks, Dejan

My configuration:
--
crm(live)# configure show
node $id=5ac2b85d-802f-40a6-ad0f-38660c4a6fb0 opter
node $id=caca825d-2fd9-426d-9ed7-8ff9845bc08f aipsles11
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_19X_XX_XX_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=19X.XX.XX.54 cidr_netmask=26 broadcast=19X.XX.XX.63
primitive ubis_udbmain_3 lsb:ubis_udbmain \
        op monitor interval=120s timeout=110s
group group_1 IPaddr_19X_XX_XX_54 IPaddr_192_168_150_54 ubis_udbmain_3
location rsc_location_group_1 group_1 \
        rule $id=prefered_location_group_1 1: #uname eq opter
property $id=cib-bootstrap-options \
        symmetric-cluster=true \
        no-quorum-policy=ignore \
        migration-threshold=3 \
        stonith-enabled=false \
        stonith-action=reboot \
        startup-fencing=false \
        stop-orphan-resources=true \
        stop-orphan-actions=true \
        remove-after-stop=false \
        short-resource-names=true \
        transition-idle-timeout=3min \
        default-action-timeout=110s \
        is-managed-default=true \
        cluster-delay=60s \
        pe-error-series-max=-1 \
        pe-warn-series-max=-1 \
        pe-input-series-max=-1 \
        dc-version=1.1.3-7e4c0424e331aa2a51cb1efb69e80b5c8e1f8701 \
        cluster-infrastructure=Heartbeat \
        last-lrm-refresh=1284125385

Any ideas/comments? TIA!

Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
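The two suggestions from the thread (an apiauth entry for stonith-ng, plus respawning the CRM) can be combined in ha.cf. A sketch of the relevant fragment, with directive spellings as given in the thread - verify them against the documentation of your Heartbeat version:

```
# Allow the pacemaker stonith daemon to sign in to the Heartbeat API layer,
# avoiding "failed authorization [no default client auth]"
apiauth stonith-ng uid=root

# Run the CRM as a respawned client, so a crashed or rejected daemon is
# restarted instead of triggering an emergency reboot of the node
crm respawn
```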
Re: [Linux-HA] Emergency reboot by stonith-enabled=false
On Sunday, 10 October 2010 at 20:35, Lars Marowsky-Bree wrote:
On 2010-10-08T15:35:13, Nikita Michalko michalko.sys...@a-i-p.com wrote:

Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions: cluster-glue: 1.0.6, resource-agents: 1.0.3, Heartbeat STABLE: 3.0.3, pacemaker: 1.1.3 (all from wiki sources), OS: SLES11/SP1

Is there any specific reason why you're using heartbeat?

- only customer support costs ;-)

Regards Nikita Michalko

openais/corosync/pacemaker are available as supported packages on the SLE 11 SP1 platform ...

Regards, Lars

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Emergency reboot by stonith-enabled=false
Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions:

cluster-glue: 1.0.6
resource-agents: 1.0.3
Heartbeat STABLE: 3.0.3
pacemaker: 1.1.3 (all from wiki sources)
OS: SLES11/SP1

After successfully starting Heartbeat on the first node opter (the other node was intentionally down for the test) with stonith disabled (see my configuration below), the first node rebooted. Why my own node? Do I need stonith on a symmetric cluster? HB_Report attached ...

My configuration:
--
crm(live)# configure show
node $id=5ac2b85d-802f-40a6-ad0f-38660c4a6fb0 opter
node $id=caca825d-2fd9-426d-9ed7-8ff9845bc08f aipsles11
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_19X_XX_XX_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=19X.XX.XX.54 cidr_netmask=26 broadcast=19X.XX.XX.63
primitive ubis_udbmain_3 lsb:ubis_udbmain \
        op monitor interval=120s timeout=110s
group group_1 IPaddr_19X_XX_XX_54 IPaddr_192_168_150_54 ubis_udbmain_3
location rsc_location_group_1 group_1 \
        rule $id=prefered_location_group_1 1: #uname eq opter
property $id=cib-bootstrap-options \
        symmetric-cluster=true \
        no-quorum-policy=ignore \
        migration-threshold=3 \
        stonith-enabled=false \
        stonith-action=reboot \
        startup-fencing=false \
        stop-orphan-resources=true \
        stop-orphan-actions=true \
        remove-after-stop=false \
        short-resource-names=true \
        transition-idle-timeout=3min \
        default-action-timeout=110s \
        is-managed-default=true \
        cluster-delay=60s \
        pe-error-series-max=-1 \
        pe-warn-series-max=-1 \
        pe-input-series-max=-1 \
        dc-version=1.1.3-7e4c0424e331aa2a51cb1efb69e80b5c8e1f8701 \
        cluster-infrastructure=Heartbeat \
        last-lrm-refresh=1284125385

Any ideas/comments? TIA!
Nikita Michalko

[Attachment: hb-report_3.tar.bz2, application/tbz]

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Compile error Reusable-Cluster-Components-glue-1.0.6
Thank you Dejan, with the option --enable-fatal-warnings=no it works! Cheers! Nikita Michalko Am Mittwoch, 6. Oktober 2010 16:57 schrieb Dejan Muhamedagic: Hi, On Wed, Oct 06, 2010 at 02:30:34PM +0200, Nikita Michalko wrote: Hi all! After downloading the sources Reusable-Cluster-Components-glue-1.0.6.tar.bz2 from http://www.linux-ha.org/wiki/Download and configuring with: ./configure --with-heartbeat I have a small problem by compiling source with make: snip--- ... libtool: link: ranlib .libs/libstonith.a libtool: link: rm -fr .libs/libstonith.lax libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:64: error: function declaration isn't a prototype main.c:78: error: function declaration isn't a prototype gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/ lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/ lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1, 64bit box Maybe some libraries missing? TIA! 
Hmpf, no libraries missing, it's something I missed and which got fixed a week after the release. The changeset is 8286b46c91e3. At any rate you can try: ./configure --enable_fatal_warnings=no --with-heartbeat I believe that that's the right incantation. Thanks, Dejan Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Compile error Reusable-Cluster-Components-glue-1.0.6
Hi all! After downloading the sources Reusable-Cluster-Components-glue-1.0.6.tar.bz2 from http://www.linux-ha.org/wiki/Download and configuring with: ./configure --with-heartbeat I have a small problem by compiling source with make: snip--- ... libtool: link: ranlib .libs/libstonith.a libtool: link: rm -fr .libs/libstonith.lax libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:64: error: function declaration isn't a prototype main.c:78: error: function declaration isn't a prototype gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1, 64bit box Maybe some libraries missing? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmd : CCM Connection failed
Hi,

On Friday, 24 September 2010 at 17:28, sunitha kumar wrote:

Hi Nikita, Thanks for your response. The permissions on cib.xml look right.

-rw------- 1 hacluster haclient 1487 Sep 23 16:32 /var/lib/heartbeat/crm/cib.xml

Is this issue fixed in heartbeat-3.0.3?

I don't know if that was an issue, but I had a similar problem upgrading from v2.1.3. After installing the new version AND upgrading the CIB, the problem disappeared. Look at: http://www.linux-ha.org/wiki/Releases.

Cheers! Nikita Michalko

thanks, -sunitha

On Thu, Sep 23, 2010 at 11:02 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:

Hi,

On Friday, 24 September 2010 at 03:31, sunitha kumar wrote:

service heartbeat status
heartbeat OK [pid 23886 et al] is running

On service heartbeat restart, the logs show that the CCM connection failed. Any pointers? thnx -sunitha

cib: [3301]: WARN: ccm_connect: CCM Activation failed
cib: [3301]: WARN: ccm_connect: CCM Connection failed 21 times (30 max)
crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed
crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 21 times... pause and retry
cib: [3301]: info: ccm_connect: Registering with CCM...
cib: [3301]: WARN: ccm_connect: CCM Activation failed
cib: [3301]: WARN: ccm_connect: CCM Connection failed 22 times (30 max)
crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed
crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 22 times... pause and retry
..
crmd: [23896]: WARN: do_ccm_control: CCM Activation failed
crmd: [23896]: WARN: do_ccm_control: CCM Connection failed 29 times (30 max)
crmd: [23896]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: ERROR: do_ccm_control: CCM Activation failed 30 (max) times This is on: pacemaker-mgmt-client-1.99.2-6.1 pacemaker-libs-1.0.5-4.1 pacemaker-1.0.5-4.1 pacemaker-mgmt-1.99.2-6.1 pacemaker-libs-devel-1.0.5-4.1 pacemaker-mgmt-devel-1.99.2-6.1 heartbeat-3.0.0-33.2 heartbeat-devel-3.0.0-33.2 - any chance to upgrade to the latest versions (heartbeat-3.0..3)? Are the permissions to the /var/lib/heartbeat/crm/cib.xml OK? HTH Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmd : CCM Connection failed
Hi, Am Freitag, 24. September 2010 03:31 schrieb sunitha kumar: service heartbeat status heartbeat OK [pid 23886 et al] is running On : service heartbeat restart, the logs show that CCM Connection failed. Any pointers? thnx -sunitha cib: [3301]: WARN: ccm_connect: CCM Activation failed cib: [3301]: WARN: ccm_connect: CCM Connection failed 21 times (30 max) crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 21 times... pause and retry cib: [3301]: info: ccm_connect: Registering with CCM... cib: [3301]: WARN: ccm_connect: CCM Activation failed cib: [3301]: WARN: ccm_connect: CCM Connection failed 22 times (30 max) crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 22 times... pause and retry .. crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: WARN: do_ccm_control: CCM Connection failed 29 times (30 max) crmd: [23896]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: ERROR: do_ccm_control: CCM Activation failed 30 (max) times This is on: pacemaker-mgmt-client-1.99.2-6.1 pacemaker-libs-1.0.5-4.1 pacemaker-1.0.5-4.1 pacemaker-mgmt-1.99.2-6.1 pacemaker-libs-devel-1.0.5-4.1 pacemaker-mgmt-devel-1.99.2-6.1 heartbeat-3.0.0-33.2 heartbeat-devel-3.0.0-33.2 - any chance to upgrade to the latest versions (heartbeat-3.0..3)? Are the permissions to the /var/lib/heartbeat/crm/cib.xml OK? 
HTH Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
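The permissions question above can be made into an explicit check. A sketch; the function name is my own, the expected owner/group/mode follow the values quoted in the thread (hacluster:haclient, mode 600), and `stat -c` assumes GNU coreutils:

```shell
# Verify a CIB file has the ownership and mode the cib daemon expects.
# $1 = path to cib.xml
# $2 = expected "user:group mode" (defaults to "hacluster:haclient 600")
cib_perms_ok() {
    want="${2:-hacluster:haclient 600}"
    info=$(stat -c '%U:%G %a' "$1" 2>/dev/null) || { echo "BAD (cannot stat)"; return 1; }
    if [ "$info" = "$want" ]; then
        echo "OK"
    else
        echo "BAD ($info)"   # e.g. wrong owner after a copy done as root
        return 1
    fi
}
```

Usage: `cib_perms_ok /var/lib/heartbeat/crm/cib.xml || chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml`.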
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
On Tuesday, 14 September 2010 at 12:16, Florian Haas wrote:
On 2010-09-14 12:13, Nikita Michalko wrote:

Hi Florian, thank you very much for the link to the webinar - very good work! I have tried that on SLES in the meantime, but am facing the following issue: the download of heartbeat 3.0.3_STABLE from http://www.linux-ha.org/wiki/Download starts, and then immediately this error comes: "Verbindung zu Rechner hg.linux-ha.org ist unterbrochen" (connection to host hg.linux-ha.org was interrupted)

Works just fine for me. Some upstream proxy getting in the way? In case anyone else is having this issue, please speak up now.

Really - it was Konqueror: with Firefox it's working fine. Sorry for the noise ...

Cheers, Nikita

Cheers, Florian

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hi Florian, thank you very much for the link to the webinar - very good work! I have tried that on SLES in the meantime, but am facing the following issue: the download of heartbeat 3.0.3_STABLE from http://www.linux-ha.org/wiki/Download starts, and then immediately this error comes: "Verbindung zu Rechner hg.linux-ha.org ist unterbrochen" (connection to host hg.linux-ha.org was interrupted). Other possibilities?

Regards Nikita Michalko

On Friday, 10 September 2010 at 13:17, Florian Haas wrote:
On 2010-09-10 09:05, Nikita Michalko wrote:
On Thursday, 9 September 2010 at 13:31, Tim Serong wrote:
On 9/9/2010 at 04:38 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:
On Thursday, 9 September 2010 at 07:09, Tim Serong wrote:

It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does it mean I should build the new RPMs from sources?

Maybe.. But if you don't want to have to do that, you might try the RPMs from http://www.clusterlabs.org/rpm/ - you may find the openSUSE 11.1 or 11.2 RPMs are installable on SLES (although I haven't tried this myself),

I tried it already, of course - installed with zypper, but then couldn't start it: problem with pacemaker (Signon to CIB failed: connection failed ...). In the meantime I installed pacemaker + heartbeat from scratch and am testing it just now ...

Even though you're not on Debian, this webinar may still be of help wrt the Heartbeat and Pacemaker upgrade process: http://www.linbit.com/en/training/on-demand-webinars/upgrading-to-pacemaker-on-debian-squeeze/

Cheers, Florian

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
On Thursday, 9 September 2010 at 13:31, Tim Serong wrote:
On 9/9/2010 at 04:38 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:
On Thursday, 9 September 2010 at 07:09, Tim Serong wrote:

It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does it mean I should build the new RPMs from sources?

Maybe.. But if you don't want to have to do that, you might try the RPMs from http://www.clusterlabs.org/rpm/ - you may find the openSUSE 11.1 or 11.2 RPMs are installable on SLES (although I haven't tried this myself),

I tried it already, of course - installed with zypper, but then couldn't start it: problem with pacemaker (Signon to CIB failed: connection failed ...). In the meantime I installed pacemaker + heartbeat from scratch and am testing it just now ...

and I would still generally encourage people on SLES to use SLE HAE, although I understand you want to continue to use Heartbeat, which makes that a problem :))

Another possibility: if you (or anyone else) is in a position to get a current version of heartbeat building on build.opensuse.org, I can help to get it included in the network:ha-clustering repo (I'm just not really

That would be great !!

able to do any packaging or testing of it myself).

The SLE HAE product replaced Heartbeat with openAIS when SLES 11 was

Yes, I know it, we want to stay with Heartbeat/Pacemaker in production though ...

released (this is now corosync+openais on SLE 11 SP1), so I'm curious to know what OS you're upgrading from, if you previously had heartbeat 2.1.3 running.

That was SLES10 SP2, but with the HA version of heartbeat-resources-2.1.3-23.1 (pacemaker-heartbeat-0.6.5-8.2)

OK, understood.
Regards, Tim Regards, Nikita ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hi Tim, thank you for reply. See my answer below ... Am Donnerstag, 9. September 2010 07:09 schrieb Tim Serong: On 9/8/2010 at 10:40 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote: Helo list! I am trying now to upgrade heartbeat 2.1.3 (pacemaker 0.6) to 3.0.3 on SLES11/SP1. After installing the new RPM's from http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_S P1/x 86_64/ I see the following errors in the ha-log: ... WARN: do_cib_control: Couldn't complete CIB registration 30 times... pause and retry ERROR: do_cib_control: Could not complete CIB registration 30 times... hard error ... The cib.xml has proper rights (I think): -rw--- 1 hacluster haclient 3474 2010-09-08 12:25 /var/lib/heartbeat/crm/cib.xml Verifying CIB: crm_verify -VVV -x cib.xml crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading (null)-style configuration to pacemaker-0.6 with no-op crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemake r/upgrade06.xsl crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading pacemaker-1.1-style configuration to pacemaker-1.2 with no-op crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgraded from none to pacemaker-1.2 validation crm_verify[24558]: 2010/09/08_13:20:17 WARN: cluster_status: We do not have quorum - fencing and resource management disabled With crm I can not change anything in cib.xml: crm configure Signon to CIB failed: connection failed Init failed, could not perform requested operations ERROR: cannot parse xml: no element found: line 1, column 0 Installed SW/versions: heartbeat-3.0.3-2.14 Where did that version of Heartbeat come from? It's not present in the openSUSE network:ha-clustering repo (actually, there is no version of heartbeat present in that repo). 
>> libgssglue1-0.1-6.22
>> libglue2-1.0.6-2.1
>> cluster-glue-1.0.6-2.1
>> resource-agents-1.0.3-4.2
>> pacemaker-1.1.2.1-5.1
>> My cib.xml and ha-log are attached. I suppose my CIB is wrong. How can I update the old cib.xml? Could someone point me pls to the right upgrade sequence/documentation?
>
> It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (it seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does that mean I should build the new RPMs from sources?

> The SLE HAE product replaced Heartbeat with openAIS when SLES 11 was released (this is now corosync+openais on SLE 11 SP1), so I'm curious to know what OS you're upgrading from, if you previously had heartbeat 2.1.3 running.

Yes, I know it - we want to stay with Heartbeat + Pacemaker in production though ...
That was SLES10 SP2, but with the HA version of heartbeat-resources-2.1.3-23.1 (pacemaker-heartbeat-0.6.5-8.2)

> Regards, Tim

Best regards
Nikita
[Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hello list!
I am now trying to upgrade heartbeat 2.1.3 (pacemaker 0.6) to 3.0.3 on SLES11/SP1. After installing the new RPMs from
http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_SP1/x86_64/
I see the following errors in the ha-log:
...
WARN: do_cib_control: Couldn't complete CIB registration 30 times... pause and retry
ERROR: do_cib_control: Could not complete CIB registration 30 times... hard error
...
The cib.xml has proper rights (I think):
-rw------- 1 hacluster haclient 3474 2010-09-08 12:25 /var/lib/heartbeat/crm/cib.xml
Verifying CIB: crm_verify -VVV -x cib.xml
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading (null)-style configuration to pacemaker-0.6 with no-op
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading pacemaker-1.1-style configuration to pacemaker-1.2 with no-op
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgraded from none to pacemaker-1.2 validation
crm_verify[24558]: 2010/09/08_13:20:17 WARN: cluster_status: We do not have quorum - fencing and resource management disabled
With crm I cannot change anything in cib.xml:
crm configure
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0
Installed SW/versions:
heartbeat-3.0.3-2.14
libgssglue1-0.1-6.22
libglue2-1.0.6-2.1
cluster-glue-1.0.6-2.1
resource-agents-1.0.3-4.2
pacemaker-1.1.2.1-5.1
My cib.xml and ha-log are attached. I suppose my CIB is wrong. How can I update the old cib.xml? Could someone point me pls to the right upgrade sequence/documentation?
Best regards
Nikita Michalko

<?xml version="1.0" ?>
<cib admin_epoch="0" epoch="0" num_updates="0">
 <configuration>
  <crm_config>
   <cluster_property_set id="cib-bootstrap-options">
    <attributes>
     <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
     <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
     <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="2"/>
     <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-6"/>
     <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
     <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
     <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
     <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
     <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
     <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
     <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
     <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="3min"/>
     <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="110s"/>
     <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
     <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
     <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
     <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
     <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
    </attributes>
   </cluster_property_set>
  </crm_config>
  <nodes/>
  <resources>
   <group id="group_1">
    <primitive class="ocf" id="IPaddr_193_27_40_54" provider="heartbeat" type="IPaddr">
     <operations>
      <op id="IPaddr_193_27_40_54_mon" interval="60s" name="monitor" timeout="60s"/>
     </operations>
     <instance_attributes id="IPaddr_193_27_40_54_inst_attr">
      <attributes>
       <nvpair id="IPaddr_193_27_40_54_attr_0" name="ip" value="193.27.40.54"/>
       <nvpair id="IPaddr_193_27_40_54_attr_1" name="cidr_netmask" value="26"/>
       <nvpair id="IPaddr_193_27_40_54_attr_3" name="broadcast" value="193.27.40.63"/>
      </attributes>
     </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_192_168_163_54" provider="heartbeat" type="IPaddr">
     <operations>
      <op id="IPaddr_192_168_163_54_mon" interval="60s" name="monitor" timeout="60s"/>
     </operations>
     <instance_attributes id="IPaddr_192_168_163_54_inst_attr">
      <attributes>
       <nvpair id="IPaddr_192_168_163_54_attr_0" name="ip" value="192.168.163.54"/>
       <nvpair id="IPaddr_192_168_163_54_attr_1" name="cidr_netmask" value="26"/>
       <nvpair id="IPaddr_192_168_163_54_attr_3" name="broadcast" value="192.168.163.63"/>
      </attributes>
     </instance_attributes>
    </primitive>
    <primitive class="lsb" id="ubis_udbmain_3" provider="heartbeat" type="ubis_udbmain"
Re: [Linux-HA] Read only filesystem with Heart beat configuration
Hi Jayesh !

Am Dienstag, 13. Juli 2010 12:43 schrieb jayesh shinde:
> HI, Thanks for your reply.
> 1) Which is the stable version of HA on RHEL 5.2? Can you please give me the link.

Sorry - not for RHEL (I'm using SLES10), but look at
http://www.linux-ha.org/wiki/Download and
http://clusterlabs.org/wiki/Install#From_Source

> 2) Do you think that read only is a problem of HA, OR is this a problem of the EXT3 filesystem etc?

I was facing a similar problem on an AMD/64bit server, but without XEN and SAN, and it was ONLY a problem of the OS HD's failure.

> 3) How do I avoid the master-master condition in case of n/w failover?

Maybe by using+configuring DRBD and Stonith? Look on the mailing list for the above threads ...

HTH
Nikita Michalko

> Regards Jayesh Shinde
>
> --- On Tue, 7/13/10, Nikita Michalko michalko.sys...@a-i-p.com wrote:
> From: Nikita Michalko michalko.sys...@a-i-p.com
> Subject: Re: [Linux-HA] Read only filesystem with Heart beat configuration
> To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
> Date: Tuesday, July 13, 2010, 1:17 PM
> Hi, any chance to upgrade to the latest version of HA? 2.1.4 is very old and buggy!
> HTH Nikita Michalko
>
> Am Dienstag, 13. Juli 2010 09:39 schrieb jayesh shinde:
>> Hi, Can any one please guide me with my below problem? I am using this setup with an IBM DS 8300 SAN + HBA + multipathing. Your inputs will be valuable for me.
>> Regards Jayesh Shinde
>> --- On Mon, 7/12/10, jayesh shinde jayesha_shi...@yahoo.com wrote:
>> From: jayesh shinde jayesha_shi...@yahoo.com
>> Subject: Read only filesystem with Heart beat configuration
>> To: linux-ha@lists.linux-ha.org
>> Date: Monday, July 12, 2010, 5:24 PM
>> Dear all,
>> I am facing the problem of a read-only filesystem with my Heartbeat configuration. Here are my setup details:
>> =
>> I have installed 2 physical servers with RHEL 5.2 64 bit with Xen virtualization and an IBM DS 8300 SAN. The XEN kernel version is 2.6.18-92.el5xen. The filesystem is EXT3.
>> I am using this setup for a heavy mail server where the POP3 and IMAP traffic is very high (1.2 lakh emails per day). I am using cyrus, postfix and ldap.
>> Each physical server contains a XEN virtual OS; of these, one is master and the other slave for me.
>> The physical server1 IP: 192.168.1.1
>> The physical server2 IP: 192.168.1.2
>> The Xen VM machine IP under physical server1: 192.168.1.10 (Master)
>> The Xen VM machine IP under physical server2: 192.168.1.20 (Slave)
>> The HA floating IP between the 2 VMs is 192.168.1.30
>> Both the VMs and physical machines communicate with each other via a switch and not by cross cable. Inside both Xen VMs I am using the 4 SAN partitions (for email boxes), which are accessible and can be mounted from 192.168.1.10 and 192.168.1.20. I am managing the mounting and unmounting of the SAN partitions and the stopping and starting of the services via a script, which is mentioned in /etc/ha.d/haresources
>> My problem:
>> 1) When 192.168.1.10 is master with floating IP 192.168.1.30, at that time everything works properly. But sometimes, due to a n/w problem, the floating IP 192.168.1.30 gets switched to the slave server, i.e. 192.168.1.20. At that time, while mounting the SAN partition, the SAN partition goes into read-only mode, and to correct this I have to run fsck -y on the device driver - i.e. failover causes the filesystem to go into read-only mode. I observed that this read-only problem does not come every time, but when it happens it messes up everything.
>> 2) When both the servers lose their n/w connection, both assume that their respective slave/master is down, and both servers act as master-master and also mount the SAN partition on each VM server.
>> So,
>> 1) How to avoid the read-only filesystem problem?
>> 2) How to avoid the master-master problem in case of n/w failure?
>> 3) What kind of precautions should I take while mounting and unmounting the SAN partition?
>> I tried ipfail and ping in ha.cf but no luck.
>> I am using Heartbeat version 2.1:
>> heartbeat-2.1.4-9.el5.x86_64.rpm
>> heartbeat-pils-2.1.4-9.el5.x86_64.rpm
>> heartbeat-stonith-2.1.4-9.el5.x86_64.rpm
>> Here is my ha.cf (same on both active and passive server):
>> ===
>> cat /etc/ha.d/ha.cf
>> debugfile /var/halogs/ha-debug
>> logfile /var/halogs/ha-log
>> logfacility local0
>> keepalive 2
>> deadtime 15
>> warntime 10
>> initdead 30
>> udpport 694
>> bcast eth0
>> auto_failback off
>> node activeimap1
>> node passiveimap1
>> #ping 192.168.2.8
>> #respawn hacluster /usr/lib/heartbeat/ipfail
>> debug 0
>> Here is my haresources (same on both active and passive server):
>> cat /etc/ha.d/haresources
>> activeimap1 IPaddr::192.168.1.30 ms4-services
>> Here is my ms4-services (same on both active and passive server
Re: [Linux-HA] Read only filesystem with Heart beat configuration
Hi, any chance to upgrade to the latest version of HA? 2.1.4 is very old and buggy!
HTH
Nikita Michalko

Am Dienstag, 13. Juli 2010 09:39 schrieb jayesh shinde:
> Hi, Can any one please guide me with my below problem? I am using this setup with an IBM DS 8300 SAN + HBA + multipathing. Your inputs will be valuable for me.
> Regards Jayesh Shinde
> --- On Mon, 7/12/10, jayesh shinde jayesha_shi...@yahoo.com wrote:
> From: jayesh shinde jayesha_shi...@yahoo.com
> Subject: Read only filesystem with Heart beat configuration
> To: linux-ha@lists.linux-ha.org
> Date: Monday, July 12, 2010, 5:24 PM
> Dear all,
> I am facing the problem of a read-only filesystem with my Heartbeat configuration. Here are my setup details:
> =
> I have installed 2 physical servers with RHEL 5.2 64 bit with Xen virtualization and an IBM DS 8300 SAN. The XEN kernel version is 2.6.18-92.el5xen. The filesystem is EXT3. I am using this setup for a heavy mail server where the POP3 and IMAP traffic is very high (1.2 lakh emails per day). I am using cyrus, postfix and ldap.
> Each physical server contains a XEN virtual OS; of these, one is master and the other slave for me.
> The physical server1 IP: 192.168.1.1
> The physical server2 IP: 192.168.1.2
> The Xen VM machine IP under physical server1: 192.168.1.10 (Master)
> The Xen VM machine IP under physical server2: 192.168.1.20 (Slave)
> The HA floating IP between the 2 VMs is 192.168.1.30
> Both the VMs and physical machines communicate with each other via a switch and not by cross cable. Inside both Xen VMs I am using the 4 SAN partitions (for email boxes), which are accessible and can be mounted from 192.168.1.10 and 192.168.1.20. I am managing the mounting and unmounting of the SAN partitions and the stopping and starting of the services via a script, which is mentioned in /etc/ha.d/haresources
> My problem:
> 1) When 192.168.1.10 is master with floating IP 192.168.1.30, at that time everything works properly.
> But sometimes, due to a n/w problem, the floating IP 192.168.1.30 gets switched to the slave server, i.e. 192.168.1.20. At that time, while mounting the SAN partition, the SAN partition goes into read-only mode, and to correct this I have to run fsck -y on the device driver - i.e. failover causes the filesystem to go into read-only mode. I observed that this read-only problem does not come every time, but when it happens it messes up everything.
> 2) When both the servers lose their n/w connection, both assume that their respective slave/master is down, and both servers act as master-master and also mount the SAN partition on each VM server.
> So,
> 1) How to avoid the read-only filesystem problem?
> 2) How to avoid the master-master problem in case of n/w failure?
> 3) What kind of precautions should I take while mounting and unmounting the SAN partition?
> I tried ipfail and ping in ha.cf but no luck.
> I am using Heartbeat version 2.1:
> heartbeat-2.1.4-9.el5.x86_64.rpm
> heartbeat-pils-2.1.4-9.el5.x86_64.rpm
> heartbeat-stonith-2.1.4-9.el5.x86_64.rpm
> Here is my ha.cf (same on both active and passive server):
> ===
> cat /etc/ha.d/ha.cf
> debugfile /var/halogs/ha-debug
> logfile /var/halogs/ha-log
> logfacility local0
> keepalive 2
> deadtime 15
> warntime 10
> initdead 30
> udpport 694
> bcast eth0
> auto_failback off
> node activeimap1
> node passiveimap1
> #ping 192.168.2.8
> #respawn hacluster /usr/lib/heartbeat/ipfail
> debug 0
> Here is my haresources (same on both active and passive server):
> cat /etc/ha.d/haresources
> activeimap1 IPaddr::192.168.1.30 ms4-services
> Here is my ms4-services (same on both active and passive server):
> ==
> cat /etc/ha.d/resource.d/ms4-services
> #!/bin/bash
> set -x
> mylist="syslog crond ldap postfix saslauthd cyrus-imapd httpd"
> stop() {
>     for i in $mylist; do
>         /etc/init.d/$i stop
>         /etc/init.d/$i stop
>         /etc/init.d/$i stop
>     done
>     /sbin/ifdown eth0:2
>     /sbin/ifdown eth0:3
>     /sbin/ifdown eth0:4
>     /sbin/ifdown eth0:5
>     if cd /tmp/ && /bin/mount | grep -e "on /usr/local" > /dev/null
>     then
>         # Kill all processes open on filesystem
>         /sbin/fuser -muk /usr/local
>         /sbin/fuser -muk /imap
>         /sbin/fuser -muk /imap1
>         /sbin/fuser -muk /imap2
>         sleep 3
>         /bin/umount /dev/xvdb
>         sleep 3
>         /bin/umount /dev/xvdd
>         sleep 3
>         /bin/umount /dev/xvdc
>         sleep 3
>         /bin/umount /dev/xvdf
>     fi
> }
> start() {
>     sleep 150
>     /bin/mount /dev/xvdf /usr/local
>     sleep 3
>     mount /dev/xvdc /imap1
>     sleep 3
>     mount /dev/xvdd /imap
>     sleep 3
>     mount /dev/xvdb /imap2
>     sleep 3
>     /sbin/ifup eth0:2
>     /sbin/ifup eth0:3
>     /sbin/ifup eth0:4
>     /sbin/ifup eth0:5
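[Editor's note: a hedged diagnostic sketch for the read-only symptom described above. ext3 typically remounts itself read-only when the kernel detects journal or I/O errors, and two nodes mounting the same non-cluster ext3 filesystem at once (the master-master case above) is exactly the kind of thing that causes such errors. Commands below assume the device names from the post.]

```
# Why did the filesystem go read-only? Ask the kernel and the superblock
# (hedged sketch; /dev/xvdb stands in for any of the SAN partitions):
dmesg | grep -i -e ext3 -e 'I/O error'    # kernel's reason for the remount
tune2fs -l /dev/xvdb | grep -i errors     # configured error behaviour
# Concurrent mounts from both nodes will corrupt a non-cluster
# filesystem; fencing (STONITH) is the usual way to rule that out.
```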
Re: [Linux-HA] IPsrcaddr and IPaddr2
Hi, Ilo!
I'd say - IIRC - you should configure the netmask in your cib (instance_attributes) for all IP addresses. Sth. like:
<nvpair id="IPaddr_192_168_1_2_attr_1" name="cidr_netmask" value="24"/>
HTH
Nikita Michalko

Am Mittwoch, 30. Juni 2010 23:20 schrieb Ilo Lorusso:
> Hi everyone.. I have a server with the following resources, which start up and are running fine:
> ClusterIP (ocf::heartbeat:IPaddr2): Started saamailin0p01.ipnetwork.co.za
> postfix (ocf::heartbeat:postfix): Started saamailin0p01.ipnetwork.co.za
> For the ClusterIP I have assigned the IP address 57.24.98.55, which as I said is working fine. Now what I want to add to this mix is IPsrcaddr, so any traffic that originates from the server will leave with the IP address 57.24.98.55. I can't seem to get it working; I get a whole bunch of errors in the ha-log. Below is a snippet, with as much information as I could provide. Could someone try to shed some light on why the IPsrcaddr resource won't start up? Thanks...
> Regards
> Ilo
>
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="57.24.98.55" cidr_netmask="27" \
>     op monitor interval="7s"
> primitive IPsrcaddr ocf:heartbeat:IPsrcaddr \
>     params ipaddress="57.24.98.55"
> primitive postfix ocf:heartbeat:postfix \
>     op monitor interval="60s"
> location cli-prefer-ClusterIP ClusterIP \
>     rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq saamailin0p01.ipnetwork.co.za
> colocation postfix-with-ClusterIP inf: postfix ClusterIP
> order start-IPsrcaddr-after-postfix inf: postfix IPsrcaddr
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
> I get these errors in crm status:
> Failed actions:
>     IPsrcaddr_start_0 (node=saamailin0s01.ipnetwork.co.za, call=11, rc=1, status=complete): unknown error
>     IPsrcaddr_start_0 (node=saamailin0p01.ipnetwork.co.za, call=8, rc=1, status=complete): unknown error
>
> Below are the errors I get in my ha-log:
> Jun 30 23:08:34 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:probe:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
> Jun 30 23:08:38 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:start:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
> IPsrcaddr[2516]: 2010/06/30_23:08:38 ERROR: command 'ip route replace dev src 57.24.98.55' failed
> Jun 30 23:08:39 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:stop:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
>
> Below is a snippet of: ip route show
> 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast qlen 1000
>     link/ether 00:0c:29:78:77:1e brd ff:ff:ff:ff:ff:ff
>     inet 57.24.98.50/27 brd 57.24.98.63 scope global eth0
>     inet 57.24.98.55/27 brd 57.24.98.63 scope global secondary eth0
>     inet6 fe80::20c:29ff:fe78:771e/64 scope link
>        valid_lft forever preferred_lft forever
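[Editor's note: Nikita's hint maps directly onto the RA error above ("Cannot use default route w/o netmask"): the IPsrcaddr agent needs a netmask to identify the route whose source address it rewrites. A hedged sketch follows - the cidr_netmask parameter name and /27 value mirror the ClusterIP definition; verify against your resource-agents version.]

```
# Hedged sketch: redefine IPsrcaddr with the netmask it needs
# (same /27 as ClusterIP), so 'ip route replace ... src 57.24.98.55'
# can find the matching route.
crm configure primitive IPsrcaddr ocf:heartbeat:IPsrcaddr \
    params ipaddress="57.24.98.55" cidr_netmask="27"
```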
Re: [Linux-HA] explain the difference between servers?
Hi mike, it seems to be no HA-problem anymore though, but:

Am Montag, 31. Mai 2010 01:29 schrieb mike:
> So I've got ldirectord up and running just fine, providing ldap high availability to 2 backend real servers on port 389. Here is the output of netstat on both real servers:
> tcp   0   0 0.0.0.0:389   0.0.0.0:*   LISTEN
> tcp   0   0 :::389        :::*        LISTEN
> So I used the same director server to create another highly available application, jboss, running on port 8080. If I look at the director server I see that the output of ipvsadm shows both real servers alive and well:
> [r...@lvsuat1a ha.d]# ipvsadm
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP  esbuat1.vip.intranet.mydom lc
>   -> gasayul9300602.intranet.mydom Tunnel 1 0 0
>   -> gasayul9300601.intranet.mydom Tunnel 1 1 0
> Looks good so far. Now the problem is that I cannot telnet to the VIP on port 8080; I get connection refused. If I change the ldirectord.cf to listen on port 22, it works perfectly, so this would seem to indicate that I have things set up appropriately on the director server. So I started poking around on the backend real servers and netstat looks like this:
> [supp...@esbuat1b ~]$ netstat -an | grep 8080
> tcp   0   0 172.28.185.13:8080   0.0.0.0:*   LISTEN

- which process is running on this port - i.e. lsof -i :8080 ?

> So comparing this to the netstat above that listens on port 389, I see that perhaps there is an entry missing, perhaps this:
> tcp   0   0 :::8080   :::*   LISTEN
> So I don't claim to be a networking expert, and maybe I've missed something in my setup and this is why port 8080 is having issues. Can anyone provide me with any pointers or where to go next? After getting the ldap servers working, I figured this would be easy, but I'm struggling with this one.
HTH
Nikita Michalko
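[Editor's note: the netstat output above already contains the likely answer - jboss is bound only to 172.28.185.13, so packets tunneled to the real server for the VIP are refused, whereas the ldap daemon listens on all addresses (0.0.0.0 / :::). A hedged sketch; the `run.sh -b` flag assumes a JBoss 4/5-style start-up script, so verify against your version.]

```
# Hedged sketch: bind JBoss to all addresses so connections arriving
# for the LVS VIP are accepted, not only those for 172.28.185.13.
./run.sh -b 0.0.0.0
# Then on the real server:
netstat -an | grep 8080    # should now show 0.0.0.0:8080 ... LISTEN
# Note: with ipvsadm in Tunnel mode the real server additionally needs
# the VIP configured on a local tunnel interface with ARP suppressed.
```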
Re: [Linux-HA] HB Troubles
Any chance to update at least to V. 2.1.4? 2.1.3 is very old and buggy!
Nikita Michalko

Am Mittwoch, 5. Mai 2010 01:05 schrieb Baird, Josh:
> Hi, I have a 2 node HB 2.1.3 cluster running on CentOS 5. I just upgraded the passive node to CentOS 5.4, but the heartbeat packages did not change:
> heartbeat-stonith-2.1.3-3.el5.centos
> heartbeat-2.1.3-3.el5.centos
> heartbeat-pils-2.1.3-3.el5.centos
> Now, when I try to start HB on the node, it reports that it is starting, but the daemons never actually start:
> r...@fc-fmcln02:~$ service heartbeat start
> logd is already running
> Starting High-Availability services:
> 2010/05/04_18:02:53 INFO: Resource is stopped
> 2010/05/04_18:02:53 INFO: Resource is stopped
> [ OK ]
> r...@fc-fmcln02:~$ ps aux | grep heartbeat
> root 6117 0.0 0.0 3916 696 pts/0 S+ 18:02 0:00 grep heartbeat
> Logs say:
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: Version 2 support: false
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: WARN: logd is enabled but logfile/debugfile is still configured in ha.cf
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: **
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: Configuration validated. Starting heartbeat 2.1.3
> May 4 18:02:53 fc-fmcln02 heartbeat: [6113]: info: heartbeat: version 2.1.3
> May 4 18:02:53 fc-fmcln02 heartbeat: [6113]: info: Heartbeat generation: 1208455483
> Running /usr/lib/heartbeat/heartbeat -d 1000 shows:
> heartbeat[6122]: 2010/05/04_18:04:00 ERROR: Cannot shmget for process status: Invalid argument
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(keepalive,1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(deadtime,10)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(warntime,5)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(initdead,120)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(udpport,694)
> heartbeat: udpport setting must precede media statements
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(bcast,eth1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(auto_failback,off)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(node,fc-fmcln01.corp.follett.com)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(node,fc-fmcln02.corp.follett.com)
> heartbeat[6122]: 2010/05/04_18:04:00 info: respawn directive: hacluster /usr/lib/heartbeat/ipfail
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(use_logd,yes)
> heartbeat[6122]: 2010/05/04_18:04:00 info: Enabling logging daemon
> heartbeat[6122]: 2010/05/04_18:04:00 info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(logfile,/var/log/hb.log)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(debugfile,/var/log/heartbeat-debug.log)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=hacluster, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=hacluster, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=null, gid=haclient
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=root, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=null, gid=haclient
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Beginning authentication parsing
> heartbeat[6122]: 2010/05/04_18:04:00 debug: 16 max authentication methods
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Keyfile opened
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Keyfile perms OK
> heartbeat[6122]: 2010/05/04_18:04:00 debug: 16 max authentication methods
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Found authentication method [sha1]
> heartbeat[6122]: 2010/05/04_18:04:00 info: AUTH: i=1: key = 0x8c52d78, auth=0x5c6228, authname=sha1
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Outbound signing method is 1
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Authentication parsing complete [1]
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(cluster,linux-ha)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(hopfudge,1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(baud,19200)
> heartbeat: baudrate setting must precede media statements
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(hbgenmethod,file)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(realtime,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(msgfmt,classic)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(conn_logd_time,60)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(log_badpack,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(syslogmsgfmt,false)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(coredumps,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(crm,false)
> heartbeat[6122]: 2010/05/04_18:04:00 info: Version 2 support: false
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(autojoin,none
Re: [Linux-HA] HA Stats
Hi, Hari, did you already search the General Linux-HA mailing list? You'll find sth. about HA at:
http://www.clusterlabs.org/wiki/Documentation
and at:
http://www.clusterlabs.org/doc
HTH
Nikita Michalko

Am Montag, 19. April 2010 09:11 schrieb Hari:
> Hi, I recently joined this group. I am working on a HA project for a wireless controller. We are looking at different options. I just wanted to know how linux HA is implemented, and some statistical information ...
> 1) How is the heartbeat implemented?
> 2) What is the retry mechanism used?
> 3) What are the different timer values used in this solution?
> 4) What are the performance stats (like switchover time in different scenarios)?
> Please guide me on this. Let me know if there are any documents/links related to this.
> Thanks Regards,
> Srihari.
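[Editor's note: question 3 above (timer values) maps directly onto a handful of ha.cf directives; the example values below are a hedged illustration in line with the configs quoted elsewhere in this archive, and need tuning per site.]

```
# Hedged example ha.cf timing section (Heartbeat 2.x/3.x directives):
keepalive 2      # interval between heartbeat packets, in seconds
warntime 10      # log a "late heartbeat" warning after this long
deadtime 15      # declare the peer dead (trigger failover) after this
initdead 60      # grace period at start-up; usually >= 2 * deadtime
```

A missed heartbeat is simply retried at the next keepalive interval; failover fires only once deadtime elapses with no packet at all.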
Re: [Linux-HA] heartbeat unplug ethernet cable
Hi, as already mentioned x-times, we don't have a crystal ball ;-) - version, configuration, logs ...???
Nikita Michalko

Am Montag, 15. März 2010 10:01 schrieb Liang Xiao Zhu:
> Hi all, I have done everything, but heartbeat works only when I use "service heartbeat stop"; when I unplug the ethernet cable from node 1 it doesn't work. What's wrong? Do I have to add something in ha.cf?
> Thanks in advance
Re: [Linux-HA] drbd with Linux-HA
Hi Muhammad, I know it will not help you to solve the problem, but anyway: where did you install SLES 10 SP 3 from? I didn't find it ...
TIA
Nikita Michalko

Am Mittwoch, 3. Februar 2010 07:26 schrieb Muhammad Sharfuddin:
> OS: SLES 10 SP 3
> I am running a two node (node1, node2) active/passive (standby) Oracle cluster via Linux-HA. Oracle is installed on /oracle, and /oracle is an 'ext3' filesystem on a SAN/LUN. At any given time, all of the resources (IP, Filesystem, and Oracle) are on either node1 or node2.
> Now, to make a DR, I want to implement 'drbd', but in a way that the two cluster nodes (node1 and node2) keep mounting the same disk/device (SAN disk), while another machine (node3), which should not be part of the Linux-HA cluster, will be the standby oracle machine with its own separate disk. So my drbd configuration should be: /oracle on SAN (/dev/sdb1, mounted by either node1 or node2) and /oracle (/dev/sdc1) on node3 as the drbd devices.
> Is that possible? Any help/document/url will be highly appreciated.
> Regards,
> --ms
Re: [Linux-HA] Failover during server freeze
Hi Lars, we have the same HA-version, without mysql though. Could you send me pls the forkbomb, to test it on our cluster? TIA!
Nikita Michalko

Am Donnerstag, 26. November 2009 14:37 schrieb Lars Johansen:
> Hi, I have a 2 node cluster setup running heartbeat 2.1.3. I'm running an active/passive setup, where I have mysql in master-master replication, stored on each node's filesystem, and I have a DRBD drive with NFS. Pulling cables, shutting down a server etc. works very well - the surviving server takes over. However, I've tried to simulate a server freeze by creating a forkbomb on the active server. After doing that, I cannot log in with ssh, or access nfs or mysql; ping works, and I guess since the kernel responds, heartbeat won't switch over. How can I set up heartbeat so it detects if a server is frozen?
> Greetings,
> Lars
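[Editor's note: the usual answer to "frozen but still answers ping" is a kernel watchdog: Heartbeat's ha.cf supports a `watchdog` directive, so a node that can no longer schedule heartbeat's processes reboots itself. A hedged sketch; the boot-time module line is Debian-style and should be adjusted per distro.]

```
# Hedged sketch: let a frozen node reboot itself via a kernel watchdog.
# softdog provides /dev/watchdog where no hardware watchdog exists.
modprobe softdog
echo softdog >> /etc/modules                      # load at boot (adjust per distro)
echo "watchdog /dev/watchdog" >> /etc/ha.d/ha.cf  # heartbeat pets the watchdog
# For failures the watchdog cannot catch, STONITH on the peer remains
# the reliable way to recover resources from a frozen server.
```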
Re: [Linux-HA] Failover during server freeze
Hi Johan, thanks for that - so easily can one disable the server ;-)
Nikita Michalko

Am Montag, 30. November 2009 14:11 schrieb Johan Verrept:
> On Mon, 2009-11-30 at 13:11 +0100, Nikita Michalko wrote:
>> Hi Lars, we have the same HA-version, without mysql though. Could you send me pls the forkbomb to test it on our cluster?
>
> #!/bin/bash
> while (true); do bash $0; done;
Re: [Linux-HA] how to Install Oralce10gR2 in Linux HA Environment
Hi Muhammad, your problem is a little bit off-topic here, but anyway: you must change your listener address in /etc/hosts and in listener.ora for use with HA, so that it also listens on the common HA-IP-address, e.g.:
( address = ( protocol = tcp ) ( host = HA-VIP ) ( port = 1526 ) )
HTH
Nikita Michalko

Am Samstag, 21. November 2009 13:25 schrieb Muhammad Sharfuddin:
> Hi Guys, the heartbeat package also provides an OCF resource agent for Oracle (/usr/lib/ocf/resource.d/heartbeat/oracle). I want to create a cluster for oracle.
> # cat /etc/hosts
> dbnode1 192.168.0.236 ## hostname and physical IP of server1
> dbnode2 192.168.0.238 ## hostname and physical IP of server2
> dbserver 192.168.0.245 ## virtual hostname and IP for the cluster
> I ran the Oracle (oracle10gR2) installer on 'dbnode1' and installed/placed everything (db, oracle binaries) on the filesystem /oracle, which is on the SAN. After the installation completed, I unmounted /oracle (SAN) and then mounted /oracle on 'dbnode2', but oracle gives an error and does not start. To start Oracle on 'dbnode2', I have to change the hostname from dbnode2 to dbnode1.
> Is there any specific method to install oracle (any special option I have to provide to 'runInstaller') for Linux HA?
> Regards
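[Editor's note: Nikita's advice, written out as a listener.ora fragment. The hostnames come from the /etc/hosts in the post; the listener name and port 1526 (from Nikita's snippet) are assumptions to check against your installation.]

```
# Hedged listener.ora sketch: bind the listener to the cluster's
# virtual hostname (dbserver / 192.168.0.245) rather than the node's
# own name, so the same file works on whichever node holds the VIP.
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver)(PORT = 1526))
  )
```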
Re: [Linux-HA] CIB not supported?
Ok, thank you both for the info! Greetings Nikita Michalko On Monday, 29 June 2009 14:42, Dejan Muhamedagic wrote: Hi, On Fri, Jun 26, 2009 at 01:56:29PM +0200, Nikita Michalko wrote: Hi Michael, On Thursday, 25 June 2009 18:40, Michael Hutchins wrote: Ok, I got this one fixed, too. :) - I wonder how? Can you please be more specific? See below. Thanks! Nikita Michalko -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Michael Hutchins Sent: Thursday, June 25, 2009 8:49 AM To: General Linux-HA mailing list Subject: [Linux-HA] CIB not supported? So I am still mucking along trying to figure out this stuff. And as I am following along the how-to I previously linked to, I am at the crm part. I have crm up and running (thanks for the help, all) and now I get this as soon as I enter the crm shell: crm(live)# configure ERROR: CIB not supported: validator 'transitional-0.6', release '3.0.1' ERROR: You may try the upgrade command crm configure upgrade (which is cibadmin --upgrade) should do the trick. The crm shell doesn't support the old XML. Pacemaker 1.0 can still work with the old XML in a compatibility mode, but it's strongly recommended to upgrade as soon as you can. Thanks, Dejan I didn't get that when I didn't have crm on in my /etc/ha.d/ha.cf config file. If I google it, it comes back with nothing meaningful.
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] CIB not supported?
Hi Michael, On Thursday, 25 June 2009 18:40, Michael Hutchins wrote: Ok, I got this one fixed, too. :) - I wonder how? Can you please be more specific? Thanks! Nikita Michalko -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Michael Hutchins Sent: Thursday, June 25, 2009 8:49 AM To: General Linux-HA mailing list Subject: [Linux-HA] CIB not supported? So I am still mucking along trying to figure out this stuff. And as I am following along the how-to I previously linked to, I am at the crm part. I have crm up and running (thanks for the help, all) and now I get this as soon as I enter the crm shell: crm(live)# configure ERROR: CIB not supported: validator 'transitional-0.6', release '3.0.1' ERROR: You may try the upgrade command I didn't get that when I didn't have crm on in my /etc/ha.d/ha.cf config file. If I google it, it comes back with nothing meaningful. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Please Help me
class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes /instance_attributes meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id=prefered_location_ score=0 expression attribute=#uname id=b958f92c-839a-4666-8698-e0c96d04719b operation=eq value=server148/ /rule /rsc_location rsc_location id=location_2 rsc=IP_5 rule id=prefered_location_2 score=0 expression attribute=#uname id=540476fc-3a35-4dd8-87dc-3be82adb6592 operation=eq value=server140/ /rule /rsc_location rsc_colocation id=colocation_ from=IP_5 to=IP_5 score=INFINITY/ rsc_order id=order_ from=IP_5 type=after to=IP_5/ /constraints /configuration /cib [r...@server140 ~]# vi /var/lib/heartbeat/crm/cib.xml [r...@server140 ~]# cat /etc/ha.d/ha.cf #use_logd on #crm yes ## Allow to dynamically add a new node to the cluster ##autojoin any udpport 694 bcast eth0 auto_failback on mcast eth0 225.0.0.1 694 1 0 node server140 node server148 crm on [r...@server140 ~]# [r...@server140 ~]# cat /etc/ha.d/ha.cf auto_failback on mcast eth0 225.0.0.1 694 1 0 node server140 node server148 crm on Please help me. 2009/6/17 Nikita Michalko michalko.sys...@a-i-p.com Hi Bui Manh, On Wednesday, 17 June 2009 12:40, Bui Manh Nam wrote: thank you very much, I don't understand the following cib.xml file - I don't understand what you didn't understand exactly: the whole cib.xml? Please be more exact/informative ... Which OS do you use? Any logs? Which version of HA? If this is 2.1, then I strongly advise upgrading at least to V.2.1.4! Nikita Michalko nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes /instance_attributes /node node uname=server140 type=normal id=c7f5251a-3bab-489f-a18f-c1a04ffa1591 instance_attributes id=nodes-c7f5251a-3bab-489f-a18f-c1a04ffa1591 attributes nvpair name=standby id=standby-c7f5251a-3bab-489f-a18f-c1a04ffa1591 value=off/ /attributes /instance_attributes /node /nodes resources clone id=IP_5 meta_attributes id=IP_5_meta_attrs attributes nvpair id=IP_5_metaattr_target_role name=target_role value=stopped/ nvpair id=IP_5_metaattr_clone_max name=clone_max value=2/ nvpair id=IP_5_metaattr_clone_node_max name=clone_node_max value=2/ nvpair id=IP_5_metaattr_resource_stickiness name=resource_stickiness value=0/ /attributes /meta_attributes primitive id=resource_IP_5 class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes /instance_attributes
meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id
Re: [Linux-HA] 2.0.4 / question about Clone stonith Resource
Hi Alain, I use a customized version of external/ipmi with an extra configuration file (ipmi.cfg) containing the appropriate parameters for ipmitool - see attached files ;-) HTH Nikita Michalko On Tuesday, 16 June 2009 13:00, Alain Moulle wrote: Hi, I read somewhere that it is possible to have different parameters depending on the node where the clone is started, but with regard to the GUI, I can't find how to do so. My goal: I would like to declare a stonith resource of type external/ipmi so that one instance is started on each node of the HA cluster and will not be migrated in case of failover. But there are 5 parameters with the external/ipmi resource, notably the IPMI address of the adjacent node we would like to be fenced ... so this address is different for each node, and moreover, what about a cluster with, let's say, 4 nodes? Do we have to declare 3 different stonith resources so that any node could fence any other one in the HA cluster? Thanks Regards Alain ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ipmi.cfg Description: application/shellscript ipmi Description: application/shellscript
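One common answer to Alain's question is not a clone at all, but one external/ipmi primitive per node to be fenced, each pinned away from the node it fences. A hedged cib.xml-style sketch in the heartbeat-2.x dialect; the node name, IPMI address, credentials, and all ids below are invented for illustration, not taken from the thread:

```xml
<!-- Sketch only: one stonith resource per fenced node. Repeat this pair
     of elements once per node, adjusting hostname/ipaddr. -->
<primitive id="stonith-node1" class="stonith" type="external/ipmi">
  <instance_attributes id="stonith-node1-ia">
    <attributes>
      <nvpair id="stonith-node1-hostname" name="hostname" value="node1"/>
      <nvpair id="stonith-node1-ipaddr" name="ipaddr" value="10.0.0.11"/>
      <nvpair id="stonith-node1-userid" name="userid" value="admin"/>
      <nvpair id="stonith-node1-passwd" name="passwd" value="secret"/>
    </attributes>
  </instance_attributes>
</primitive>
<!-- Keep the resource off the node it is meant to fence. -->
<rsc_location id="stonith-node1-not-on-node1" rsc="stonith-node1">
  <rule id="stonith-node1-rule" score="-INFINITY">
    <expression attribute="#uname" id="stonith-node1-expr"
                operation="eq" value="node1"/>
  </rule>
</rsc_location>
```

For a 4-node cluster this pattern does mean several stonith resources (one per fenced node), which matches the concern in the question; Nikita's ipmi.cfg wrapper is a way to keep the per-node parameters out of the CIB itself.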
Re: [Linux-HA] cannot start heartbeat resource stopped
Hi Miguel, as already mentioned many times here on the list: PLEASE upgrade ASAP to at least V.2.1.4 or V2.99! V2.0.8 is buggy! HTH Nikita Michalko On Thursday, 18 June 2009 16:03, Miguel Olivares wrote: Hi everybody, I tried to configure heartbeat but without success, because when I try to start heartbeat I cannot get the virtual IP, and I got the message Resource is stopped. I followed different procedures in order to find a mistake or something, and I looked on the internet. But I don't know why I get this message, because in the log files everything seems ok; even when I try to stop it, it doesn't work. [r...@sun2 ~]# rpm -qa |grep heartbeat heartbeat-2.0.8-2.el4.centos heartbeat-gui-2.0.8-2.el4.centos heartbeat-pils-2.0.8-2.el4.centos heartbeat-stonith-2.0.8-2.el4.centos [r...@sun2 ~]# uname -a Linux sun2 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 athlon i386 GNU/Linux Can anybody help me? Thanks [r...@sun2 ~]# /etc/init.d/heartbeat start logd is already running Starting High-Availability services: 2009/06/18_15:42:35 INFO: Resource is stopped [ OK ] [r...@sun2 ~]# /etc/init.d/heartbeat stop Stopping High-Availability services: [ha-log] heartbeat[6142]: 2009/06/18_15:42:35 info: Configuration validated. Starting heartbeat 2.0.8 heartbeat[6143]: 2009/06/18_15:42:35 info: heartbeat: version 2.0.8 heartbeat[6143]: 2009/06/18_15:42:35 info: Heartbeat generation: 19 heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[6143]: 2009/06/18_15:42:35 info: Removing /var/run/heartbeat/rsctmp failed, recreating. heartbeat[6143]: 2009/06/18_15:42:35 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0 heartbeat[6143]: 2009/06/18_15:42:35 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1 heartbeat[6143]: 2009/06/18_15:42:35 info: glib: ping heartbeat started.
heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[6143]: 2009/06/18_15:42:35 info: Local status now set to: 'up' heartbeat[6143]: 2009/06/18_15:42:37 info: Link 192.168.1.98:192.168.1.98 up. heartbeat[6143]: 2009/06/18_15:42:37 info: Status update for node 192.168.1.98: status ping heartbeat[6143]: 2009/06/18_15:42:37 info: Comm_now_up(): updating status to active heartbeat[6143]: 2009/06/18_15:42:37 info: Local status now set to: 'active' heartbeat[6143]: 2009/06/18_15:52:50 WARN: Shutdown delayed until current resource activity finishes. My configuration files: [ha.cf] debugfile /var/log/ha-debug logfile /var/log/ha-log keepalive 2 deadtime 30 warntime 10 initdead 120 udpport 694 bcast eth0 auto_failback off node sun2 ping 192.168.1.98 [haresources] sun2 192.168.1.249 [authkeys] auth 1 1 crc ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Please Help me
Hi Bui Manh, On Wednesday, 17 June 2009 12:40, Bui Manh Nam wrote: thank you very much, I don't understand the following cib.xml file - I don't understand what you didn't understand exactly: the whole cib.xml? Please be more exact/informative ... Which OS do you use? Any logs? Which version of HA? If this is 2.1, then I strongly advise upgrading at least to V.2.1.4! Nikita Michalko nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes /instance_attributes /node node uname=server140 type=normal id=c7f5251a-3bab-489f-a18f-c1a04ffa1591 instance_attributes id=nodes-c7f5251a-3bab-489f-a18f-c1a04ffa1591 attributes nvpair name=standby id=standby-c7f5251a-3bab-489f-a18f-c1a04ffa1591 value=off/ /attributes /instance_attributes /node /nodes resources clone id=IP_5 meta_attributes id=IP_5_meta_attrs attributes nvpair id=IP_5_metaattr_target_role name=target_role value=stopped/ nvpair id=IP_5_metaattr_clone_max name=clone_max value=2/ nvpair id=IP_5_metaattr_clone_node_max name=clone_node_max value=2/ nvpair id=IP_5_metaattr_resource_stickiness name=resource_stickiness value=0/ /attributes /meta_attributes primitive id=resource_IP_5 class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes
/instance_attributes meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id=prefered_location_ score=0 expression attribute=#uname id=b958f92c-839a-4666-8698-e0c96d04719b operation=eq value=server148/ /rule /rsc_location rsc_location id=location_2 rsc=IP_5 rule id=prefered_location_2 score=0 expression attribute=#uname id=540476fc-3a35-4dd8-87dc-3be82adb6592 operation=eq value=server140/ /rule /rsc_location rsc_colocation id=colocation_ from=IP_5 to=IP_5 score=INFINITY/ rsc_order id=order_ from=IP_5 type=after to=IP_5/ /constraints /configuration /cib [r...@server140 ~]# cat /var/lib/heartbeat/crm/cib.xml cib admin_epoch=0 have_quorum=true ignore_dtd=false num_peers=2 cib_feature_revision=2.0 generated=true num_updates=1 ccm_transition=2 dc_uuid=a07fb162-9071-474c-9a9a-ea4b4ef526e7 epoch=1306 cib-last-written=Mon Jun 15 12:56:56 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=true/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=stop/ nvpair name=last-lrm-refresh id=cib-bootstrap-options-last-lrm-refresh value=1245040474/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=INFINYTY/ nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes
/instance_attributes /node node uname=server140 type
Re: [Linux-HA] Please Help me
Hi bui manh, On Wednesday, 17 June 2009 05:31, nambuim...@vccorp.vn wrote: Hi all, my name is Bui Manh Nam; I have a problem as follows: I have already installed heartbeat* on server140, whose IP address is 192.168.29.140, and on the other server, server148, with the address 192.168.29.148. VIP: 192.168.29.156 on server140: 192.168.29.140 create file eth0:0 DEVICE=eth0:0 BOOTPROTO=static HWADDR=01:02:03:04:05:06 IPADDR=192.168.29.156 NETMASK=255.255.255.0 ONBOOT=yes - you don't need to configure the file eth0:0; heartbeat does that for you automatically, if correctly configured ;-) #wget -O /etc/yum.repos.d/CentOS-Testing.repo http://dev.centos.org/centos/5/CentOS-Testing.repo #yum --enablerepo c5-testing install iptables-mod-CLUSTERIP #modprobe ipt_conntrack #modprobe ipt_CLUSTERIP #iptables -A INPUT -p tcp -d 192.168.29.156 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 2 --hashmode sourceip #service iptables save #yum install lighttpd -y '== I have tested it and it runs ok on both servers of the subnet 192.168.29.0/24, but when I test from another subnet, it does not work. I use another server with the IP address 192.168.30.107; I cannot ping 192.168.29.156 and access to http://192.168.29.156/ is impossible. I can ping 192.168.29.140 and 192.168.29.148 ok, and http://192.168.29.140/ and http://192.168.29.148/ serve the web ok. Please help me! I hope to hear from you soon! Many thanks! - it would help: HA version, configuration (at least cib.xml and ha.cf) ... Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
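For reference, the CLUSTERIP rule posted above is the one for node 2 of 2. In a two-node CLUSTERIP setup, the rule on the peer would normally be identical except for --local-node; this sketch is inferred from the posted rule, not quoted from the thread:

```
# Sketch: CLUSTERIP firewall rule for the peer node (server140 here);
# only --local-node differs from the rule posted for server148.
# Requires root and the ipt_CLUSTERIP kernel module.
iptables -A INPUT -p tcp -d 192.168.29.156 -j CLUSTERIP --new \
  --hashmode sourceip --clustermac 01:02:03:04:05:06 \
  --total-nodes 2 --local-node 1
```

Note that CLUSTERIP answers ARP with the shared multicast-style MAC (01:02:03:04:05:06 above), which matters for reachability from other subnets via a router.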
Re: [Linux-HA] getting started with ha2: 404
You are right, Malte - there are still old clusters running in production ... ;-) Nikita Michalko On Friday, 5 June 2009 09:26, Malte Geierhos wrote: hi No, we would recommend you to advance to version 3, or pacemaker, as the central component is called now. See: www.clusterlabs.org and the docs there. Besides your recommendation, I personally think that it's a bad idea to remove all the documentation about previous versions! Please undo that. What about a small box on top of every page's content telling you that documentation for newer versions can now be found at clusterlabs.org ... Especially because you tell everyone on clusterlabs.org: Welcome to the home of Pacemaker - The scalable High-Availability cluster resource manager formerly part of Heartbeat. Pacemaker makes use of your cluster infrastructure (either OpenAIS or Heartbeat) to stop, start and monitor the health of the services (aka. resources) you want the cluster to provide. A PART of Heartbeat - so why remove the COMPLETE V2 DOCUMENTATION? As you wrote, Pacemaker makes use of a cluster infrastructure, whether it is OpenAIS or Heartbeat. So there are folks who need to have access to examples and documentation. kind regards, Malte Geierhos ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?
Hi Andrew, what is the difference if I go for a crm cluster with 0.6? TIA Nikita Michalko On Monday, 4 May 2009 08:58, Andrew Beekhof wrote: haresources clusters should be fine. For crm clusters it depends on whether you go for 1.0 or 0.6. On Fri, May 1, 2009 at 10:32 PM, Mike Sweetser - Adhost mik...@adhost.com wrote: Hello: I'm looking to migrate an existing Heartbeat 2.1.4 installation to 2.9.9. Would it be possible to upgrade the servers one at a time, which would require running one server with 2.1.4 and one server with 2.9.9 for a short period? Would there be any incompatibility issues in doing so? Thank You, Mike Sweetser ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] which action first
Hi, On Thursday, 5 February 2009 05:28, lakshmipadmaja maddali wrote: Hi All, I want to know: when heartbeat is started, which action does it call first? Is it monitor or start? - it is monitor - look in the ha-log ... Waiting for a reply, Regards, Padmaja Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Resource restarting constantly while setting up heartbeat2
Hi akshat, on this list it is quite useful to send logs, configuration files etc. - hb_report! Regards Nikita Michalko On Thursday, 29 January 2009 18:28, akshat kansal wrote: Hi all, I am facing an issue while setting up heartbeat version 2.0 using cib.xml. Issue: The heartbeat resources are starting and stopping constantly, becoming stable only after a certain amount of time. Please help me out with this issue. Regards Akshat ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Resource restart after switching to stand-by mode
Hi Jakub, I don't know too much about stonith/IPMI, but I noticed the following in your cib.xml: ... nvpair id=ipmistonith0_userid name=userid value=stoop/ ... nvpair id=ipmistonith1_userid name=userid value=stoop/ - look at this: ^^^ HTH Nikita Michalko On Tuesday, 16 December 2008 13:58, Jakub Kuźniar wrote: Thank you very much for the help. So the problem is not that they migrated, but that this caused a restart of the resources already on the second node? Yes, that's right. There is an unnecessary restart of resources on the second node. If so, add the interleave meta attribute to the clones and set it to true. I have added this meta attribute for the clone resources Xenconfig_cloneset and Xendata_cloneset. I also attached a new version of the CIB with this attribute added. But there was no change. Still, resources on the second node are restarted whenever the first node is switched into stand-by mode. I have erased the CIB content and added the resource configuration again. Still the same result. Jakub On Tuesday, 16 December 2008 11:48:07, Andrew Beekhof wrote: On Tue, Dec 16, 2008 at 00:33, Jakub Kuźniar jakub.kuzn...@s4.pl wrote: Hi everybody, I have recently updated heartbeat 2.0.8 to heartbeat 2.1.4. I am running a two-node Xen cluster using OCFS2. With heartbeat 2.0.8 my configuration worked fine, but after the upgrade strange things started to happen. When one of the nodes was switched to stand-by mode, the virtual machines were migrated (live) to the second node, forcing the restart of the virtual machines running on the second node. So the problem is not that they migrated, but that this caused a restart of the resources already on the second node? If so, add the interleave meta attribute to the clones and set it to true. When the first node was then switched back online, the failed-over VMs were migrated back to the first node, once again causing the restart of VMs running on the second node. This behaviour seems strange to me. I am probably making some mistake in the configuration.
I would be very grateful for any help. I also attach the configuration part of the CIB and the ha.cf file. Thank you for the response. Jakub ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
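The interleave setting Andrew recommends is a clone meta attribute. In the heartbeat-2.1.x XML dialect used elsewhere in this digest it would look roughly like the sketch below; the clone id matches Jakub's Xenconfig_cloneset, while the other ids are invented for illustration:

```xml
<!-- Sketch: interleave=true lets each clone instance depend only on the
     peer instance running on its own node, so putting one node in
     standby need not restart instances on the surviving node. -->
<clone id="Xenconfig_cloneset">
  <meta_attributes id="Xenconfig_cloneset_meta_attrs">
    <attributes>
      <nvpair id="Xenconfig_interleave" name="interleave" value="true"/>
    </attributes>
  </meta_attributes>
  <!-- existing primitive definition stays unchanged here -->
</clone>
```

If the attribute is set but restarts persist, as Jakub reports, it is worth verifying with a CIB query that the value actually landed in the live CIB rather than only in the file that was edited.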
Re: [Linux-HA] How to configure network timeout
Hi Christian, have a look at operations in cib.xml - you can set there, for example: ... op id=IPaddrX_mon interval=600s name=monitor timeout=600s / HTH regards Nikita Michalko On Monday, 15 December 2008 13:57, Christian Ratzlaff wrote: Hi everyone, we had a network problem last week. There were small network outages and both HA clusters went crazy... This was not necessary, as I see it. There must be a way to set the timeout to about 5 minutes globally. I just found the timeout for starting resources, but this is not the same, right? I want to make it possible to have a timeout within heartbeat itself. Where can I configure it? kind Christian ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
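The op timeout above covers resource monitoring; the tolerance of the heartbeat layer itself to lost packets lives in ha.cf. A sketch with illustrative values (the directives are standard ha.cf ones, but the numbers are examples, not recommendations from the thread):

```
# ha.cf fragment (sketch): how long heartbeat tolerates silence from a
# peer before reacting. All values below are illustrative.
keepalive 2     # seconds between heartbeat packets
warntime 30     # log a warning after 30 seconds of silence
deadtime 60     # declare the peer dead only after 60 seconds
initdead 120    # extra tolerance while nodes are still booting
```

Raising warntime/deadtime lets the cluster ride out short network outages, at the cost of a correspondingly slower reaction to a real node failure.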
Re: [Linux-HA] HA gets stuck at: Stopping high available services...
Hi Mario, which platform/version are you on? Nikita Michalko On Tuesday, 9 December 2008 10:48, Darren Mansell wrote: On Tue, 2008-12-09 at 08:53 +0100, m...@bortal.de wrote: Hello List, currently my 2nd node is down for hardware maintenance. Now I tried to reboot my 1st node, but it gets stuck at: Stopping high available services... (I only waited 5 mins, but that's too long anyway) Here is my config file: logfile /var/log/ha-log debugfile /var/log/ha-debug keepalive 500ms deadtime 5 warntime 3 initdead 20 ucast eth2 10.0.0.1 ucast eth2 10.0.0.2 ucast eth0 10.11.12.1 ucast eth0 10.11.12.2 auto_failback off node node01 node node02 debug 1 Any idea why it's getting stuck? Is it trying to resolve something? Thanks, Mario Mine does it a lot too. I have to kill -9 the heartbeat master control process. Darren ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat 2.1.3 avoid resource failover
Hi Fulvio, I don't see any attachment on your mail ... Forgotten? Nikita Michalko On Monday, 3 November 2008 11:15, fulvio fabiani wrote: Hi all, we have a clustered installation of Heartbeat 2.1.3 that manages an Apache/VIP resource in an Active/Standby configuration. What we observe at machine reboot is the failover of the Apache/VIP resource group to a preferred destination (node_02). How can we avoid this behavior and prevent the resource from failing back automatically? We already configured the auto_failback=off option in ha.cf and tried the default_resource_stickiness param in cib.xml, but the behavior is the same. Attached is the cib.xml used. Thanks a lot, Fulvio Fabiani ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Michele, thank you for your answer, but: I also tried the second form of the mount command - i.e.: mount -n -o remount -t reiserfs /dev/sda1 / without success - I only got the error message: Input/output error. There was a SCSI error in dmesg, and it was not possible to read or write anything on the system then ... Regards Nikita Michalko On Saturday, 1 November 2008 15:39, Michele Mase' wrote: Hoping this is useful ... If the root filesystem is read-only, the option mount -o rw,remount / shouldn't work, because /etc is read-only too and the mtab can't be written. According to http://www.unixguide.net/linux/faq/04.15.shtml (and man mount too): Root File System Is Read-Only: Remount it. If /etc/fstab is correct, you can simply type: mount -n -o remount / If /etc/fstab is wrong, you must give the device name and possibly the type, too: e.g. mount -n -o remount -t ext2 /dev/hda2 / To understand how you got into this state, see (EXT2-fs: warning: mounting unchecked file system.) Have you tried it in this manner? Michele On Fri, Oct 31, 2008 at 11:48 AM, Nikita Michalko [EMAIL PROTECTED] wrote: Hi Jonas, On Thursday, 30 October 2008 11:07, Jonas Andradas wrote: Hello, have you tried remounting the file system in read-write mode? Something like: mount -o rw,remount / Does forcing a node switch also fail, due to the read-only state? - we have tried almost everything - but with no success. It was impossible to run anything (not shutdown, reboot - NOTHING!) - almost always only with the error: Input/output error Regards Nikita Michalko On Mon, Oct 13, 2008 at 03:31, Teh Tze Siong [EMAIL PROTECTED] wrote: I have been playing with HA+DRBD on 2 servers, each with one hard disk installed. The server is hosting the database for other application servers. Recently the application failed and the problem lay in the database server, the primary node.
I've checked the server; the network is still up. I can still read files and access the database, but I cannot update anything, and I am not even allowed to run shutdown -h now or turn off the network interface - the file system has become read-only and heartbeat is still alive. I had no choice but to ask the datacenter guy to unplug the network cables at the primary node, wait until I confirmed the secondary node had taken over, and only then power down the primary node. HA+DRBD works well for a network failure or power failure on the primary node, but when it comes to this half-dead situation, it is driving me crazy. As the server is installed in a remote datacenter, is there a better way for me to counter this half-dead situation? Thanks, Zev Disclaimer --- This e-mail and any files transmitted with it are intended only for the use of the addressee. If it contains confidential information, the addressee is expressly prohibited from distributing the e-mail and legal action may be taken against you. Access by any other persons to this e-mail is unauthorized. If you are not the intended recipient, any disclosure, copying, use, or distribution is prohibited and may be unlawful. This e-mail and any files and/or attachments shall not constitute a binding legal obligation with the company. The company shall also not be responsible for any computer virus transmitted with this e-mail.
--- MCSB Systems (M) Berhad ([EMAIL PROTECTED]) ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
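In a half-dead situation like Zev's, some external monitor has to notice the read-only root on its own, since heartbeat keeps answering. A minimal, hypothetical sketch (not from the thread) that checks the mount options in /proc/mounts on Linux:

```shell
#!/bin/sh
# Sketch: report whether the root filesystem is currently mounted
# read-only, by reading its mount options from /proc/mounts.
opts=$(awk '$2 == "/" { print $4; exit }' /proc/mounts)
case ",$opts," in
  *,ro,*) echo "root is read-only" ;;
  *)      echo "root is read-write" ;;
esac
```

A check like this could feed a watchdog or monitoring job; in the situation described, though, where even shutdown fails with I/O errors, only out-of-band power control (IPMI/STONITH) reliably removes the node.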
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Jonas,

On Thursday, 30 October 2008 at 11:07, Jonas Andradas wrote: Hello, have you tried remounting the file system in read-write mode? Something like: mount -o rw,remount / Does forcing a node switch also fail due to the read-only state?

We have tried almost everything - but with no success. It was impossible to run anything (not shutdown, not reboot - NOTHING!) - almost everything failed with an Input/Output error. Regards Nikita Michalko

On Mon, Oct 13, 2008 at 03:31, Teh Tze Siong [EMAIL PROTECTED] wrote: [original message quoted in full above - snipped]
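As a footnote to the remount suggestion above, a monitoring script could detect the read-only state before heartbeat considers the node healthy. A minimal sketch, not from the thread: the check_ro helper and the sample /proc/mounts lines are assumptions for illustration only.

```shell
#!/bin/sh
# check_ro: given one /proc/mounts-style line, succeed (exit 0) if the
# mount is flagged read-only. Field 4 is the comma-separated option list,
# so we look for "ro" as a whole element, not as a substring.
check_ro() {
  opts=$(printf '%s\n' "$1" | awk '{print $4}')
  case ",$opts," in
    *,ro,*) return 0 ;;
    *)      return 1 ;;
  esac
}

# In practice one would iterate over the live mount table, e.g.:
#   while read line; do check_ro "$line" && echo "read-only: $line"; done < /proc/mounts
# and, if the root fs turns out read-only, attempt:  mount -o rw,remount /
```

Whether the remount succeeds depends on why the kernel demoted the mount; on a real SCSI error (as reported later in this thread) the attempt usually fails with the same I/O errors.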
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Teh, we had a similar error: all file systems on the system HD were set to read-only due to a SCSI error. HA was still alive, but no action was possible - no logs etc. due to the read-only state ... The only possibility was to turn the server off and on. In this situation even Stonith is unusable ;-) Best regards Nikita Michalko

On Monday, 13 October 2008 at 04:31, Teh Tze Siong wrote: [original message quoted in full above - snipped]
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Dejan, it was a long time ago and I can't find the logs and configuration files of my colleague (no longer here) any more, but I think it was something on the motherboard of the server (IPMI?). Sorry - can't say exactly ;-( Regards Nikita Michalko

On Thursday, 30 October 2008 at 13:53, Dejan Muhamedagic wrote: Hi, On Thu, Oct 30, 2008 at 10:58:54AM +0100, Nikita Michalko wrote: [read-only SCSI error report quoted above - snipped] Hmm, what kind of stonith device do you have? Unless it's a stonith device which depends on the operating system and the other node is fine, it should work. Thanks, Dejan
Re: [Linux-HA] Weird issue with Heartbeat
Hi Daren, which version of HA do you use?

On Monday, 13 October 2008 at 12:30, Daren Tay wrote: Hi guys, I have heartbeat running on 2 machines, say db1 (master) and db2. I realize the VIP is activated on both sides at the same time! To recreate the problem, I just have to stop heartbeat on db1, with db2 taking over the VIP. Then when I start db1 again, the VIP appears on db1, but db2 still has the VIP. The logs show nothing weird. Any idea what areas I could troubleshoot? My ha.cf is as follows:

# Time between heartbeats in seconds
keepalive 1
# Node is pronounced dead after 15 seconds
deadtime 15
# Prevents the master node from re-acquiring cluster resources after a failover
auto_failback on
#auto_failback off
# Port for udp (default)
udpport 694
# Use a udp heartbeat over the eth2 interface. Old versions use udp.
bcast eth2
use_logd yes
debugfile /var/log/ha/ha.debug
logfile /var/log/ha/ha.log
# First node of the cluster
node db1.domain.com
# Second node of the cluster
node db2.domain.com

Thanks! Daren

Did uname -n return the same name as configured here on the db1 node? And likewise on db2?
-- Nikita Michalko
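Nikita's uname -n question can be checked mechanically: heartbeat matches the output of uname -n against the node directives in ha.cf. A sketch of such a check; the node_in_hacf helper and the file path are illustrative assumptions, not from the thread.

```shell
#!/bin/sh
# node_in_hacf: succeed if NAME appears as an argument of a "node"
# directive in the given ha.cf-style file.
node_in_hacf() {
  name=$1; conf=$2
  awk -v n="$name" '
    $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END          { exit found ? 0 : 1 }' "$conf"
}

# Typical use on a cluster node:
#   node_in_hacf "$(uname -n)" /etc/ha.d/ha.cf \
#     || echo "warning: uname -n is not listed as a node in ha.cf"
```

A mismatch here is a classic cause of both nodes believing they own the resources, which matches the dual-VIP symptom described above.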
Re: [Linux-HA] How can a group of Resource be Migrated After one resource in the group fail ?
On Friday, 19 September 2008 at 09:14, heartbeat wrote: Hi all, I found that if one of the resources in the group fails, none of the resources of the group can be migrated to the other node. My cib's key configuration is as follows:

..
<configuration>
  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
        <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-INFINITY"/>
      </attributes>
    </cluster_property_set>
  </crm_config>
..
  <group id="group1">
    <primitive id="ap_1_ip_217" class="ocf" type="IPaddr2" provider="heartbeat">
      <instance_attributes id="ap_1_ip_attr_1">
        <attributes>
          <nvpair id="ap_1_ipaddr_1" name="ip" value="10.1.41.217"/>
          <nvpair id="ap_1_nic_1" name="nic" value="eth1:1"/>
          <nvpair id="ap_1_mask_1" name="netmask" value="25"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive id="ap_1_ip_217_agent" class="ocf" type="myagent" provider="heartbeat">
      <operations>
        <op id="1" name="monitor" interval="5s" timeout="4s" on_fail="restart"/>
      </operations>
    </primitive>
  </group>
..

How can a group of resources be migrated after one resource in the group fails? Thanks in advance!

Hi heartbeat! Did you already read the following:
http://clusterlabs.org/mw/Image:Configuration_Explained.pdf
http://www.linux-ha.org/ClusterInformationBase/ResourceGroups
* don't set default-resource-stickiness to INFINITY
* grab this script: http://hg.clusterlabs.org/pacemaker/dev/raw-file/tip/contrib/showscores.sh and use it to determine SCORE
* set default-resource-failure-stickiness = SCORE / 3
Try to set the values for default-resource-failure-stickiness and default-resource-stickiness to something reasonable - not -/+INFINITY and not 0!

Nikita Michalko
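To make that advice concrete, here is a hypothetical pair of bootstrap options with non-extreme values. The numbers 100 and -30 are assumptions standing in for the SCORE/3 rule of thumb; the real values must come from showscores.sh output on your own cluster.

```xml
<cluster_property_set id="cib-bootstrap-options">
  <attributes>
    <!-- moderate preference to stay put: NOT INFINITY and NOT 0 -->
    <nvpair id="cib-bootstrap-options-default-resource-stickiness"
            name="default-resource-stickiness" value="100"/>
    <!-- each failure subtracts 30, i.e. roughly SCORE/3 for an assumed SCORE of 100 -->
    <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness"
            name="default-resource-failure-stickiness" value="-30"/>
  </attributes>
</cluster_property_set>
```

How many failures it takes before the accumulated penalty outweighs the stickiness depends on the actual scores, so re-run showscores.sh after each induced failure rather than relying on the arithmetic alone.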
Re: [Linux-HA] Active node gives up resources when passive fails - not good
On Tuesday, 16 September 2008 at 10:36, Wayne Gemmell wrote: Hi Wayne, just a dumb question: Hi all, I'm running 2.1.3 on Ubuntu hardy. My ha.cf looks like below. This is the first time I'm using unicast connections and so far there is weirdness going on: shutdown by IPMI because of high ambient temp. They've been having aircon problems in our hosting center, and when the ambient temp gets to around 38 deg C my passive server shuts down (no, I don't know how to stop this from happening). - is this maybe configured in the BIOS? The active server then follows suit and gives up all its resources. This happened again at 2am this morning and I really need to get to the bottom of this. I'm trying broadcast comms to see if it makes a difference, but I thought I'd post here for ideas.

udpport 694
ucast eth5 10.40.0.1
ucast eth5 10.40.0.2
debug 1
use_logd on
keepalive 2
warntime 15
deadtime 30
initdead 45
auto_failback off
node nikki
node asia
crm off

HTH Nikita Michalko
Re: AW: AW: [Linux-HA] Heartbeat 2.1.3-23.1 (where is the stable version HA/pacemaker)
Hi Maddin,

On Wednesday, 27 August 2008 at 14:29, Mega Mailingliste wrote: Hi Andreas, hi all, this is exactly what I mean: where are the 2.1.3-23.1 heartbeat/pacemaker packages? - I have the RPMs for SLES10, if you want ... HTH Nikita Michalko

I need these packages urgently for a production cluster and won't try 2.4.1 or 2.99, which is still beta. Thanks a lot Maddin

Original message - From: [EMAIL PROTECTED] [mailto:linux-ha-[EMAIL PROTECTED]] On behalf of Andreas Mock Sent: Wednesday, 27 August 2008 13:31 To: General Linux-HA mailing list Subject: Re: AW: [Linux-HA] Heartbeat 2.1.3-23.1 (where is the stable version HA/pacemaker)

Hi Michael, hi all, but that does not answer the very first question: which version is the current stable version of HA/pacemaker? What I understood so far:
a) 2.1.3 with 0.6.6 is stable but cannot be found packaged in the known location http://download.opensuse.org/repositories/server:/ha-clustering/
b) In that location is the combination of HA 2.99.0/pacemaker 0.6.6, which is declared beta.
c) HA version 2.1.4 is a full HA stack as we were used to before the pacemaker split-off. This was done especially for SLES customers.
Have I understood it right? Where is the stable HA/pacemaker combination? Best regards Andreas Mock

Original message - From: Michael Schwartzkopff [EMAIL PROTECTED] Sent: 27.08.08 11:48:20 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org Subject: Re: AW: [Linux-HA] Heartbeat 2.1.3-23.1

On Wednesday, 27 August 2008 at 10:55, Mega Mailingliste wrote: Is there nobody who knows where I can get this?? Is 2.99 beta or not? Yes, it is declared beta. -- Dr.
Michael Schwartzkopff MultiNET Services GmbH Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany Tel: +49 - 89 - 45 69 11 0 Fax: +49 - 89 - 45 69 11 21 mob: +49 - 174 - 343 28 75 mail: [EMAIL PROTECTED] web: www.multinet.de Sitz der Gesellschaft: 85630 Grasbrunn Registergericht: Amtsgericht München HRB 114375 Geschäftsführer: Günter Jurgeneit, Hubert Martens --- PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B Skype: misch42
Re: [Linux-HA] installing on ubuntu 6.06
Hello Sander, I really don't want to bother you, but it would be great if you could send me the complete documentation of what you've done on Ubuntu Server 8.04 LTS. I'll give it a try ... Thanks! Nikita Michalko

On Tuesday, 12 August 2008 at 14:58, Sander van Vugt wrote: Just to encourage you to do an upgrade: I've recently successfully implemented DRBD+iSCSI+Heartbeat on Ubuntu Server 8.04 LTS. If it helps, I can send you the complete documentation of what I've done. And my two cents about the stability of 8.04: I didn't like the (in)stability of 7.04 Server, but after having done a lot of things with 8.04 Server, I have to admit that I'm starting to like it. :-) Best regards, Sander

On Tue, 2008-08-12 at 11:40 +0100, Matthew Macdonald-Wallace wrote: On Mon, 11 Aug 2008 21:59:33 -0600 (MDT), RYAN vAN GINNEKEN [EMAIL PROTECTED] wrote: sorry to bump an old post, but I still have not solved this problem yet. Seeing as 8.04 is now an LTS version of Ubuntu in the same way that 6.06 was, is upgrading to a newer version of Ubuntu an option? If you are still having issues, I would seriously consider a re-install. I don't want to get into a distro war here; I would just like to say that I consider Ubuntu to be an excellent (if not the best when it comes to user-friendly aspects) desktop distribution, and that I use Debian (or Gentoo if severe customisation is required!) as a server distribution, as I believe they are more stable. Kind regards, Matt
Re: [Linux-HA] Linux-ha for firewalls
Hi Christof, could you send me your last hb_report and a copy of your whitepaper in DOC format? I will take a look ... Thanks! Nikita Michalko

On Thursday, 7 August 2008 at 17:26, [EMAIL PROTECTED] wrote: I apologize in advance for the top posting and the horrible web-based e-mail - I'm on the road. I wrote a whitepaper/book about building Internet firewalls using Linux-based systems, and have been keeping it up to date until relatively recently. It includes a chapter on using Heartbeat to manage an active/passive firewall setup. The book itself is centered around RHEL/CentOS, but the majority of it would work for pretty much any Linux distribution. The main reason I haven't been keeping it up to date is that I am working on the Second Edition. The original was based around the 4.x version of RHEL/CentOS; the new version will be based around 5.x. Another important note is that the old version uses Heartbeat 2.0.8. The new version will use 2.1.3, but the config files, at least as far as a firewall is concerned, look like they will be the same. I'd be more than happy to send you a copy, either the PDF or the DOC version.

Dear list members, at the moment I am trying to set up a Linux cluster of 2 firewalls; both should be online, and only one should run the virtual IP addresses of all network segments. My configuration looks like the following: the master fw is linux (uname) and the slave is idefix. I generated a resource group called grp_vips that contains all virtual IP resources (rsc_int_vip and rsc_ext_vip). If I reboot the master (linux), idefix takes over all resources and everything is OK, but if I shut down a resource (rsc_int_vip) on the master, the second resource (rsc_ext_vip) migrates to the slave (idefix) and the first resource (rsc_int_vip) stays on the master (linux) as unmanaged. Attached are the ha.cf and cib.xml files of my configuration.
What I want to achieve is:
- one dedicated master (linux); only if there are problems switch to the slave (idefix)
- if the master comes back (or only the interface that was gone), the whole group should migrate back to the primary master (linux)
- if one resource of the group goes down, the whole group should be migrated to the slave (collocated = true is already set on the group)
- if possible, the slave should become master (to always have the master where the resources are running)
One problem I also noticed with my init scripts on openSUSE 10.3 is that heartbeat sometimes (80%) does not start because the network is not ready. I downloaded the heartbeat RPMs from the linux-ha download site and I'm using heartbeat 2.1.3. Any hints on how I can achieve what I want are highly appreciated. Thank you for your help. Best regards Christof
Re: [Linux-HA] Linux-HA Leadership Announcement
Hi Alan, THANK YOU VERY MUCH FOR YOUR GREAT WORK! GOOD LUCK! Nikita Michalko

On Tuesday, 24 June 2008 at 15:26, Alan Robertson wrote: After more than 10 years as the Linux-HA project leader, I've decided to create a new leadership structure. One of my original success criteria for the project was that it would eventually not need me. In the last few years, it has seemed more and more likely that we'd reached this plateau of success - and the time has come to put that supposition to the test. Effective today, I am appointing a team of three people to lead and govern the project going forward. These three outstanding people have proved themselves key contributors to the project, and are ready and willing to take over the reins of leadership - and lead the project into the future. These people are:
Keisuke MORI [EMAIL PROTECTED]
Dave Blaschke [EMAIL PROTECTED]
Lars Marowsky-Bree [EMAIL PROTECTED]
As for me, my current assignment at IBM doesn't permit me to spend full time on the project, but I will continue to promote and contribute to the project as time permits. Should future circumstances permit it, I expect that I will increase my efforts on the project again. Congratulations to Mori-san, Dave and Lars! They're working out their new roles, scheduling releases, and so on. Expect to hear from them soon! -- Alan Robertson [EMAIL PROTECTED] Linux-HA founder, Linux-HA project leader emeritus
Re: [Linux-HA] need help
On Wednesday, 16 April 2008 at 17:29, Philip Michael D Vargas wrote: Good day to all, I'm new here; I need some help regarding HA for Linux. Do you have any howto articles for this? I'm using Fedora, FreeBSD and Ubuntu. Your help is much appreciated. --- Linux Registered User # 413558 [EMAIL PROTECTED] (philip at comclark dot com)

Hi Philip, you can start at http://wiki.linux-ha.org/ and also at http://www.linux-ha.org//HeartbeatTutorials HTH Regards -- Nikita Michalko
Re: [Linux-HA] Heartbeat emergency shutdown
Hi Manas, can you check whether your local firewall is blocking access to port 694 (or another port used by HA)? Maybe some networking problem? HTH Nikita Michalko

On Monday, 17 March 2008 at 14:15, Manas Garg wrote: Hi, we have a two-node setup running heartbeat version 2.0.8-1. On one node, heartbeat exited saying Emergency Shutdown. It was restarted. After the restart, the heartbeat on the other node exited giving roughly the same reason. Can someone please help us identify the issue? Are these known bugs, and have they been fixed in later releases? Any help would be greatly appreciated.

The nodes' configuration:
sh-3.00# uname -a
Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT 2006 i686 i686 i386 GNU/Linux

Following are the logs from the first node:
Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (197 messages in queue)
Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (198 messages in queue)
Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (199 messages in queue)
Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
Mar 3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be greater than ackseq
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist-ackseq =10, old_ackseq=0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist-lowseq =201, hist-hiseq=208, send_cluster_msg_level=0
Mar 3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to heartbeat service. Need to bail out.
Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR: cib_ha_connection_destroy: Heartbeat connection lost! Exiting.
Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with heartbeat daemon
Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch: Lost connection to heartbeat service.
Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to heartbeat service.
Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice: /usr/lib/heartbeat/stonithd normally quit.
Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC broken, ccm is dead before the client!
Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Monday, 10 March 2008 at 16:12, Dominik Klein wrote: Nikita Michalko wrote: Hi all! I have some trouble with HA V2.1.3 on SLES10 SP1, a two-node cluster with 1 resource group = 2 resources. Intended is a forced failover of the group on the third failure of any resource in the group; one node is preferred over the other (see attached configuration). After start, the resources are running on the preferred node (demo) as expected, but with 1 failcount and with the following scores (script showscores):

Resource             Score      Node   Stick.  Failcount  Fail.-Stickiness
IPaddr_193_27_40_57  0          dbora  2       0          -3
IPaddr_193_27_40_57  2          demo   2       0          -3
ubis_udbmain_13      -INFINITY  dbora  2       0          -3
ubis_udbmain_13      INFINITY   demo   2       1          -3

The score of the first resource (IPaddr_193_27_40_57) is 2 as expected (group resource_stickiness=1), but the second resource has score INFINITY - why? Because of the added colocation constraint for the group?

In a colocated group (which is the default), all subsequent resources are tied to the group's first resource with a score of INFINITY. To not allow them to run on another node but the node the first resource is run on, they also get -INFINITY for any other node. Regards Dominik

Thank you very much, Dominik, for your reply - but how can I then achieve the intended behavior: group failover on the third failure? Best regards Nikita Michalko AIP
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Tuesday, 11 March 2008 at 10:39, Dominik Klein wrote: In a colocated group (which is the default), all subsequent resources are tied to the group's first resource with a score of INFINITY. To not allow them to run on another node but the node the first resource is run on, they also get -INFINITY for any other node. Thank you very much, Dominik, for your reply - but how can I then achieve the intended behavior: group failover on the third failure? Although I cannot explain it score-wise, as you can only see INFINITY for the group resources, this should work. Just let a resource in the group fail a couple of times and see what happens. Works for me. I'll have Andrew explain this when he's back from Australia :) Regards Dominik

Hi Dominik, I tried to let the resource in the group fail a couple of times, but after the second try the failcount for both resources is NOT increased. After each try (with ifconfig eth0:x down) it still shows the same:

Resource             Score      Node   Stickin.  Failc.  Fail.-Stick.
IPaddr_193_27_40_57  0          dbora  2         0       -3
IPaddr_193_27_40_57  1          demo   2         1       -3
ubis_udbmain_13      -INFINITY  dbora  2         0       -3
ubis_udbmain_13      INFINITY   demo   2         1       -3

Thank you again for your time! Regards Nikita Michalko AIP
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Tuesday, 11 March 2008 at 13:21, Dominik Klein wrote: Hi Dominik, I tried to let the resource in the group fail a couple of times, but after the second try the failcount for both resources is NOT increased. Did you wait for the cluster to restart the resource after you produced the failure before causing another failure? Yes, of course. It shows the same after each try (with ifconfig eth0:x down):

Resource             Score      Node   Stickin.  Failc.  Fail.-Stick.
IPaddr_193_27_40_57  0          dbora  2         0       -3
IPaddr_193_27_40_57  1          demo   2         1       -3
ubis_udbmain_13      -INFINITY  dbora  2         0       -3
ubis_udbmain_13      INFINITY   demo   2         1       -3

Regards Dominik

Regards Nikita
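When repeating this kind of test, it helps to pull a single resource's score for one node out of the showscores output instead of eyeballing the whole table. A small awk sketch; the get_score helper is an assumption, and it presumes the column layout shown in the tables above (resource, score, node, ...):

```shell
#!/bin/sh
# get_score RESOURCE NODE: read showscores-style rows on stdin and print
# the score column for the matching resource/node pair.
# Assumed layout per row: resource score node stickiness failcount failure-stickiness
get_score() {
  awk -v r="$1" -v n="$2" '$1 == r && $3 == n { print $2 }'
}

# Typical use (showscores.sh path as used earlier in this archive):
#   ./showscores.sh | get_score ubis_udbmain_13 demo
```

Comparing this value before and after each induced failure makes it obvious whether the failcount is actually being applied to the score.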
Re: [Linux-HA] ha_msg_addraw_ll: illegal field
Hi Tom, only one tip from me: don't use a serial line for heartbeat - better to use network interfaces only!

On Saturday, 8 March 2008 at 02:15, Tom Brown wrote:

OS: Debian Etch 4.0r3
Kernel: vanilla kernel 2.6.24.3
DRBD: 8.2.5
Heartbeat: 2.1.3

I was testing fail-overs between two nodes: fs01 and fs02. I alternated rebooting the nodes several times. I saw the errors below show up in /var/log/ha-log, on fs01, once. There didn't seem to be any side effects after the errors showed up. Any ideas what these errors mean?

Thanks,
Tom

heartbeat[1631]: 2008/03/07_15:57:41 WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
heartbeat[1631]: 2008/03/07_15:57:41 info: glib: See http://linux-ha.org/FAQ#TTYtimeout for details
heartbeat[1567]: 2008/03/07_15:57:50 WARN: node fs02: is dead
heartbeat[1567]: 2008/03/07_15:57:50 info: Dead node fs02 gave up resources.
heartbeat[1567]: 2008/03/07_15:57:50 info: Link fs02:/dev/ttyS0 dead.
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: ha_msg_addraw_ll: illegal field
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: ha_msg_addraw(): ha_msg_addraw_ll failed
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: NV failure (string2msg_ll):
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: Input string: [ t=NS_rexmit t=status st=init dt=7d00 protocol=1 src=fs02 (1)srcuuid=XndbW/0xSR2wusgEzHcirQ== seq=1 hg=47d19b04 ts=47d1d6ac ld=0.83 0.21 0.07 1/59 1610 ttl=3 auth=1 af6ef5df ]
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: sp= t=status st=init dt=7d00 protocol=1 src=fs02 (1)srcuuid=XndbW/0xSR2wusgEzHcirQ== seq=1 hg=47d19b04 ts=47d1d6ac ld=0.83 0.21 0.07 1/59 1610 ttl=3 auth=1 af6ef5df
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: depth=0
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: MSG: Dumping message with 1 fields
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: MSG[0] : [t=NS_rexmit]
HTH
Nikita Michalko
AIP
[Linux-HA] Strange behavior of the resource group on 2 nodes cluster
Hi all! I have some trouble with HA V2.1.3 on SLES10 SP1: a two-node cluster with 1 resource group = 2 resources. Intended is a forced failover of the group on the third failure of any resource in the group; one node is preferred over the other (see attached configuration). After startup the resources are running on the preferred node (demo), as expected, but with 1 failcount and with the following scores (script showscores):

Resource             Score      Node   Stick.  Failcount  Fail.-Stickiness
IPaddr_193_27_40_57  0          dbora  2       0          -3
IPaddr_193_27_40_57  2          demo   2       0          -3
ubis_udbmain_13      -INFINITY  dbora  2       0          -3
ubis_udbmain_13      INFINITY   demo   2       1          -3

The score of the first resource (IPaddr_193_27_40_57) is 2 as expected (group resource_stickiness=1), but the second resource has score INFINITY - why? Because of the added colocation constraint for the group?

Nikita Michalko
AIP

cib admin_epoch=0 epoch=2 generated=false have_quorum=false ignore_dtd=false num_peers=1 cib_feature_revision=2.0 num_updates=1 cib-last-written=Thu Mar 6 15:21:33 2008 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-symmetric-cluster name=symmetric-cluster value=true/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=stop/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=2/ nvpair id=cib-bootstrap-options-default-resource-failure-stickiness name=default-resource-failure-stickiness value=-3/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=false/ nvpair id=cib-bootstrap-options-stonith-action name=stonith-action value=reboot/ nvpair id=cib-bootstrap-options-startup-fencing name=startup-fencing value=true/ nvpair id=cib-bootstrap-options-stop-orphan-resources name=stop-orphan-resources value=true/ nvpair id=cib-bootstrap-options-stop-orphan-actions name=stop-orphan-actions value=true/ nvpair id=cib-bootstrap-options-remove-after-stop name=remove-after-stop value=false/ nvpair
id=cib-bootstrap-options-short-resource-names name=short-resource-names value=true/ nvpair id=cib-bootstrap-options-transition-idle-timeout name=transition-idle-timeout value=5min/ nvpair id=cib-bootstrap-options-default-action-timeout name=default-action-timeout value=110s/ nvpair id=cib-bootstrap-options-is-managed-default name=is-managed-default value=true/ nvpair id=cib-bootstrap-options-cluster-delay name=cluster-delay value=60s/ nvpair id=cib-bootstrap-options-pe-error-series-max name=pe-error-series-max value=-1/ nvpair id=cib-bootstrap-options-pe-warn-series-max name=pe-warn-series-max value=-1/ nvpair id=cib-bootstrap-options-pe-input-series-max name=pe-input-series-max value=-1/ nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3/ nvpair id=cib-bootstrap-options-last-lrm-refresh name=last-lrm-refresh value=1204812151/ /attributes /cluster_property_set /crm_config nodes node id=7ce70870-4126-4bb7-b263-221a9e7efc7e uname=dbora type=normal/ node id=aa15721d-a88a-4ec8-9e01-cc7eeb780f79 uname=demo type=normal/ /nodes resources group id=group_1 meta_attributes id=ma-group1 attributes nvpair name=target_role id=ma-group1-1 value=started/ nvpair name=resource_stickiness id=ma-group1-2 value=1/ nvpair name=resource_failure_stickiness id=ma-group1-3 value=-1/ /attributes /meta_attributes primitive class=ocf id=IPaddr_193_27_40_57 provider=heartbeat type=IPaddr operations op id=IPaddr_193_27_40_57_mon interval=60s name=monitor timeout=60s/ /operations instance_attributes id=IPaddr_193_27_40_57_inst_attr attributes nvpair id=IPaddr_193_27_40_57_attr_0 name=ip value=193.27.40.57/ nvpair id=IPaddr_193_27_40_57_attr_1 name=cidr_netmask value=26/ nvpair id=IPaddr_193_27_40_57_attr_3 name=broadcast value=193.27.40.63/ /attributes /instance_attributes /primitive primitive class=lsb id=ubis_udbmain_13 provider=heartbeat type=ubis_udbmain operations op id=ubis_udbmain_13_mon interval=120s name=monitor 
timeout=110s/ /operations /primitive /group /resources constraints rsc_location id=rsc_location_group_1 rsc=group_1 rule id=prefered_location_group_1 score=1
Re: [Linux-HA] Solving a strange split-brain with drbd and ha
Hello Balabam,

On Thursday, 21 February 2008 at 09:25, Balabam wrote:

Hello, I have two nodes working in a split-brain configuration and I'm not able to solve this problem. My config is:

[EMAIL PROTECTED] ~]# crm_mon
Defaulting to one-shot mode
You need to have curses available at compile time to enable console mode

Last updated: Fri Feb 15 11:05:05 2008
Current DC: rman1c (875afc12-b88e-4940-9816-218d2a5911c3)
2 Nodes configured.
2 Resources configured.

Node: rman1a (4d7bd4ec-c121-4b13-a2d4-aec820ea36d5): online
Node: rman1c (875afc12-b88e-4940-9816-218d2a5911c3): online

Master/Slave Set: ms-drbd0
    drbd0:0 (heartbeat::ocf:drbd): Started rman1a
    drbd0:1 (heartbeat::ocf:drbd): Master rman1c
Resource Group: Oracle
    V_IP (heartbeat::ocf:IPaddr2): Started rman1c
    FS (heartbeat::ocf:Filesystem): Started rman1c
    Ora_DB (heartbeat::ocf:oracle): Started rman1c
    Ora_LSNR (heartbeat::ocf:oralsnr): Started rman1c

Failed actions:
    FS_start_0 (node=rman1a, call=13, rc=1): Error

The group Oracle is running on the node where drbd is in master mode. If I clean up FS, heartbeat stops the resources on rman1c and tries to remount on rman1a, which fails. I've attached cib.xml and the log of messages.

- REALLY?? I don't see any attachment! Check again!

Bye!
Nikita Michalko

Thanks
Stefano
Re: [Linux-HA] Heartbeat 2.1.3 error
On Thursday, 14 February 2008 at 12:51, maike wrote:

Hi people, I updated heartbeat to the latest version, but when I start heartbeat the following error is issued:

heartbeat[5612]: 2008/02/14_09:47:59 WARN: Managed /usr/lib/heartbeat/cib process 5630 exited with return code 1.
heartbeat[5612]: 2008/02/14_09:47:59 EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib

Can someone help me?

- Yes, I would, but: config/log files and cib.xml are missing :-(

--
Nikita Michalko
[Linux-HA] Documentation/help for forced resource fail-over
Hi all, can somebody point me to the right V2.1.3 documentation or help for the configuration of forced failover of resources? I want to configure a 2-node symmetric cluster with 7 resources so that after 3 failures on node A the resources stop on node A and then start on node B. How should I set default-resource-stickiness and default-resource-failure-stickiness? Any help will be much appreciated!

--
Nikita Michalko
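[Editor's aside] The usual back-of-the-envelope model for this (a simplification; the concrete numbers below are my own illustration, not values from this thread) is that a resource's score on its active node is roughly preference + stickiness + failcount * failure_stickiness, and the resource fails over once that drops below its score on the standby node. A quick sketch:

```python
# Hedged sketch of the simplified stickiness arithmetic for forced failover.
# Assumption (illustrative): score on the active node is
#   preference + stickiness + failcount * failure_stickiness
# and failover happens once it falls below the standby node's score.

def active_score(preference, stickiness, failure_stickiness, failcount):
    return preference + stickiness + failcount * failure_stickiness

def failures_until_failover(preference, stickiness, failure_stickiness,
                            standby_score=0):
    # assumes failure_stickiness < 0, otherwise the score never drops
    failcount = 0
    while active_score(preference, stickiness, failure_stickiness,
                       failcount) >= standby_score:
        failcount += 1
    return failcount

# e.g. stickiness=5, failure-stickiness=-2, no location preference:
# scores go 5, 3, 1, -1, so the third failure triggers the move.
n = failures_until_failover(preference=0, stickiness=5, failure_stickiness=-2)
```

Under this model, "failover after 3 failures" means picking stickiness S and failure-stickiness F so that S + 2F is still above the standby score but S + 3F is below it.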
[Linux-HA] Going from V.2.0.7 to V.2.1.3
Hi all, I've got a problem after installing HA V.2.1.3 on a SLES10 32-bit box. The old V.2.0.7 with the same configuration - 2 nodes active/active, with crm - runs fine (with small problems), but now, with 2.1.3, I get the following errors:

lrmd ... info: RA output: (IPaddr2_193_27_40_56:monitor:stderr) 26: unknown interface: No such device ...

/usr/lib/heartbeat/findif version 2.1.3
Copyright Alan Robertson
Usage: /usr/lib/heartbeat/findif [-C]
Options:
  -C: Output netmask as the number of bits rather than as 4 octets.
Environment variables:
  OCF_RESKEY_ip            ip address (mandatory!)
  OCF_RESKEY_cidr_netmask  netmask of interface
  OCF_RESKEY_broadcast     broadcast address for interface
  OCF_RESKEY_nic           interface to assign to

IPaddr2[24260]: ERROR: [/usr/lib/heartbeat/findif -C] failed

- see attached config and log files. The requested IP address 193.27.40.56 is free, not up. crm_verify -V -x cib.xml - OK (without errors). What kind of configuration error or something else could it be? What did I do wrong? Thanks in advance for any help!

AIP
-
Nikita Michalko

# HA-Services
dbfix 193.27.40.56/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv
demo 193.27.40.57/26/193.27.40.63 ubis_udbmain
# With private network - currently not possible in room 17!
#dbfix 193.27.40.56/26/193.27.40.63 192.168.163.56/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv
#demo 193.27.40.57/26/193.27.40.63 192.168.163.57/26/192.168.163.63 ubis_udbmain
# aipdemo 193.27.40.52/26/193.27.40.63 192.168.163.52/26/193.27.40.63 ubis_applmain - that leads to a 4-fold IP assignment: eth0,eth0:1,eth0:2,eth0:3 !!
#dbfix 193.27.40.55/26/193.27.40.63 ubis_applmain
#server1 193.27.40.181 192.168.163.181 ubis_up_mkctab ubis_nserv ubis_mserv ubis_fax
#server1 193.27.40.53/26/193.27.40.63 192.168.163.53/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv ubis_fax
#demo 193.27.40.53/26/193.27.40.63 aip_haservice
#opteron 193.27.40.182/26/193.27.40.63 192.168.163.182/26/192.168.163.63 ubis_udbmain
#opteron 193.27.40.54/26/193.27.40.63 192.168.163.54/26/192.168.163.63 ubis_udbmain

logfile /var/log/ha-log
debugfile /var/log/ha-debug
debug 0
logfacility local1
#logfacility kern
#use_logd yes
#udpport 694
udpport 708
bcast eth0
#bcast eth1
coredumps true
auto_failback on
keepalive 5
warntime 10
deadtime 15
initdead 180
node demo
node dbfix
crm yes
#stonith external/aipst /etc/ha.d/stonith.ssh
#respawn hacluster /usr/lib64/heartbeat/ccm
#respawn hacluster /usr/lib64/heartbeat/ipfail
#apiauth stonithd uid=root
#respawn root /usr/lib64/heartbeat/hbagent

pe-warn-17_NM.bz2 Description: BZip2 compressed data
ha-log-NM.gz Description: GNU Zip compressed data
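[Editor's aside] When debugging "unknown interface"/findif failures like the above, it helps to remember what findif is essentially doing: picking a local interface whose configured subnet contains OCF_RESKEY_ip. Here is a minimal, hypothetical sketch of that matching logic (not the actual findif source; the interface table is made up):

```python
# Hypothetical sketch of findif-style matching: choose the interface whose
# configured subnet contains the requested IP. Interface data is illustrative.
import ipaddress

def find_interface(requested_ip, interfaces):
    """interfaces: list of (name, cidr) pairs, e.g. ("eth0", "193.27.40.10/26")."""
    ip = ipaddress.ip_address(requested_ip)
    for name, cidr in interfaces:
        if ip in ipaddress.ip_interface(cidr).network:
            return name
    return None  # no subnet match: this is when IPaddr2 reports "findif failed"

ifaces = [("eth0", "193.27.40.10/26"), ("eth1", "192.168.163.10/26")]
nic = find_interface("193.27.40.56", ifaces)
```

If no interface's subnet contains the requested address (e.g. a wrong netmask in the CIB, or the NIC not yet configured), the lookup fails just like the error in the log.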
[Linux-HA] How to set up group's score/attributes to force failover
Hallo all!

In HA V2.1.2 (R2) I have to set up 2 nodes with 3 resource groups so that every group fails over to the other node after 3 (monitor) failures. I already looked at http://www.linux-ha.com/v2/faq/forced_failover, but that formula only applies to single resources. Where can I find the appropriate docu/formula for groups to achieve this?

Thank you for your tips!

Nikita Michalko