Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Thomas Glanzmann
Hello Andrew, I can try and fix that if you re-run with -x and paste the output. (apache-03) [~] crm_report -l /var/adm/syslog/2013/08/05 -f 2013-08-04 18:30:00 -t 2013-08-04 19:15 -x + shift + true + [ ! -z ] + break + [ x != x ] + [ x1375633800 != x ] + masterlog= + [ -z ] + log WARNING:

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-07 Thread Thomas Glanzmann
Hello Andrew, As I said The cluster only stops doing this if writing to disk fails at some point - but there would have been an error in your logs if that were the case. I grepped in the logs and found out that there was a write error on 15 July and probably all changes after that did not

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-07 Thread Thomas Glanzmann
Hello Andrew, It really helps to read the output of the commands you're running: Did you not see these messages the first time? apache-03: WARN: Unknown cluster type: any apache-03: ERROR: Could not determine the location of your cluster logs, try specifying --logfile /some/path

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, Any change to the configuration section is automatically written to disk. The cluster only stops doing this if writing to disk fails at some point - but there would have been an error in your logs if that were the case. Then I do not get it. Yesterday, when the nodes suicided

Re: [Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, did they ensure everything was flushed to disk first? (apache-03) [/var] cat /proc/sys/vm/dirty_expire_centisecs 3000 So dirty data should be flushed within 30 seconds. But I lost at least 24 hours maybe even more. So it seems that pacemaker / heartbeat does not do persistent
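For reference, `dirty_expire_centisecs` is expressed in hundredths of a second, so the value in the output above can be converted as sketched below. The helper name `centisecs_to_seconds` is made up for illustration and is not part of any cluster tool.

```bash
#!/bin/bash
# Hypothetical helper: convert a value in centiseconds, as used by
# /proc/sys/vm/dirty_expire_centisecs, to whole seconds.
centisecs_to_seconds() {
    local centisecs=$1
    echo $(( centisecs / 100 ))
}

# Only read the live value when the file exists (Linux only).
if [ -r /proc/sys/vm/dirty_expire_centisecs ]; then
    live=$(cat /proc/sys/vm/dirty_expire_centisecs)
    echo "dirty data expires after $(centisecs_to_seconds "$live") seconds"
fi
```

A value of 3000 therefore corresponds to 30 seconds. This is the age at which dirty pages become eligible for writeback, which is still far shorter than the 24 hours of lost changes described in the thread.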

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Thomas Glanzmann
Hello Andrew, You will need to run crm_report and email us the resulting tarball. This will include the version of the software you're running and log files (both system and cluster) - without which we can't do anything. Find the files here: I manually packaged it because crm_report output

Re: [Linux-HA] Antw: Re: pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-05 Thread Thomas Glanzmann
Hello Ulrich, Did it happen when you put the cluster into maintenance-mode, or did it happen after someone fiddled with the resources manually? Or did it happen when you turned maintenance-mode off again? I did not remember, but checked the log files, and yes I did a config change (I removed

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-08-04 Thread Thomas Glanzmann
Hello Andrew, I just got another crash when putting a node into unmanaged mode, this time it hit me hard: - Both nodes suicided or STONITHed each other - One out of four md devices were detected on both nodes after reset. - Half of the config was gone. Could you

[Linux-HA] Wheezy / heartbeat / pacemaker: Howto make persistent configuration changes

2013-08-04 Thread Thomas Glanzmann
Hello, both nodes of my ha cluster just panicked, afterwards the config was gone. Is there a command to force heartbeat / pacemaker to write the config to the disk or do I need to restart heartbeat for persistent changes? The config was at least 24 hours on the node, but I did not restart heartbeat

Re: [Linux-HA] Pacemaker: Only the first DRBD is promoted in a group having multiple filesystems which promote individual drbds

2013-06-16 Thread Thomas Glanzmann
Hello Andrew, If you include a crm_report for the scenario you're describing, I can take a look. The config alone does not contain enough information. I tried to reproduce that on a Debian Wheezy (7.0) in my lab environment and was unable to do so. I'll soon set up multiple other platforms and

Re: [Linux-HA] custom script status)

2013-06-07 Thread Thomas Glanzmann
Hello Mitsuo, from the output you sent, you should update because your heartbeat version looks very very ancient to me. A resource script for heartbeat always needs at least these 5 operations: #!/bin/bash . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs export PID=/var/run/postgrey.pid
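The snippet above is cut off before the operations are listed. As a rough sketch of how such a script is usually laid out, the dispatcher below assumes the five actions are start, stop, monitor, meta-data and validate-all; that list, the `agent_*` function names and the placeholder bodies are assumptions for illustration, not the script from the original post. It deliberately does not source `.ocf-shellfuncs` so that it runs anywhere.

```bash
#!/bin/bash
# Sketch of an OCF-style action dispatcher for a hypothetical agent.
# The action list is an assumption; the bodies are placeholders.
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3

agent_start()    { echo "start: would launch the service";      return $OCF_SUCCESS; }
agent_stop()     { echo "stop: would shut the service down";    return $OCF_SUCCESS; }
agent_monitor()  { echo "monitor: would check the service";     return $OCF_SUCCESS; }
agent_metadata() { echo "meta-data: would print the agent XML"; return $OCF_SUCCESS; }
agent_validate() { echo "validate-all: would check parameters"; return $OCF_SUCCESS; }

# Map the action name passed by the cluster manager to a function.
agent_dispatch() {
    case "$1" in
        start)        agent_start ;;
        stop)         agent_stop ;;
        monitor)      agent_monitor ;;
        meta-data)    agent_metadata ;;
        validate-all) agent_validate ;;
        *)
            echo "usage: $0 {start|stop|monitor|meta-data|validate-all}" >&2
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

# In a real agent the last lines would be:
#   agent_dispatch "$1"
#   exit $?
```

Returning a distinct code for unknown actions lets the cluster manager tell "not implemented" apart from a real failure.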

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-07 Thread Thomas Glanzmann
Hello Andrew, Jun 6 10:17:37 astorage1 crmd: [2947]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL This is the cause of the coredump. What version of pacemaker is this? 1.1.7-1 Installing pacemaker's debug symbols would also

Re: [Linux-HA] Does drbd need re-start after configuration change ?

2013-06-07 Thread Thomas Glanzmann
Hello Fredrik, * Fredrik Hudner fredrik.hud...@gmail.com [2013-06-07 14:03]: Been trying to figure out if drbd which is monitored by HA, needs a restart if you do a configuration change in global_common.conf? http://www.drbd.org/users-guide/s-reconfigure.html So you need to issue a 'drbdadm

Re: [Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-07 Thread Thomas Glanzmann
Hello Andrew, Installing pacemaker's debug symbols would also make the stack trace more useful. we tried to install heartbeat-dev to see more, but there are no debugging symbols available. Also I tried to reproduce the issue with a 64 bit Debian Wheezy as I used 32 bit before, I was not able

Re: [Linux-HA] custom script status)

2013-06-07 Thread Thomas Glanzmann
Hello Mitso, 3.0.4-1.el6 from the version I see that you're running RHEL 6. So RHEL uses corosync or cman but not heartbeat as messaging bus between the nodes. You can follow this guide and the links in this guide. http://clusterlabs.org/quickstart-redhat.html What is annoying from my point

[Linux-HA] Pacemaker: Only the first DRBD is promoted in a group having multiple filesystems which promote individual drbds

2013-06-06 Thread Thomas Glanzmann
Hello, on Debian Wheezy (7.0) I installed pacemaker with heartbeat. When putting multiple filesystems which depend on multiple drbd promotions, only the first drbd is promoted and the group never comes up. However when the promotions are not based on the individual filesystems but on the group or

[Linux-HA] pacemaker with heartbeat on Debian Wheezy reboots the node reproducable when putting into maintance mode because of a /usr/lib/heartbeat/crmd crash

2013-06-06 Thread Thomas Glanzmann
Hello, over the last couple of days, I set up an active/passive NFS server and iSCSI storage using drbd, pacemaker, heartbeat, lio and nfs kernel server. While testing the cluster I was often setting it to unmanaged using: crm configure property maintenance-mode=true Sometimes when I did that, both
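A small dry-run sketch of the toggle described above. Only the `maintenance-mode=true` command appears in the post; the function name `maintenance_cmd`, its on/off argument and the `=false` counterpart are assumptions added for illustration. The function prints the command instead of running it, so it can be tried on a machine without the crm shell.

```bash
#!/bin/bash
# Print (do not run) the crm shell command that switches the cluster-wide
# maintenance-mode property. Function name and arguments are hypothetical.
maintenance_cmd() {
    case "$1" in
        on)  echo "crm configure property maintenance-mode=true" ;;
        off) echo "crm configure property maintenance-mode=false" ;;
        *)   echo "usage: maintenance_cmd on|off" >&2; return 1 ;;
    esac
}
```

On a cluster node the printed line could be reviewed first and then piped to `sh`, for example `maintenance_cmd on | sh`.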

Re: [Linux-HA] Pacemaker: Only the first DRBD is promoted in a group having multiple filesystems which promote individual drbds

2013-06-06 Thread Thomas Glanzmann
Hello Emmanuel, * emmanuel segura emi2f...@gmail.com [2013-06-06 11:12]: order drbd_fs_after_drbd inf: ma-ms-drbd5:promote ma-ms-drbd8:promote astorage:start I can see that you promoted multiple drbds in one line. My config where I promote them individually also works. However my question,

Re: [Linux-HA] How to fix ERROR: Cannot chdir to [/var/lib/heartbeat/cores/hacluster]: Permission denied?

2013-06-06 Thread Thomas Glanzmann
Hello Shuwen, What functionality of dir /var/lib/heartbeat/cores/hacluster? if a component of heartbeat crashed, the core files are kept in this directory to do post-mortem analysis of the problem. How to fix this error print? What is your advice? Fix the permissions. For me the permissions

Re: [Linux-HA] Failed actions

2013-04-08 Thread Thomas Glanzmann
Hello Andrew, In this case, it is the initial monitor (the one that tells pacemaker what state the service is in before we try to start anything) that is failing. For the ones returning rc=1, it looks like something was wrong but the cluster was able to clean them up (by running stop) and

Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello, ipv6addr=2600:3c00::0034:c007 from the manpage of ocf_heartbeat_IPv6addr it looks like you have to specify the netmask so try: ipv6addr=2600:3c00::0034:c007/64 assuming that you're in a /64. Cheers, Thomas ___ Linux-HA

Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick, Thanks for the tip, however, it did not work. That's actually a /116. So I put in 2600:3c00::0034:c007/116 and am getting the same error. I requested that it restart the resource as well, just to make sure it wasn't the previous error. now, I had to try it: node

Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick, Anything I need to do to allow IPv6... or something? I agree with Greg here. Have you tried setting the address manually? ip -6 addr add ip/cidr dev eth0 ip -6 addr show dev eth0 ip -6 addr del ip/cidr dev eth0 ip -6 addr show dev eth0 (node-62) [~] ip -6 addr add
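The manual check suggested above (add the address, show it, delete it, show again) can be written down as a dry run. This is a sketch: the function name `ipv6_manual_test_cmds` is invented here, the default device `eth0` is taken from the post, and the function only prints the `ip` commands so that nothing is changed on the machine.

```bash
#!/bin/bash
# Print the four ip(8) commands for a manual IPv6 address test.
# $1: address with prefix length, e.g. 2600:3c00::0034:c007/116
# $2: network device (defaults to eth0, as in the post)
ipv6_manual_test_cmds() {
    local addr=$1 dev=${2:-eth0}
    echo "ip -6 addr add $addr dev $dev"
    echo "ip -6 addr show dev $dev"
    echo "ip -6 addr del $addr dev $dev"
    echo "ip -6 addr show dev $dev"
}
```

If the printed commands work when run as root but the resource agent still fails, the problem is more likely in the agent or its parameters than in the kernel's IPv6 support.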

Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick, I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? that is correct. I tried modifying my netmask to copy yours. And I get the same error, you do: ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error So probably a bug in the

[Linux-HA] Failed actions

2013-03-22 Thread Thomas Glanzmann
Hello, I have an openais installation on centos which has logged failed actions, but the services appear to be 'started'. As far as I know heartbeat/pacemaker, if an action fails the service should not be started. I also have a system on Debian squeeze that stops the service when a monitor action for IPMI

Re: [Linux-HA] stonith failed to start

2009-08-20 Thread Thomas Glanzmann
Hello Terry, What would cause the stonith 'start' operation to fail after it initially had succeeded? if my understanding is correct (I wrote a stonith agent for vsphere yesterday), then it runs the status command of the stonith agent and looks at the exit status, like that: (ha-01) [~]

[Linux-HA] vsphere stonith, squid3 agent for debian lenny and example configuration

2009-08-20 Thread Thomas Glanzmann
="0" timeout="30s" interval="10s" start-delay="10s" /> <action name="meta-data" timeout="5s" /> <action name="validate-all" timeout="20s" /> </actions> </resource-agent> END ;; esac #!/usr/bin/perl use strict; use warnings FATAL => 'all'; # Thomas Glanzmann 10:28 09-08-19 # apt-get install libarchive-zip-perl

Re: [Linux-HA] Automatic Clenaup of certain resources

2008-09-02 Thread Thomas Glanzmann
Hello Andrew, * Andrew Beekhof [EMAIL PROTECTED] [080117 09:13]: On Jan 17, 2008, at 7:34 AM, Thomas Glanzmann wrote: I use Linux HA to monitor some services on a dial in machine. A so-called single node cluster. For example sometimes my dial-in connection or openvpn connection, or IPv6

Re: [Linux-HA] Automatic Clenaup of certain resources

2008-09-02 Thread Thomas Glanzmann
Hello Andrew, is this possible today? yes but only with pacemaker 0.7 thanks a lot I found the configuration option failure-timeout=60s Can someone give me a short walk-through? Look for Migrating Due to Failure in http://clusterlabs.org/mw/Image:Configuration_Explained_1.0.pdf

Re: [Linux-HA] Announcement: heartbeat/pacemaker documentation in hg

2008-04-28 Thread Thomas Glanzmann
Hello Dejan, http://hg.clusterlabs.org/pacemaker/doc/archive/tip.tar.gz I am unable to build this: (ad027088pc) [/var/tmp/Pacemaker-Docs-80da5f68a837] make /usr/lib/ocf/resource.d/heartbeat/AudibleAlarm: line 19: /resource.d/heartbeat/.ocf-shellfuncs: No such file or directory -:1: parser

Re: [Linux-HA] heartbeat failover not working on hard drive error

2008-03-28 Thread Thomas Glanzmann
Hello Coach-X (what a strange name), This has happened several times. Nothing shows up in either log file, and a hard reboot brings the master back online. Is this caused by the serial link still being active? Is there a way to have this type of issue cause the slave to become active?

Re: [Linux-HA] HA maintenance mode

2008-03-28 Thread Thomas Glanzmann
Hello Danny, Would be really nice to have that as cluster command in HA or as hb_gui feature already available. Or just a switch to enable/disable failover for maintenance purpose. it is already there. It is the default policy. I just don't bother to look it up in the manual, but maybe you are

Re: [Linux-HA] VLAN Trunk, IPaddr2, and static routes...

2008-03-27 Thread Thomas Glanzmann
Hello Chris, there is no need to put the vlan logic into the resource agent. Just configure the interface _before_ and use it _afterwards_. I have had it running for ages on two different machines and it just works. Thomas

Re: [Linux-HA] external/ipmi example configuration

2008-03-27 Thread Thomas Glanzmann
Hello Martin, it is pure luck that I am so bored that I read this list, next time CC me. :-) I have read several postings in the mail archive about the external/ipmi configuration but there are still some questions that bother me. The last posting from Thomas: did this cib-configuration

Re: [Linux-HA] Compiling Heartbeat on Solaris10

2008-02-14 Thread Thomas Glanzmann
Hello Ken, I am having trouble compiling Heartbeat 2.0.7 on a Solaris 10 system. I have tried SunStudio11 and gcc 3.3 and 4.0. Is there any information I can read that might help? It's complaining about Gmain_timeout_funcs in lib/clplumbing/GSource.c, if anyone has seen that before. first of

Re: [Linux-HA] Removing a node from cluster

2008-02-14 Thread Thomas Glanzmann
Hello Franck, Suppose I have a 3 nodes cluster: node1, node2, node3. I want to remove node2 from the cluster to be able to perform various operation on the node2 without any risk of resources moving to node2. I tried to figure out with the cibadmin or crm_resource but I don't get it. #

Re: [Linux-HA] resource script question (runlevel config)

2008-02-14 Thread Thomas Glanzmann
Hello Amy, What about something like monit to make sure ssh is up and running and restart if it crashes? thanks for the pointer. A very interesting tool. I was looking for something like that but decided to write something by myself but it sounds great maybe I will give it a try.

[Linux-HA] ClusterIP

2008-02-07 Thread Thomas Glanzmann
Hello, I would like to do a Cluster-IP Setup with SLES 10. A few things are unclear to me. With ClusterIP you have one IP address that is shared on two or more nodes. It usually uses a multicast mac address. Both nodes see all traffic. But when one node goes down how does the other node see that

Re: [Linux-HA] ClusterIP

2008-02-07 Thread Thomas Glanzmann
Hello, thank you a lot for the feedback! Now I understand how the failover works. Does someone have a ready-to-use cib.xml that I can use for testing? I am going to try my luck right now and come back in an hour or so with my findings. It would be nice if someone could comment on them. Thomas

[Linux-HA] propagate value similliar to pingd

2008-02-07 Thread Thomas Glanzmann
Hello, I would like to write a script similar to pingd that is spawned and populates a value in the cib that I can build a rule on. What do I have to do to achieve the above? Concrete questions are: - What do I have to put in the cib to spawn such an 'agent'? - How do I propagate

Re: [Linux-HA] ClusterIP

2008-02-07 Thread Thomas Glanzmann
Hello again, here comes my cib.xml for a clusterip. But the resource stickiness is not working for me. When I shut down ha-2, the two clone instances stay on ha-1. Any ideas? Before sending this e-mail I used the following command to set some location constraints: crm_resource -M -r ip0:0 -H

Re: [Linux-HA] ClusterIP

2008-02-07 Thread Thomas Glanzmann
Hello Lars, Uhm, what do you think should happen when you shut down ha-2 - of course they stay on ha-1 in that case? I meant that I shut it down temporarily and if it comes back again the clones stay both on one node instead of going back again. I don't know what you're saying here ;-) I

Re: [Linux-HA] Samba and High Availability

2008-02-07 Thread Thomas Glanzmann
Hello Christopher, Everything I have read about samba and HA made it seem like this was not possible. Are others doing this too? Can you think of some good tests to try to stress it (short of accessing a database or something). I imagine a fail-over during a large copy operation would fail,

Re: [Linux-HA] DRBD 8.0 under Debian Etch?

2008-02-06 Thread Thomas Glanzmann
Hello Fabiano, Short question: Does anyone here have DRBD8 running with heartbeat under Etch? I do and it works like a charm. Search the archives for the complete config or drop me an e-mail and I will resend it to you with a few things you should observe to get a perfect drbd setup. Thomas

Re: [Linux-HA] About configuring DRBD v8 on HA v2

2008-01-30 Thread Thomas Glanzmann
Hello Stefano, it is not possible to configure drbd in a master/slave through the gui. For a walkthrough use one of the following: http://article.gmane.org/gmane.linux.highavailability.user/22132 Thomas

Re: [Linux-HA] DRBD and Pingd

2008-01-25 Thread Thomas Glanzmann
Hello Dominik, You can also start pingd from ha.cf with a respawn directive. Just as Steve did it. Works fine here and imho has the advantage of a pingd value being calculated when the constraints are applied (because pingd starts right away and not just when the crm comes alive). I see.

Re: [Linux-HA] DRBD and Pingd

2008-01-24 Thread Thomas Glanzmann
Hello Steve, attached is a working example for a postgres cluster. Put your filesystem, ip, database thing in a resource group and drop the colocation and order constraints or you have to define your order rules in both directions. See also this thread:

Re: [Linux-HA] DRBD and Pingd

2008-01-24 Thread Thomas Glanzmann
Hi Steve, your cib.xml isn't working because you forgot to propagate the pingd values. You forgot to add the pingd clone resource to your cib.xml. It is a common mistake, I made it once myself, so your scores don't get propagated: put that in your resources section in the cib: clone

Re: [Linux-HA] what to do on loss of network

2008-01-24 Thread Thomas Glanzmann
Hello Kettunen, I have SLES10 SP1 HA 2.0.8 split site two node cluster and I've configured pingd clone resource to make resource location constraints. It works very well. My ping node is iSCSI server in third site from where cluster node mounts its resource disk. If I disconnect all

Re: FW: [Linux-HA] what to do on loss of network

2008-01-24 Thread Thomas Glanzmann
Hello Kettunen, Correction. I meant to say that splitbrain detection should be done when nodes see each other again (even at network level). CRM status messages do move when connection between nodes is back, but other node don't accept messages from other node. I agree with you. They should

Re: [Linux-HA] ordering constraints and node crash

2008-01-22 Thread Thomas Glanzmann
Hello Marc, If I kill the node hosting postgresr2, postgresr2 migrates to another node, but applisr1 and applisr3 aren't restarted. Is it normal ? What could I do to solve this ? the answer to your question is 'resource group'. A resource group is a container for resources. Every resource in

Re: [Linux-HA] HOWTO: Build a high available iscsi Target using heartbeat, drbd and ietd for ESX Server 3.5

2008-01-21 Thread Thomas Glanzmann
Hello Dejan, Nice effort. Thanks for sharing it. Perhaps you'd like to put this into the wiki.linux-ha.org. If you do, don't forget to pepper the doc with YMMV. I am going to do that. 1. cib is a bit too lean. There are no attributes set for the ietd resource. Well I have that default

Re: [Linux-HA] HOWTO: Build a high available iscsi Target using heartbeat, drbd and ietd for ESX Server 3.5

2008-01-21 Thread Thomas Glanzmann
Trent, I just did a very similar thing, except in my case I am using shared storage (MD3000 - SAS) and there's a bit more fun to that part of it (multipath, stonith, etc) - also I setup heartbeat in v1 mode not CRM mode. nice, I never had a MD3000 on my hands. I plan to post a

Re: [Linux-HA] OCF test script (ocf-tester)

2008-01-21 Thread Thomas Glanzmann
Hello Jeff, Please find attached the Nagios OCF script I wrote. thank you for sharing. monitor_nagios(){ case ${NAGIOSRUNNING} in yes) if [ -f ${OCF_RESKEY_pid} ]; then echo ${0} MONITOR: running exit 0 fi ;; no) if [ -f

Re: [Linux-HA] OCF test script (ocf-tester)

2008-01-19 Thread Thomas Glanzmann
Hello Jeff, I am attempting to write an OCF compliant script for nagios. I have followed the documentation here: I attached the one I am using. Keep me posted if you do something different. Thomas #!/bin/bash . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs export

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Thomas Glanzmann
Hello, You don't need location constraints. okay. Could you elaborate, please? Does the stonith subsystem automatically know where to put them? Thomas

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Thomas Glanzmann
Hello Lars, If the node fails, and the other side needs STONITH, the resource will be started in that partition automatically. The location constraints don't hurt, but you don't need them. STONITH resources get started before any STONITH operation is performed, which has roughly the same

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Thomas Glanzmann
Lars, Assuming that the fencing device can be reached from all nodes, it doesn't matter where they are put. Only if you have, say, a serial power switch which is only reachable from one node do you need location constraints. I have a two node cluster. I use external/ipmi which needs one

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-18 Thread Thomas Glanzmann
Hello Dejan, http://developerbugs.linux-foundation.org/show_bug.cgi?id=1752 According to this, it does matter. There really is a check in stonithd which prevents a node to stonith itself. So, I'd say that there should be a location constraint which says not to run a stonith resource on the

[Linux-HA] Supervise but don't stop a resource

2008-01-18 Thread Thomas Glanzmann
Hello, is it possible with linux-ha to supervise (monitor) a resource and start it when it failed, but do not stop it when heartbeat is stopped? I am thinking about the syslog daemon and sshd. Thomas

Re: [Linux-HA] Colocations and orders

2008-01-17 Thread Thomas Glanzmann
of tomcat init script shipped with debian. # # Thomas Glanzmann --tg 21:22 07-12-30 # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # This script manages a Heartbeat Tomcat instance # usage: $0 {start|stop|status|monitor|meta

Re: [Linux-HA] Colocations and orders

2008-01-17 Thread Thomas Glanzmann
/resource-agent END ;; esac #!/bin/sh # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # OCF Ressource Agent on top of tomcat init script shipped with debian. # # Thomas Glanzmann --tg 21:22 07-12-30

Re: [Linux-HA] external/ipmi example configuration

2008-01-16 Thread Thomas Glanzmann
Hello, the previous external/ipmi configuration worked, but I don't know why. However here is one that seems to follow standard practice: <resources> <primitive id="postgres-01-fencing" class="stonith" type="external/ipmi" provider="heartbeat"> <operations> <op

Re: [Linux-HA] detecting network isolation

2008-01-16 Thread Thomas Glanzmann
Hello, I have a two-node test cluster. I added a ping statement to each of the nodes to ping the default network. The two nodes are connected to the same network segment and have a crosslink cable between them. When I plug out the cable of the node that is running the service, I see the following

[Linux-HA] Automatic Clenaup of certain resources

2008-01-16 Thread Thomas Glanzmann
Hello, I use Linux HA to monitor some services on a dial in machine. A so-called single node cluster. For example sometimes my dial-in connection or openvpn connection, or IPv6 connectivity does not come up. Is there a way to tell Linux-HA to retry a failed resource after a certain amount of time

Re: [Linux-HA] detecting network isolation

2008-01-16 Thread Thomas Glanzmann
Hello, Jan 17 05:50:56 ha-2 heartbeat: [4452]: WARN: node 10.0.0.1: is dead Jan 17 05:50:56 ha-2 heartbeat: [4452]: info: Link 10.0.0.1:10.0.0.1 dead. Jan 17 05:50:56 ha-2 crmd: [4470]: notice: crmd_ha_status_callback: Status update: Node 10.0.0.1 now has status

Re: [Linux-HA] Restart a Resource controlled by Heartbeat

2008-01-14 Thread Thomas Glanzmann
Hello Boroczki, I'd rather use kill -HUP `pidof nagios` (or something similar) to reload the configuration of nagios. this is what I ended up doing. Thomas
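The reload-by-signal approach quoted above can be wrapped so that it fails cleanly when the daemon is not running. A minimal sketch; the function name `reload_daemon` is made up here, and it assumes `pidof` is installed and that the daemon re-reads its configuration on SIGHUP, as the post says nagios does.

```bash
#!/bin/bash
# Send SIGHUP to every process with the given name so it reloads its
# configuration. Returns 1 if no such process is found.
reload_daemon() {
    local name=$1 pids
    pids=$(pidof "$name") || { echo "$name is not running" >&2; return 1; }
    # pidof may print several PIDs; $pids stays unquoted so each one
    # is passed to kill as a separate argument.
    kill -HUP $pids
}
```

Called as `reload_daemon nagios`, this does the same as the one-liner from the thread but avoids running `kill -HUP` with an empty argument list.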

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-14 Thread Thomas Glanzmann
Lars, Yes. You have more than one primitive within the clone, which doesn't work. Why do you do that? Because there is no documentation, the maintainer doesn't answer to e-mail and this was the only example that I found in the archives. And it seemed to work. But I guess I was just lucky.

Re: [Linux-HA] Running Linux-HA on a single node cluster

2008-01-14 Thread Thomas Glanzmann
Hello, I have 9 machines configured as 6 clusters: ~ and I can't count. But I have a ninth server that does SMTP. But it will soon go away and get a ha resource. Thomas

Re: [Linux-HA] Running Linux-HA on a single node cluster

2008-01-14 Thread Thomas Glanzmann
Hello Andrew, looks sane enough - though linux-ha is slightly heavy for just monitoring processes in a cluster-of-one. any reason not to make it a four node cluster? I have 9 machines configured as 6 clusters: - 2x apache (ha resources: router; openvpn; nagios; apache +

Re: [Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-14 Thread Thomas Glanzmann
Hello Andrew, does that help? yes it does. I have a test cluster. I will write a pseudo plugin or use the ssh one to simulate the behaviour and come back to you if I have something to work with. I am still not sure how it works, but maybe I simply should start to read source code.

[Linux-HA] ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned.

2008-01-13 Thread Thomas Glanzmann
Hello, could someone tell me what is wrong with that fencing configuration: Jan 13 11:38:48 apache-02 pengine: [13769]: ERROR: clone_unpack: fencing has too many children. Only the first (apache-01-fencing) will be cloned. Jan 13 11:38:48 apache-02 pengine: [13769]: info: process_pe_message:

Re: [Linux-HA] MailTo Resource specified wrong?

2008-01-11 Thread Thomas Glanzmann
Hello Kirby, WARNING: Don't stat/monitor me! MailTo is a pseudo resource agent, so the status reported may be incorrect I guess if I had to guess, I'd probably delete the 'MailTo_6_mon' line... But I don't know if that'll affect the mail I get when heartbeat switches things around If

Re: [Linux-HA] debian and heartbeat

2008-01-10 Thread Thomas Glanzmann
Hello, honestly, i would not use this repository for my upgrades as - at least in the past - major changes have been introduced during the heartbeat 2.1.3 development. for example the constraints were heavily modified. I wouldn't use it for production either. But my point still stands this

Re: [Linux-HA] Monitoring Apache (v2.0.8)

2008-01-09 Thread Thomas Glanzmann
Hello Alon, I would update to 2.1.3 (I am not sure if that is your problem). And make the interval for the monitor operation higher. At the moment it seems to be scheduled each second. Thomas

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello Michael, http://www.ultramonkey.org/download/heartbeat/2.1.3/ which Debian Release do you use? Thomas

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello Michael, etch. debian_version is 4.0 apt-get update and upgrade done. the packages you try to use are for Debian Sid. You can do one of the following things: - put deb http://131.188.30.102/~sithglan/linux-ha-die2te/ ./ into /etc/apt/sources.list and call apt-get update;

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello, Is this a regularly updated repository with the heartbeat ldirectord packages (and only those packages)? yes, it is. But in the future the path will be deb http://131.188.30.102/~sithglan/ha/ Thomas

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello, btw. the problem was that I built the packages on a machine that had a sarge gnutls-dev installed. I upgraded the package and just rolled it out on 9 machines, everything is up and running. :-) Thomas

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello Andrew, http://download.opensuse.org/repositories/server:/ha-clustering/ do you have an apt line to use that location? I tried to make something up but failed. Thomas

Re: [Linux-HA] auto_failback off, but the resource group still fails back.

2008-01-09 Thread Thomas Glanzmann
Jason, just to make sure that we're on the same page here: - You have a two node cluster - You have a resource that is running only on one node - When you run the resource on node b, node a reports for that resource a failed monitor? If that is the case then

Re: [Linux-HA] Re: problems with ha, drbd and filesystems resource

2008-01-09 Thread Thomas Glanzmann
Hello Stephan, No ideas about the problem? I think the question was already answered by someone on the list. Heartbeat doesn't support drbd-0.8 at the moment. E.g. you can run a primary/secondary cluster but not a primary/primary cluster. So someone who understands what he is doing has to adapt the

Re: [Linux-HA] auto_failback off, but the resource group still fails back.

2008-01-09 Thread Thomas Glanzmann
Hello Jason, 1) It nominates a node as DC (in this case, node2, though I've seen both) 2) The 'failed actions' block gets these lines almost immediately: resource_samba_storage_monitor_0 (node=node2.domain.com, call=3, rc=9): Error resource_samba_storage_monitor_0

Re: [Linux-HA] debian and heartbeat

2008-01-09 Thread Thomas Glanzmann
Hello, http://download.opensuse.org/repositories/server:/ha-clustering/Debian_Etch/Packages the debian folks are good, but not quite that good..see: http://ccrma.stanford.edu/planetccrma/man/man5/sources.list.5.html for details on how to setup a custom apt source. I read the manpage. I

Re: [Linux-HA] Can I use different interfaces in different nodes?

2008-01-09 Thread Thomas Glanzmann
Hello, I want to set up a two-node httpd cluster with heartbeat, and the configuration listed below: that shouldn't be a problem, just adapt the ha.cf on each node to reflect the network card configuration. And one more question, can I use bcast in VLAN environment? You can. I have it

Re: [Linux-HA] hb2: making xml manageable

2008-01-08 Thread Thomas Glanzmann
Hello, 1. Without restarting or shutting down the cluster, and without editing the cib.xml file how can I make a change to the cluster configuration (i.e. how can I use haresources2cib.py to generate an updated cib.xml and get the cluster to use it without a restart) I use 8 space wide

Re: [Linux-HA] Howto list all available agents and there possible attributes

2008-01-08 Thread Thomas Glanzmann
Hello Simon, I double checked and 2.1.3-2 does include both /usr/lib/stonith/plugins/external/ipmi and /usr/sbin/ciblint I can confirm this. I used your diff, dsc and orig file to build a package for Debian Etch (4.0). I am going to roll out the version tonight on my production cluster (9

Re: [Linux-HA] coding bugfix for lib/plugins/stonith/ipmilan.c

2008-01-08 Thread Thomas Glanzmann
Hello Dejan, Configuration is comparable to the external/ipmi. Just check the parameter names and adjust the stonith type. I see. So there is no need to touch ha.cf? Just add the ipmilan thing to the cib.xml and that's it? Thomas, if you could also do additional testing, that'd be great. I

Re: [Linux-HA] auto_failback off, but the resource group still fails back.

2008-01-08 Thread Thomas Glanzmann
Hello Jason, 1) For the monitor action, I might suggest the docs be updated slightly. According to http://www.linux-ha.org/OCFResourceAgent, 0 for 'running', 7 for 'stopped', and anything else is valid, but indicates an error. I have modified my script to only return '1' on error.
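
The return-code convention discussed above (0 for running, 7 for stopped, 1 for a generic error) can be sketched as a minimal monitor function. This is only an illustration, not the script from the thread: the function name example_monitor and the pidfile path are hypothetical, and a real agent would normally take these constants from the OCF shell function library rather than define them itself.

```shell
#!/bin/sh
# OCF exit codes relevant to a monitor action.
OCF_SUCCESS=0        # resource is running
OCF_ERR_GENERIC=1    # something went wrong while checking
OCF_NOT_RUNNING=7    # resource is cleanly stopped

# Hypothetical pidfile location, used for illustration only.
PIDFILE="${PIDFILE:-/tmp/example_service.pid}"

example_monitor() {
    # No pidfile at all: treat the resource as cleanly stopped.
    [ -f "$PIDFILE" ] || return $OCF_NOT_RUNNING

    # An unreadable or empty pidfile is an error, not a clean stop.
    pid=$(cat "$PIDFILE" 2>/dev/null) || return $OCF_ERR_GENERIC
    [ -n "$pid" ] || return $OCF_ERR_GENERIC

    # kill -0 sends no signal; it only checks that the process exists.
    # (It also fails if we lack permission to signal the process.)
    if kill -0 "$pid" 2>/dev/null; then
        return $OCF_SUCCESS
    fi

    # Stale pidfile: the process is gone.
    return $OCF_NOT_RUNNING
}
```

Returning 7 rather than an error code on a node where the resource was never started matters here, because the initial probe (the monitor_0 operation) runs on every node and any other non-zero code is recorded as a failed action.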

Re: [Linux-HA] Failover iscsi SAN

2008-01-07 Thread Thomas Glanzmann
Hello Michael, could you please send me your ocf resource agent for ietd and the output of cibadmin -Q without the status section? That is because I want to do such a setup myself. Have you tested any initiators with that setup? I would like to use it with ESX Server Version 3.5. And would like

Re: [Linux-HA] Howto list all available agents and there possible attributes

2008-01-07 Thread Thomas Glanzmann
Hello Andrew, I believe it was considered too broken to continue shipping. None of us have the required hardware to test/fix/maintain the relevant code. I think you believe wrong. The external/ipmi plugin works out of the box and perfectly fine at least for me. Just the documentation is

Re: [Linux-HA] Failover iscsi SAN

2008-01-07 Thread Thomas Glanzmann
Hello Niels, My personal experience with ietd is that it really doesn't like to be stopped if it is in use. (i.e. kernel panics, kernel hangs etc.). I would do some careful testing before trying to use this in a heartbeat environment. I just want a proof of concept, not a production system.

[Linux-HA] drbd + ocfs2

2008-01-07 Thread Thomas Glanzmann
Hello, I would like to know how to set up a drbd + ocfs2 installation with two masters. What ocf agent do I have to use for that? Does anyone have a working example configuration? I would like to use heartbeat-2.1.3. Thomas

Re: [Linux-HA] Jan 2 23:25:01 postgres-02 tengine: [8736]: ERROR: te_graph_trigger: Transition failed: terminated

2008-01-07 Thread Thomas Glanzmann
Hello Andrew, Thanks - the PE is now smart enough to at least filter out the duplicates :-) thanks a lot for getting rid of this annoying bug. :-) Thomas

Re: [Linux-HA] Howto list all available agents and there possible attributes

2008-01-07 Thread Thomas Glanzmann
Hello Simon, Thanks, I'll look into this. Though I was under the impression that the ipmi module was broken. Has it been fixed? there are two ipmi modules: - ipmilan (a C implementation that is not built by default) - external/ipmi (a shell script) The first one was indeed

Re: [Linux-HA] coding bugfix for lib/plugins/stonith/ipmilan.c

2008-01-07 Thread Thomas Glanzmann
Hello, But it stops. If you have the machine with IPMI interface, could you test my patch? do you have a configuration for me? I have a machine with ipmi and the external/ipmi stonith works for me. If you can walk me through configuring ipmilan I can give it a spin. Thomas

Re: [Linux-HA] problems with ha, drbd and filesystems resource

2008-01-06 Thread Thomas Glanzmann
Hello Stephan, could you please attach your config? cibadmin -Q and drop the status section? Thomas
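
One way to produce the requested output, cibadmin -Q without the status section, is a small sed filter. This is a sketch under an assumption: it relies on the CIB being pretty-printed with the opening and closing status tags each on their own line. The function name strip_status is made up for this example.

```shell
#!/bin/sh
# Remove the <status> section from CIB XML read on stdin.
# Assumes pretty-printed XML: <status ...> and </status> each on their own line.
strip_status() {
    sed -e '/<status[^>]*\/>/d' \
        -e '/<status[ >]/,/<\/status>/d'
}
```

Typical use would be `cibadmin -Q | strip_status > cib-config.xml`, where the output file name is arbitrary. The first expression handles an empty, self-closing status element so that the range in the second expression cannot run on to the end of the file.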

Re: [Linux-HA] Linux-HA Service Monitoring

2008-01-04 Thread Thomas Glanzmann
Hello Jayaprakash, I Place the new script in /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs and execute the following commands. hopefully you did not but from the output I can tell that you didn't. You put it where it belongs. If possible come to online, we discuss in detailed.. My id

Re: [Linux-HA] external/ipmi example configuration

2008-01-04 Thread Thomas Glanzmann
Hello Dominik, How can I test the stonith plugin eg. tell heartbeat to shoot someone? iptables -I INPUT -j DROP Okay. That is obvious. Play dead fish in the water. Lucky me that I don't have a serial heartbeat. Thanks. Thomas

Re: [Linux-HA] external/ipmi example configuration

2008-01-04 Thread Thomas Glanzmann
Hello Dejan, I searched the archive but looked for ipmi in the subject, but now that you mentioned it I searched for external stonith and I found an example. See http://linux-ha.org/ExternalStonithPlugins for an example. You can also search the archive of this list for more examples. I read
