Re: [Linux-HA] Doubt in HA configuration
Hi Rajesh, the Pacemaker mailing list moved to 'users' in February: http://oss.clusterlabs.org/mailman/listinfo/users - so you'll need to subscribe to the 'users' list and repost it there! Regards, Nikita

On 07.05.2015 17:19, Rajesh S wrote: Hi Team, sorry for the inconvenience. I want to set up HA between two different machines: one is an HP workstation and the other is a VMware virtual machine, but the two devices are in different IP ranges, 192.168.70.102 and 192.168.90.130. Is HA possible or not? Please advise me. With Regards, RAJESH S.

___ Linux-HA mailing list is closing down. Please subscribe to us...@clusterlabs.org instead. http://clusterlabs.org/mailman/listinfo/users Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha
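For what it's worth, Heartbeat does not require both nodes to share a subnet - unicast communication only needs IP reachability between the nodes. A minimal ha.cf sketch (interface name and node names are assumptions, the addresses are from the question):

```
# ha.cf on the 192.168.70.102 node; mirror it with the peer's
# address on the other node. bcast/mcast will NOT cross subnets,
# so use ucast, and make sure firewalls pass UDP port 694.
udpport 694
ucast eth0 192.168.90.130
node nodeA nodeB
```

The same reachability argument applies to Corosync's unicast (udpu) transport, which is what the 'users' list will likely recommend for a new setup.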
Re: [Linux-HA] Announcing the Heartbeat 3.0.6 Release
On 10.02.2015 22:24, Lars Ellenberg wrote:

TL;DR: If you intend to set up a new High Availability cluster using the Pacemaker cluster manager, you typically should not care about Heartbeat, but use recent releases (2.3.x) of Corosync. If you don't care about Heartbeat, don't read further. Unless you are beekhof... there's a question below ;-)

After 3½ years since the last officially tagged release of Heartbeat, I have seen the need to do a new maintenance release. The Heartbeat 3.0.6 release tag: 3d59540cf28d and the change set it points to: cceeb47a7d8f

GREAT!!! Thank you very much, Lars! Heartbeat is still running on some of our production clusters ...

The main reason for this release was that Pacemaker more recent than somewhere between 1.1.6 and 1.1.7 would no longer work properly on the Heartbeat cluster stack, because some of the daemons had moved from glue to Pacemaker proper and changed their paths. This has been fixed in Heartbeat. Also, during that time stonith-ng was refactored: it would still reliably fence, but no longer understood its own confirmation message, so it was effectively broken. That I fixed in Pacemaker.

If you choose to run a new Pacemaker with the Heartbeat communication stack, it should be at least 1.1.12 with a few patches; see my December 2014 commits at the top of https://github.com/lge/pacemaker/commits/linbit-cluster-stack-pcmk-1.1.12 - I'm not sure if they got into Pacemaker upstream yet. beekhof? Do I need to rebase? Or did I miss you merging these?

---

If you have those patches, consider setting this new ha.cf configuration parameter:

# If the pacemaker crmd spawns the pengine itself,
# it sometimes forgets to kill the pengine on shutdown,
# which later may confuse the system after cluster restart.
# Tell the system that Heartbeat is supposed to
# control the pengine directly.
crmd_spawns_pengine off

Here is the shortened Heartbeat changelog; the longer version is available in Mercurial: http://hg.linux-ha.org/heartbeat-STABLE_3_0/shortlog

- fix emergency shutdown due to broken update_ackseq
- fix node dead detection problems
- fix converging of membership (ccm)
- fix init script startup glitch (caused by changes in glue/resource-agents)
- heartbeat.service file for systemd platforms
- new ucast6 UDP IPv6 communication plugin
- package ha_api.py in standard package
- update some man pages, specifically the example ha.cf
- also report ccm membership status for cl_status hbstatus -v
- updated some log messages, or their log levels
- reduce max_delay in broadcast client_status query to one second
- apply various (mostly cosmetic) patches from Debian
- drop HBcompress compression plugins: they are part of cluster glue
- drop openais HBcomm plugin
- better support for current pacemaker versions
- try to not miss a SIGTERM (fix problem with very fast respawn/stop cycle)
- dopd: ignore dead ping nodes
- cl_status improvements
- api internals: reduce IPC round-trips to get at status information
- uid=root is sufficient to use heartbeat api (gid=haclient remains sufficient)
- fix /dev/null as log- or debugfile setting
- move daemon binaries into libexecdir
- document movement of compression plugins into cluster-glue
- fix usage of SO_REUSEPORT in ucast sockets
- fix compile issues with recent gcc and -Werror

Note that a number of the mentioned fixes were created two years ago already, and may have been released in packages for a long time, where vendors have chosen to package them.

As to future plans for Heartbeat: Heartbeat is still useful for non-pacemaker, haresources-mode clusters. We (Linbit) will maintain Heartbeat for the foreseeable future.
That should not be too much of a burden, as it is stable and, after long years of field exposure, all bugs are known ;-)

The most notable shortcoming when using Heartbeat with Pacemaker clusters is the limited message size; there are currently no plans to remove that limitation. With its wide choice of communication paths, even exotic communication plugins, and the ability to run arbitrarily many paths, some deployments may still favor it over Corosync. But typically, for new deployments involving Pacemaker, in most cases you should choose Corosync 2.3.x as your membership and communication layer. For existing deployments using Heartbeat, upgrading to this Heartbeat version is strongly recommended.

Thanks, Lars Ellenberg

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
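To tie the advice above together, here is a sketch of the ha.cf fragment for a node running Pacemaker (>= 1.1.12 with the mentioned patches) on the Heartbeat stack. Interface, address, and node names are placeholders; only the last directive is the new parameter from this release:

```
# communication path and membership
ucast eth0 192.168.1.2
node alice bob
# hand cluster resource management to Pacemaker
crm respawn
# new in 3.0.6: let Heartbeat control the pengine directly,
# so a leftover pengine cannot confuse a cluster restart
crmd_spawns_pengine off
```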
Re: [Linux-HA] crmsh 2.0 released, and moving to Github
Hi Kristoffer, nice work - thank you very much! Best wishes, Nikita Michalko

On Thursday, 3 April 2014 18:03:33, Kristoffer Grönlund wrote: Hello everyone, Today, I have two major announcements to make: crmsh is moving to a new location, and I'm releasing the next major version of the crm shell!

== Find us at crmsh.github.io

Since the rest of the High-Availability stack is being developed over at Github, we thought it would make things easier to move crmsh over there as well. This means we're not only moving the website and issue tracker, we're also switching from Mercurial to git. From this release forward, you will find everything crmsh-related at http://crmsh.github.io, and the source code at https://github.com/crmsh/crmsh. Here are the new URLs related to crmsh:

* Website: http://crmsh.github.io/
* Documentation: http://crmsh.github.io/documentation.html
* Source repository: https://github.com/crmsh/crmsh/
* Issue tracker: https://github.com/crmsh/crmsh/issues/

Not everything has moved quite yet, but the source code and web site are in place.

== New stable release: crmsh 2.0

Secondly, we are proud to finally release crmsh 2.0! This is the version of crmsh I have been developing since I became maintainer last year, and there are a lot of new and improved features in this release. For a more complete list of changes since the previous version, please refer to the changelog:

* https://github.com/crmsh/crmsh/blob/2.0.0/ChangeLog

Packages for several popular Linux distributions (updated soon): http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Zip archive of the tagged release:

* https://github.com/crmsh/crmsh/archive/2.0.0.zip

Here is a short list of some of the biggest changes and features in crmsh 2.0:

* *More stable than ever before!* Many bugs and issues have been fixed, with plenty of help from the community. At the same time, this is a major release with many new features. Testing and pull requests are more than welcome!
* *Cluster management commands.* We've added a couple of new sub-levels that help with the installation and management of the cluster, as well as maintaining and synchronizing the corosync configuration across nodes. There are now commands for starting and stopping the cluster services, as well as cluster scripts that make the installation and configuration of cluster-controlled resources a one-line command.

* *Cleaner CLI syntax.* The parser for the configure syntax of crmsh has been rewritten, allowing for cleaner syntax, better error detection and improved error messages.

* *Tab completion everywhere.* Now tab completion works not only in the interactive mode, but directly from bash. In addition, the completion back end has been completely rewritten and many more commands now have full completion. It's not quite every single command yet, but we're getting there.

* *New and improved configuration.* The new configuration file is installed in /etc/crm/crm.conf by default or per user if desired, and allows for a much more flexible configuration of crmsh.

* *Cluster health evaluation.* As part of the cluster script functionality, there is now a cluster health command which analyses and reports on low disk space, problems with network configuration, firewall configuration issues and more. The best part of the cluster health command is that it can work without a configured cluster, providing a checklist of issues to amend before setting up a new cluster.

* *And wait, there's more!* There is now not only an extensive regression test suite but a growing set of unit tests as well, support for many new features in Pacemaker 1.1.11 such as resource sets in location constraints, anonymous shadow CIBs makes it easier to avoid race conditions in scripts, full syntax highlighting for the built-in help, the assist sub-command helps with more advanced configurations... the list goes on.
Big thanks to everyone who has helped with bug fixes, comments and contributions for this release! ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
On 28.11.2013 19:29, robert.koe...@knapp.com wrote: I will be out of the office from 28.11.2013 and will return on 03.12.2013.

Dear Mr. Koeppl, please FINALLY stop sending such mails to linux-ha@lists.linux-ha.org - I do not want to receive any more mails from you. TIA! Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Q: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms)
Also seen in logs with pacemaker 1.1.5: the cause was unequal MTU settings between the servers ;-) After setting the same MTU on both NICs, everything worked like a charm! Have a nice day! Nikita Michalko

On Wednesday, 5 June 2013 14:25:19, Ulrich Windl wrote: Hi! When the cluster (SLES11 SP2, current) was put in maintenance mode, two messages repeat periodically, filling the log: ERROR: crm_timer_popped: Election Timeout (I_ELECTION_DC) just popped in state S_RELEASE_DC! (120000ms) WARN: do_log: FSA: Input I_ELECTION_DC from crm_timer_popped() received in state S_RELEASE_DC crmd: [7285]: info: handle_request: Current ping state: S_RELEASE_DC Is this expected, or is it yet another bug (pacemaker-1.1.7-0.13.9)? Regards, Ulrich

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
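A quick way to compare MTUs on the two nodes, as a sketch. Run it on both machines and make sure the values match; 'lo' is used below only because it exists everywhere - substitute the real interconnect NIC (e.g. eth0):

```shell
# Print the interface MTU. Mismatched values between cluster nodes can
# break large cluster messages while small pings still succeed.
cat /sys/class/net/lo/mtu
# To align them, something like (interface and value are assumptions):
# ip link set dev eth0 mtu 1500
```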
Re: [Linux-HA] Virtual interface (eth0:0) disappeared
On Tuesday, 21 May 2013 00:00:03, DaveW wrote: We are running heartbeat 2.1.3 on CentOS 5.4.

- Man, so OLD! Any chance to update to the latest version? Nikita Michalko

Last Monday AM, I received a call while getting ready for work. Our high availability server was not responding. The previous Saturday, our I.T. admins had re-configured the network to expand IP address ranges on some subnets. For whatever reason, this action caused our main server (in a two-node HA configuration) to lose its virtual interface, rendering our high-availability server unavailable. The network worked fine; the nodes could ping each other based on their normal IPs and they could ping the ping node, but the virtual IP (the one we REALLY care about) was ignored. Nothing in the logs, no errors, nothing. Just an unresponsive virtual server. A manual fail-over brought it back quickly as the backup took over. I.T. had done their work on Saturday and, had I checked our server on Sunday, I would have found it unreachable with a normal ping. When my colleague called me, I asked him what ifconfig looked like. He described three interfaces: eth0, eth1 and lo; no eth0:0. I had him initiate the manual fail-over. After poring over the logs, unable to find anything that indicated a problem, I tried to simulate the problem with ifconfig eth0:0 down. Sure enough, no fail-over, no errors, nothing; just (once again) an unresponsive server. ifconfig eth0:0 IP_ADDRESS up brought it right back (I tried this last Saturday, BTW, when no one was working). It seems that heartbeat (ipfail?) creates this virtual interface when it starts, then forgets about it. I presume that the assumption is that if eth0 remains intact, eth0:0 will remain intact as well. Am I missing something in the configuration settings or docs? I find nothing about configuring the backup node to monitor the virtual address, just the other node (which has a different IP and kept working after the network changes).
I am about to set up a service to monitor the virtual IP, but I wanted to check with the list first, to see if there's already something built in that I have not configured correctly. I have used main.company.com and backup.company.com as the two hostnames of the nodes. Both systems have these names in an /etc/hosts file, along with the hostname and IP of the virtual server and the ping node. My configuration:

/etc/ha.d/ha.cf:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 10
warntime 3
initdead 120
udpport 694
baud 9600
serial /dev/ttyS0
ucast eth1 10.0.0.1
ucast eth1 10.0.0.2
auto_failback off
node main.company.com backup.company.com
ping 129.196.140.130
respawn hacluster /usr/lib/heartbeat/ipfail
deadping 10

/etc/ha.d/haresources:
main.company.com drbddisk::drbd_resource_0 Filesystem::/dev/drbd0::/usr0::ext3 mysql IPaddr::129.196.140.14 httpd smb MailTo::root

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
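Since heartbeat in haresources mode does not re-check the IPaddr resource once it is started, a cron-driven watchdog along these lines can at least detect the condition. This is only a sketch: the address is taken from the haresources above, the alerting action is left as a stub, and it assumes the iproute2 `ip` tool is available:

```shell
#!/bin/sh
# Alert when the cluster's virtual IP is no longer configured locally.
VIP="${1:-129.196.140.14}"
if ip -o -4 addr show | grep -qF "$VIP"; then
    echo "VIP $VIP present"
else
    echo "VIP $VIP MISSING"
    # e.g. mail root here, or trigger a manual failover
fi
```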
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
Mr. Koeppl: this is absolutely of no interest to me - why am I getting this message?! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Robert Koeppl ist außer Haus. Robert Koeppl is out of office
Thanks for the info, Mr. Koeppl, but these mails are not wanted here! Please do not send them any more! Nikita Michalko

On Monday, 13 May 2013 01:13:20, robert.koe...@knapp.com wrote: I will be out of the office from 08.05.2013 and will return on 21.05.2013.

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: crmd: [31942]: WARN: decode_transition_key: Bad UUID (crm-resource-25438) in sscanf result (3) for 0:0:crm-resource-25438
On Thursday, 16 August 2012 17:54:06, EXTERNAL Konold Martin (erfrakon, RtP2/TEF72) wrote: Hi, what I am doing now is an evaluation of SLES11 SP2 (we run SP1). So testing anything that's not part of SP2 (plus updates) is not planned right now. I also think that reporting problems here early might get you mentally prepared for when the problem is eventually reported via official support. Also, in times of Google, other people may be interested to see what others have found out. From my experience with SLES11 SP2 (with all current updates) I conclude that actually nobody is seriously running SP2 without local bugfixes.

I am also testing SP2 - and yes, it's true: not yet ready for production ;-(

E.g. even the simplest examples from the official SuSE documentation don't work as expected. A trivial example: ocf:heartbeat:exportfs as distributed by SuSE with SP2 causes unlimited growth of .rmtab files (which quickly run into the gigabytes for serious NFS servers). I could work around this issue using some shell scripting. There are other issues which are more than annoying and actually make the SLES SP2 HA Extension unusable for production systems. E.g. clvmd cannot be made less verbose from the cluster configuration. (No, daemon_options="-d0" does not help!) Not funny either is the fact that the official SLES 11 SP2 kernels crash hard (when a node rejoins the cluster) when using SCTP as recommended in the SLES HA documentation and offered via the wizards. It took me a while to find out what was going on. When setting up a system with many (rather simple) resources, funny things happen due to race conditions all over the place (which can mostly be worked around using arbitrary start-delay options). Oh, did I mention that situations which are actually forbidden by constraints (e.g. using a score of INFINITY) actually do happen... Depending on the environment this can lead to not-so-funny effects. E.g.
I defined the following constraints:

colocation c17 inf: p_lsb_ccslogserver p_fs_daten
order o34 inf: p_fs_daten p_lsb_ccslogserver:start

I can prove from the logs that ccslogserver (an application) got migrated from node A to node B while p_fs_daten (a filesystem on top of drbd) was definitely still running on node A. Reporting bugs is not possible without a direct support contract. (You must enter into a support contract with SuSE before you can even report a bug or provide a patch.) Regards, Martin Konold (who has maintained SuSE clusters since 2001)

Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: crmd: [31942]: WARN: decode_transition_key: Bad UUID (crm-resource-25438) in sscanf result (3) for 0:0:crm-resource-25438
Hi Lars!

On Friday, 17 August 2012 10:36:37, Lars Marowsky-Bree wrote: On 2012-08-17T08:41:15, Nikita Michalko michalko.sys...@a-i-p.com wrote: I am also testing SP2 - and yes, it's true: not yet ready for production ;-( What problems did you find?

- E.g. the SLES 11 SP2 kernel crashes - the same as described by Martin: SP2 kernels crash hard (when a node rejoins the cluster) when using SCTP as recommended in the SLES HA documentation and offered via the wizards. - And some specific problems with the ISP-RAID driver, but those have been solved in the meantime by the reseller. Regards, Nikita

Regards, Lars ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat question about multiple services
On Friday, 20 April 2012 12:42:16, sgm wrote: Hi, I have a question about heartbeat: if I have three services - apache, mysql and sendmail - and apache goes down, heartbeat will switch all the services to the standby server, right?

That depends on the configuration - that is one possibility ...

If so, how do I configure heartbeat to avoid this happening?

You can configure your services (mysql and sendmail, for example) with colocation constraints, or as a group - there are many possibilities. Did you already RTFM (read the f... manuals)?

Very appreciated. gm

HTH Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
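To make the two options above concrete, here is a crm shell sketch (resource names and agent choices are assumptions, not from the original post):

```
# Independent primitives: with no constraints between them, a failed
# apache is recovered on its own and mysql/sendmail stay where they are.
primitive p_apache ocf:heartbeat:apache
primitive p_mysql ocf:heartbeat:mysql
primitive p_sendmail lsb:sendmail

# Alternative: a group makes them start, stop and fail over as one
# unit - exactly the behaviour the question wants to avoid:
# group g_all p_apache p_mysql p_sendmail
```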
Re: [Linux-HA] Heartbeat Failover Configuration Question
Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko

On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi there, I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes over its IP address - so far so good. Now my doubt is whether there is a way to configure it not to perform the failover automatically, and instead have someone run the failover manually. Can you provide a configuration example, please? Is this the stanza that does the magic? auto_failback on Thanks for your time and support. Best regards

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
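If this is an R1-style (haresources) heartbeat setup, the usual combination for manual control is auto_failback off in ha.cf plus the helper scripts heartbeat ships. A sketch (paths as typically installed by the heartbeat package; note auto_failback only controls fail*back*, not the initial failover on node death):

```
# in /etc/ha.d/ha.cf: resources do not migrate back automatically
auto_failback off

# manual failover, run on the node that should give up its resources:
/usr/share/heartbeat/hb_standby
# or on the node that should actively take them over:
/usr/share/heartbeat/hb_takeover
```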
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan/lanplus interface
Glad to hear it - HTH! Nikita Michalko

On 16.04.2012 19:10, Pham, Tom wrote: Thanks Nikita, the ipmitool works fine now. The password and user I was given by the person who configured the iLO were not correct. I created a new one and it works fine now.

On Fri, 13 Apr 2012 09:52:06 +0200, Nikita Michalko michalko.sys...@a-i-p.com wrote: Hi, "When I tried ipmitool -I lan -U root -H ip -a chassis power cycle, it did not work, but it worked with the -I open interface." - It can't work without the IP address (ip=?). You must first configure the ipmi (iLO 3) card with ipmitool, e.g.:

ipmitool lan print
ipmitool lan set 1 ipaddr 10.10.40.48
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.40.1
ipmitool lan set 1 auth ADMIN PASSWORD
ipmitool lan set 1 user ...

etc. - man ipmitool is your friend ;-) "What should I do to enable lan/lanplus on SUSE 11?" Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan or lanplus interface
Hi, "When I tried ipmitool -I lan -U root -H ip -a chassis power cycle, it did not work, but it worked with the -I open interface." - It can't work without the IP address (ip=?). You must first configure the ipmi (iLO 3) card with ipmitool, e.g.:

ipmitool lan print
ipmitool lan set 1 ipaddr 10.10.40.48
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 10.10.40.1
ipmitool lan set 1 auth ADMIN PASSWORD
ipmitool lan set 1 user ...

etc. - man ipmitool is your friend ;-) "What should I do to enable lan/lanplus on SUSE 11?" Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Problem with stonith RA using external/ipmi over lan or lanplus interface
Hi, did you properly configure BOTH ipmi cards with ipmitool? And is ipmi started? /etc/init.d/ipmi start What does this command say:

ipmitool -I lan -H IP_OF_OTHER_NODE -U SOMEUSER -A MD5 -P SOMEPASSWORD power status

HTH Nikita Michalko

On Wednesday, 11 April 2012 23:00:41, Pham, Tom wrote: Hi everyone, I am trying to test a 2-node cluster with a stonith resource using external/ipmi (I tried external/riloe first but it did not seem to work). My system: HP ProLiant BL460c G7 with iLO 3 card, firmware 1.15; SUSE 11; Corosync version 1.2.7; Pacemaker 1.0.9. When I use the lan or lanplus interface, it fails to start the stonith resource. I get the error below: external/ipmi[12173]: [12184]: ERROR: error executing ipmitool: Error: Unable to establish IPMI v2 / RMCP+ session Unable to get Chassis Power Status However, when I used interface=open instead of lan/lanplus, it started the stonith resource fine. When I tried kill -9 on corosync on node1, I expected it to reboot node1 and start all resources on node2, but it only rebooted node1. Someone mentioned that the open interface is a local interface and only allows a node to fence itself. Does anyone know why lan/lanplus does not work? Thanks, Tom Pham

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] compiling cluster glue in solaris 10 getting error
Hmmm, it seems you don't know about Christmas - no wonder there was no response ... I don't know anything about Solaris 10, but obviously you are missing some packages for compiling - maybe libbz2 + libbz2-devel? HTH Nikita Michalko

On 26.12.2011 11:57, salim ep wrote: No response, hmmm... it seems to make me nervous about the Linux HA service. :( :((

On Sun, Dec 25, 2011 at 5:54 PM, salim ep eepeesali...@gmail.com wrote: Dear Team, I am facing an error while compiling the Reusable-Cluster-Components-glue--ebb567a5b758 package on Solaris 10 x86 for the Linux HA service. Package: Heartbeat-3-0-STABLE-3.0.2 Package: Reusable-Cluster-Components-glue--ebb567a5b758

checking if libnet is required... yes
checking for libnet-config... no
checking for t_open in -lnsl... yes
checking for socket in -lsocket... (cached) yes
checking for libnet_get_hwaddr in -lnet... yes
checking for libnet... found libnet1.1
checking for libnet_init in -lnet... yes
checking for netinet/icmp6.h... no
checking for ucd-snmp/snmp.h... no
checking net-snmp/net-snmp-config.h usability... yes
checking net-snmp/net-snmp-config.h presence... yes
checking for net-snmp/net-snmp-config.h... yes
checking for net-snmp-config... /usr/sfw/bin//net-snmp-config
checking for special snmp libraries... -R../lib -L/usr/sfw/lib -lnetsnmp -lgen -lpkcs11 -lkstat -lelf -lm -ldl -lnsl -lsocket -ladm
checking For libOpenIPMI version 1.4 or greater... no
checking curl/curl.h usability... no
checking curl/curl.h presence... no
checking for curl/curl.h... no
checking openhpi/SaHpi.h usability... no
checking openhpi/SaHpi.h presence... no
checking for openhpi/SaHpi.h... no
checking vacmclient_api.h usability... no
checking vacmclient_api.h presence... no
checking for vacmclient_api.h... no
checking bzlib.h usability... yes
checking bzlib.h presence... yes
checking for bzlib.h... yes
checking for BZ2_bzBuffToBuffCompress in -lbz2... yes
./configure: line 28817: syntax error near unexpected token `DBUS,'
./configure: line 28817: ` PKG_CHECK_MODULES(DBUS, dbus-1, dbus-glib-1)'
bash-3.00# pwd
/export/home/salim/Reusable-Cluster-Components-glue--ebb567a5b758

Kindly advise me about this error; it is giving me sleepless nights :( -- 'Winners make things happen, losers let things happen' ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
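For the record: a literal `PKG_CHECK_MODULES(DBUS, ...)` surviving into the generated configure script is the classic symptom of autoconf having run without pkg-config's pkg.m4 macro file - the macro is never expanded, and the shell then chokes on it at run time. A sketch of how to confirm and fix (the Solaris package/tool names are assumptions; the demo uses a stand-in file):

```shell
# Symptom check: a correctly generated configure script contains no
# literal PKG_CHECK_MODULES. If grep finds one, pkg.m4 was missing
# when the script was generated.
printf 'PKG_CHECK_MODULES(DBUS, dbus-1, dbus-glib-1)\n' > configure.demo
grep -c 'PKG_CHECK_MODULES' configure.demo
# Fix sketch: install pkg-config (it provides pkg.m4), then regenerate
# the configure script before running it again, e.g.:
#   autoreconf -i        (or the project's ./autogen.sh)
```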
Re: [Linux-HA] UNKNOWN version of resource-agents: 2 questions
That is SLES11/SP1 (64-bit, of course); the `which` utility there is in the package util-linux-2.16-6.8.2. Nikita Michalko

On Friday, 21 October 2011 13:34:06, Dejan Muhamedagic wrote: On Thu, Oct 20, 2011 at 09:17:55AM +0200, Nikita Michalko wrote: Great - thank you, Dejan! Which platform was that? Maybe we need to fix the `which` requirement. Cheers, Dejan

On Wednesday, 19 October 2011 17:58:39, Dejan Muhamedagic wrote: Hi, On Wed, Oct 19, 2011 at 01:56:39PM +0200, Nikita Michalko wrote: Hi all, I've just successfully compiled ClusterLabs-resource-agents-v3.9.2-65-g46c6990(.zip) from https://github.com/ClusterLabs/resource-agents, configured with:

./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable-fatal-warnings=no

You should add: --with-ras-set=linux-ha

After make I tried make rpm and now I'm facing the following errors: ... snip ...

gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/ldirectord'
/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc
gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc'
gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc'
gmake \
  top_distdir=resource-agents-UNKNOWN distdir=resource-agents-UNKNOWN \
  dist-hook
gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
if test -d .git; then \
  LC_ALL=C ./make/gitlog-to-changelog \
    --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \
  rm -f resource-agents-UNKNOWN/ChangeLog.devel; \
  mv resource-agents-UNKNOWN/cl-t resource-agents-UNKNOWN/ChangeLog.devel; \
fi
echo UNKNOWN resource-agents-UNKNOWN/.tarball-version
rm -f resource-agents-UNKNOWN/resource-agents.spec
cp ./resource-agents.spec resource-agents-UNKNOWN/resource-agents.spec
gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \
  ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -400 -exec chmod a+r {} \; -o \
  ! -type d ! -perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/install-sh -c -m a+r {} {} \; \
  || chmod -R a+r resource-agents-UNKNOWN
tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz
tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource-agents-UNKNOWN.tar.bz2
{ test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; }
gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert.
gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990'
rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec
error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64
make: *** [rpm] Fehler 1

I guess that this distribution doesn't have the which package. So, just remove it from the list of requirements.

Q.1: Why is this version UNKNOWN? Perhaps --with-ras-set is going to fix that. Q.2: What else is needed by resource-agents-UNKNOWN for a successful build of an RPM? Did the above help? Thanks, Dejan TIA!
Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
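Dejan's suggestion - just drop `which` from the list of requirements - can be sketched as a spec-file edit. The snippet below is a self-contained illustration (the two-line spec excerpt is invented; the real file is resource-agents.spec in the unpacked source tree):

```shell
# Work on a scratch copy of an invented spec excerpt:
workdir=$(mktemp -d)
cat > "$workdir/resource-agents.spec" <<'EOF'
Name: resource-agents
BuildRequires: which
BuildRequires: glib2-devel
EOF

# Drop the 'which' build dependency that this SLES11 host cannot
# satisfy as a separate package (the binary lives in util-linux there):
sed -i '/^BuildRequires:[[:space:]]*which$/d' "$workdir/resource-agents.spec"

# Only the other BuildRequires line remains:
grep '^BuildRequires' "$workdir/resource-agents.spec"
```

As for the UNKNOWN version: the quoted build log shows the dist hook writing `echo UNKNOWN > resource-agents-UNKNOWN/.tarball-version`, i.e. the version falls back to UNKNOWN when neither git metadata nor a .tarball-version file supplies one. Writing the release string into .tarball-version in the source tree before `make rpm` should give the tarball and RPM a proper version; the exact string to use is an assumption.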
Re: [Linux-HA] UNKNOWN version of resource-agents: 2 questions
Great - thank you, Dejan! Nikita Michalko Am Mittwoch, 19. Oktober 2011 17:58:39 schrieb Dejan Muhamedagic: Hi, On Wed, Oct 19, 2011 at 01:56:39PM +0200, Nikita Michalko wrote: Hi all, I've just succesfully compiled the ClusterLabs-resource-agents-v3.9.2-65- g46c6990(.zip) from: https://github.com/ClusterLabs/resource-agents, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable- fatal-warnings=no You should add: --with-ras-set=linux-ha After make I've tried make rpm and now I'm facing to the following errors: ... snip ... gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/ldirectord' /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake \ top_distdir=resource-agents-UNKNOWN distdir=resource-agents- UNKNOWN \ dist-hook gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' if test -d .git; then \ LC_ALL=C ./make/gitlog-to-changelog \ --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \ rm -f resource-agents-UNKNOWN/ChangeLog.devel; \ mv resource-agents-UNKNOWN/cl-t resource-agents- UNKNOWN/ChangeLog.devel;\ fi echo UNKNOWN resource-agents-UNKNOWN/.tarball-version rm -f resource-agents-UNKNOWN/resource-agents.spec \ cp ./resource-agents.spec resource-agents-UNKNOWN/resource- agents.spec gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \ ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ ! -type d ! 
-perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs- resource-agents-46c6990/install-sh -c -m a+r {} {} \; \ || chmod -R a+r resource-agents-UNKNOWN tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource- agents-UNKNOWN.tar.bz2 { test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; } gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert. gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64 make: *** [rpm] Fehler 1 I guess that this distribution doesn't have the which package. So, just remove it from the list of requirements. Q.: 1. why is this version UNKNOWN? Perhaps with-ras-set is going to fix that. Q.2.: what is needed yet by resource-agents-UNKNOWN for succesfull build of an RPM? Did the above help? Thanks, Dejan TIA! 
Nikita Michalko
[Linux-HA] UNKNOWN version of resource-agents: 2 questions
Hi all, I've just succesfully compiled the ClusterLabs-resource-agents-v3.9.2-65- g46c6990(.zip) from: https://github.com/ClusterLabs/resource-agents, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --enable- fatal-warnings=no After make I've tried make rpm and now I'm facing to the following errors: ... snip ... gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/ldirectord' /opt/HA/sourc/ClusterLabs-resource-agents-46c6990/doc gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource- agents-46c6990/doc' gmake \ top_distdir=resource-agents-UNKNOWN distdir=resource-agents- UNKNOWN \ dist-hook gmake[3]: Entering directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' if test -d .git; then \ LC_ALL=C ./make/gitlog-to-changelog \ --since=2000-01-01 resource-agents-UNKNOWN/cl-t; \ rm -f resource-agents-UNKNOWN/ChangeLog.devel; \ mv resource-agents-UNKNOWN/cl-t resource-agents- UNKNOWN/ChangeLog.devel;\ fi echo UNKNOWN resource-agents-UNKNOWN/.tarball-version rm -f resource-agents-UNKNOWN/resource-agents.spec \ cp ./resource-agents.spec resource-agents-UNKNOWN/resource- agents.spec gmake[3]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' find resource-agents-UNKNOWN -type d ! -perm -777 -exec chmod a+rwx {} \; -o \ ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ ! -type d ! 
-perm -444 -exec /bin/sh /opt/HA/sourc/ClusterLabs- resource-agents-46c6990/install-sh -c -m a+r {} {} \; \ || chmod -R a+r resource-agents-UNKNOWN tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | GZIP=--best gzip -c resource-agents-UNKNOWN.tar.gz tardir=resource-agents-UNKNOWN /bin/sh /opt/HA/sourc/ClusterLabs-resource- agents-46c6990/missing --run tar chof - $tardir | bzip2 -9 -c resource- agents-UNKNOWN.tar.bz2 { test ! -d resource-agents-UNKNOWN || { find resource-agents-UNKNOWN -type d ! -perm -200 -exec chmod u+w {} ';' rm -fr resource-agents-UNKNOWN; }; } gmake[2]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' gmake[1]: »resource-agents-UNKNOWN.tar.gz« ist bereits aktualisiert. gmake[1]: Leaving directory `/opt/HA/sourc/ClusterLabs-resource-agents-46c6990' rpmbuild --define _sourcedir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _specdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _builddir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _srcrpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 --define _rpmdir /opt/HA/sourc/ClusterLabs-resource-agents-46c6990 -ba resource-agents.spec error: Failed build dependencies: which is needed by resource-agents-UNKNOWN-1.x86_64 make: *** [rpm] Fehler 1 Q.: 1. why is this version UNKNOWN? Q.2.: what is needed yet by resource-agents-UNKNOWN for succesfull build of an RPM? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Why 'crm resource cleanup' cannot work
Hi Robin, I had a similar problem in the past with the old version. You may want to send your configuration and logs. Regards Nikita Michalko On Wednesday, 31 August 2011 13:22:36, robin wrote: If they are an official stable release, we can consider the upgrade. But can the upgrade resolve my issue? Regards, -Robin At 2011-08-31 17:56:23, Nikita Michalko michalko.sys...@a-i-p.com wrote: On Wednesday, 31 August 2011 11:23:46, robin wrote: Appending the versions: [root@master ~]# rpm -qa|grep heartbeat heartbeat-3.0.3-2.3.el5 heartbeat-libs-3.0.3-2.3.el5 [root@master ~]# rpm -qa|grep pacemaker pacemaker-libs-1.0.9.1-1.15.el5 pacemaker-1.0.9.1-1.15.el5 Regards, -Robin At 2011-08-31 16:21:17, Nikita Michalko michalko.sys...@a-i-p.com wrote: On Wednesday, 31 August 2011 09:09:52, robin wrote: Hi All, I have a small cluster using heartbeat+pacemaker, and now I've encountered the following problem: when some resource fails, the Failed actions will always be shown in 'crm status', even if I fix the issue and run crm resource cleanup linkmon installer-11-00 ++ Failed actions: linkmon_monitor_0 (node=installer-11-00, call=2, rc=1, status=complete): unknown error linkmon_start_0 (node=installer-11-00, call=13, rc=1, status=complete): unknown error +++ The only thing I can do is restart the heartbeat service, but that's not what I want. Does anyone have any idea about it? Regards, -Robin - Versions? Regards Nikita Michalko Any chance to update to the newest versions - at least pacemaker-1.1.5 and heartbeat-3.0.5?
Regards Nikita Michalko
Re: [Linux-HA] Why 'crm resource cleanup' cannot work
Am Mittwoch, 31. August 2011 09:09:52 schrieb robin: Hi All, I've a small cluster using heartbeat+pacemaker, and now I encounter a problem as below: When some resource failed, the Failed actions will always be shown in 'crm status' even if I fixed the issue and run crm resource cleanup linkmon installer-11-00 ++ Failed actions: linkmon_monitor_0 (node=installer-11-00, call=2, rc=1, status=complete): unknown error linkmon_start_0 (node=installer-11-00, call=13, rc=1, status=complete): unknown error +++ The only way I can do is to restart heartbeat service, but it's not the way I wanted. Could anyone have any idea about it? Regards, -Robin ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems - Versions ? Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
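On the versions involved here, a cleanup alone sometimes leaves a stale failcount behind. A hedged sketch of the usual sequence in the crm shell, reusing the resource and node names from the thread (whether it actually helps on heartbeat 3.0.3 / pacemaker 1.0.9 is exactly what the upgrade question above is about):

```
crm resource cleanup linkmon installer-11-00
crm resource failcount linkmon delete installer-11-00
crm resource cleanup linkmon    # without a node name: clean on all nodes
```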
Re: [Linux-HA] help is needed as the stonith_host directive is not happening!
Am Freitag 22 Juli 2011 18:01:03 schrieb Avestan: Hello, Sorry for the lack of information. You guys are so good that sometimes I think you have a crystal-ball. ;o) As the following shows, I am running Heartbeat and STONITH version 2.0.8 release 1 on Fedora 7. - Ooops: V.2.0.8 !? That was very buggy! Any chance to upgrade to V.3.x ? Nikita Michalko [root@shemshak~]# rpm -qa | grep -i heartbeat heartbeat-2.0.8-1.fc7 [root@shemshak~]# rpm -qa | grep -i stonith stonith-2.0.8-1.fc7 This is an old system which I built over 2 years ago and still runs like a clock. I have recently added two STONITH Devices (APC9225 MasterSwithch Plus with APC9617 Network Management card) and here is the heartbeat configuration file /etc/ha.d/ha.cf: # Heartbeat logging configuration debugfile /var/log/ha-debug logfile /var/log/ha-log logfacility local0 # Heartbeat cluster members node shemshak node dizin # Heartbeat communication timing keepalive 2 deadtime 32 initdead 64 # Heartbeat communication paths udpport 694 bcast eth1 #ucast eth1 192.168.1.21 #ucast eth1 192.168.1.22 #ucast eth0 192.168.1.81 #ucast eth0 192.168.1.82 baud 19200 serial /dev/ttyS0 # Don't fail back automatically - on/off auto_failback on # Monitoring of network connection to default gateway ping 192.168.1.1 #respawn hacluster /usr/lib64/heartbeat/ipfail #STONITH stonith_host Testing apcmaster 192.168.1.56 apc apc Here is also my log file /var/log/ha-log after stopping the heartbeat on the primary host by issuing service heartbeat stop command at 2011/07/22_08:30:48 [root@shemshak ~]# tail -f /var/log/ha-log heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: MSG stats: 0/190108 ms age 10 [pid4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: ha_malloc stats: 379/5069800 38076/18447 [pid4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: RealMalloc stats: 50112 total malloc bytes. 
pid [4749/HBWRITE] heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: MSG stats: 0/86408 ms age 20 [pid4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: ha_malloc stats: 380/1815007 38160/18491 [pid4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: RealMalloc stats: 39660 total malloc bytes. pid [4750/HBREAD] heartbeat[4741]: 2011/07/21_18:36:04 info: Current arena value: 0 heartbeat[4741]: 2011/07/21_18:36:04 info: These are nothing to worry about. heartbeat[4741]: 2011/07/22_08:30:48 info: Heartbeat shutdown in progress. (4741) heartbeat[17136]: 2011/07/22_08:30:48 info: Giving up all HA resources. ResourceManager[17146]: 2011/07/22_08:30:48 info: Releasing resource group: shemshak 192.168.1.8/24/eth0 ResourceManager[17146]: 2011/07/22_08:30:48 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.8/24/eth0 stop IPaddr[17204]: 2011/07/22_08:30:48 INFO: /sbin/ifconfig eth0:0 192.168.1.8 down IPaddr[17183]: 2011/07/22_08:30:48 INFO: Success heartbeat[17136]: 2011/07/22_08:30:48 info: All HA resources relinquished. heartbeat[4741]: 2011/07/22_08:30:49 WARN: 1 lost packet(s) for [dizin] [134127:134129] heartbeat[4741]: 2011/07/22_08:30:49 info: No pkts missing from dizin! heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBFIFO process 4744 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4745 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4746 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4747 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4748 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBWRITE process 4749 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: killing HBREAD process 4750 with signal 15 heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4749 exited. 
7 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4747 exited. 6 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4746 exited. 5 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4745 exited. 4 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4744 exited. 3 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4750 exited. 2 remaining heartbeat[4741]: 2011/07/22_08:30:50 info: Core process 4748 exited. 1 remaining heartbeat[4741]: 2011/07/22_08:30:51 info: shemshak Heartbeat shutdown complete. When I check the log file I don't see the stonith_host directive taking effect! I know the STONITH daemon and the device are working, as I am able to control the device directly by issuing STONITH commands such as: stonith -t apcmaster -p 192.168.1.56 apc apc -T off Testing stonith -t apcmaster -p 192.168.1.56 apc apc -T on Testing Thank you for your help. Avestan
Re: [Linux-HA] help is needed as the stonith_host directive is not happening!
Hi Avestan, are you really using V1/haresources? What version of HA? What config? We have no crystal ball anymore ;-) Nikita Michalko On Wednesday, 20 July 2011 18:08:56, Avestan wrote: Hello everyone, I am trying to add a STONITH device to my Linux-HA setup. I have added the stonith_host directive to the configuration file ha.cf as follows: #stonith_host lashgarak apcmaster 192.168.1.55 apc apc #stonith_host dizin apcmaster 192.168.1.56 apc apc The format of the command that I am using is: stonith_host {host_name} {stonith_type} {ip_address_stonith} {user} {password} When I shut down heartbeat on the primary host, nothing happens. I have checked the log files, both /etc/log/ha-log and /etc/log/messages, and I don't see anything regarding the stonith directive. I should also mention that the resources which are placed in the haresources file are moved from the primary host lashgarak to the secondary host dizin with no issue. Currently the only resource that I have in the haresources file is the floating IP address. Thanks, Avestan
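Two things stand out in the quoted ha.cf fragment: the stonith_host lines begin with '#', so heartbeat treats them as comments and never acts on them, and the first field of stonith_host names the node allowed to use the device ('*' for any node), not the node to be fenced. A hedged sketch with the addresses from the post, assuming either node may fence its peer:

```
# ha.cf - note: no leading '#', or the directive is ignored
stonith_host * apcmaster 192.168.1.55 apc apc
stonith_host * apcmaster 192.168.1.56 apc apc
```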
[Linux-HA] cluster-glue make error
Hi all, I've downloaded the last tarball from http://hg.linux- ha.org/glue/archive/tip.tar.bz2, configured with: ./configure --prefix=$PREFIX --localstatedir=/var --sysconfdir=/etc --with- heartbeat --with-stonith --with-pacemaker --with-daemon-user=$CLUSTER_USER -- with-daemon-group=$CLUSTER_GROUP and now by make I've got the following error: ... snip ... libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include - I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl - I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include - I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align - Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat- security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing- declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno- strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi - D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:408: Fehler: kein vorheriger Prototyp für »setup_cl_log« gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster- Components-glue--0ff4e044f1be/lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/sources/Reusable-Cluster- Components-glue--0ff4e044f1be/lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1 cluster-glue version: 1.0.7 (Build: 0ff4e044f1be0138e8273a98c9fbee95b643bcf7) What I'm missing? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
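The failing compile ("kein vorheriger Prototyp für setup_cl_log" is GCC's "no previous prototype for setup_cl_log", fatal here because -Werror is combined with -Wmissing-prototypes) can usually be sidestepped by not treating warnings as errors, the same flag the resource-agents build in this archive uses. A sketch, reusing the configure options from the post:

```
./configure --enable-fatal-warnings=no --prefix=/usr --localstatedir=/var \
    --sysconfdir=/etc --with-heartbeat --with-stonith --with-pacemaker \
    --with-daemon-user=$CLUSTER_USER --with-daemon-group=$CLUSTER_GROUP
make
```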
Re: [Linux-HA] Colocation of VIP and httpd
Hi, any chance to update to version 3? 2.1.3 is really very old buggy! HTH Nikita Michalko Am Donnerstag 19 Mai 2011 19:25:54 schrieb 吴鸿宇: Hi All, I have a 2 node cluster. My intention is ensuring the VIP is always on the node that has httpd running, i.e. if service httpd on the VIP node is stopped and fails to start, the VIP should switch to the other node. With the configuration below, I observed that when httpd stops and fails to start, the VIP is stopped also but is not switched to the other node that has healthy httpd. I appreciate any ideas. cib generated=false admin_epoch=0 have_quorum=true ignore_dtd=false num_peers=0 cib_feature_revision=2.0 epoch=28 num_updates=1 cib-last-written=Thu May 19 08:48:49 2011 ccm_transition=1 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: */ nvpair id=cib-bootstrap-options-cluster-delay name=cluster-delay value=60s/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=INFINITY/ /attributes /cluster_property_set /crm_config nodes node id=* uname=node1 type=normal/ node id=* uname=node2 type=normal/ /nodes resources primitive id=vip class=ocf type=IPaddr provider=heartbeat operations op id=vip-check name=monitor interval=3s/ /operations instance_attributes id=* attributes nvpair id=IP1_attr_0 name=ip value=*/ nvpair id=IP1_attr_1 name=netmask value=19/ nvpair id=IP1_attr_2 name=nic value=eth0/ /attributes /instance_attributes /primitive clone id=httpd_clone instance_attributes id=7f9ba44b-5157-414d-bf12-cb94cd6bb043 attributes nvpair id=httpd-unique name=globally-unique value=false/ /attributes /instance_attributes primitive id=httpd class=lsb type=httpd operations op id=httpd_mon name=status interval=2s timeout=30s/ /operations /primitive /clone /resources constraints rsc_colocation id=httpd_on_vip to=httpd_clone from=vip score=INFINITY/ rsc_order id=order from=vip 
to=httpd_clone/ /constraints /configuration /cib Thanks a lot, Henry ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
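For reference, after an upgrade the stated intent - keep the VIP only on a node with a working httpd - would read roughly as follows in crm shell syntax. This is a hedged sketch: the scores, IDs, and the httpd-before-VIP ordering are the usual choice, not taken from the quoted XML, and the masked IP stays masked:

```
primitive vip ocf:heartbeat:IPaddr \
    params ip="..." netmask="19" nic="eth0" \
    op monitor interval="3s"
primitive httpd lsb:httpd \
    op monitor interval="2s" timeout="30s"
clone httpd_clone httpd meta globally-unique="false"
colocation vip_with_httpd inf: vip httpd_clone
order httpd_before_vip inf: httpd_clone vip
```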
Re: [Linux-HA] serial or bcast
Hi, Am Dienstag 24 Mai 2011 02:05:52 schrieb Hai Tao: can someone answer this: if I use ucast, the following ip should be the ip of its local interface or of the peer's ip? for example, ucast eth0 10.0.0.5 (this IP is the local IP or the peer's IP?) - the peer's IP! HTH Nikita Michalko Thanks. From: taoh...@hotmail.com To: linux-ha@lists.linux-ha.org Date: Mon, 23 May 2011 10:51:06 -0700 Subject: Re: [Linux-HA] serial or bcast if I use ucast, the following ip should be the ip of its local interface or of the peer's ip? Thanks. Hai Tao Date: Sun, 22 May 2011 19:17:08 +0200 From: lars.ellenb...@linbit.com To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] serial or bcast On Sun, May 22, 2011 at 12:54:09AM -0700, Hai Tao wrote: If serial and bcast (or ucast) coexist in the ha.cf file, which device the heartbeat actually use? Heartbeat always sends all communication down all available paths. Communication paths may fail independently, and typically will be recovered as soon as they work again. Cluster communication is lost only if all paths fail, and no working path or communication channel remains. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
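Concretely, the two ha.cf files mirror each other - a sketch with assumed addresses (10.0.0.4 for node A, 10.0.0.5 for node B, as in the question):

```
# ha.cf on node A (10.0.0.4): point ucast at the peer
ucast eth0 10.0.0.5

# ha.cf on node B (10.0.0.5):
ucast eth0 10.0.0.4
```

Heartbeat reportedly ignores a ucast destination matching one of its own addresses, so keeping both lines in a single shared ha.cf is a common convenience.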
[Linux-HA] attribute migration-threshold does not exist
Hi all! I'm trying to configure a new 2-node cluster with crm and am facing the following errors: crm(bblu)configure# verify crm_verify[8202]: 2011/01/20_14:49:07 WARN: unpack_nodes: Blind faith: not fencing unseen nodes ERROR: cib-bootstrap-options: attribute migration-threshold does not exist Configuration: crm(bblu)configure# show
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
	op monitor interval=60s timeout=60s \
	params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_193_xx_xx_xx ocf:heartbeat:IPaddr \
	op monitor interval=60s timeout=60s \
	params ip=193.xx_xx_xx cidr_netmask=26 broadcast=193.xx.xx.63
group group_1 IPaddr_193_xx_xx_xx IPaddr_192_168_150_54
location rsc_location_group_1 group_1 \
	rule $id=prefered_location_group_1 1: #uname eq bluedam
property $id=cib-bootstrap-options \
	stonith-enabled=false \
	symmetric-cluster=true \
	no-quorum-policy=ignore \
	migration-threshold=3 \
	stonith-action=reboot \
	startup-fencing=false \
	stop-orphan-resources=true \
	stop-orphan-actions=true \
	remove-after-stop=false \
	default-action-timeout=110s \
	is-managed-default=true \
	cluster-delay=60s \
	pe-error-series-max=-1 \
	pe-warn-series-max=-1 \
	pe-input-series-max=-1 \
	cluster-infrastructure=Heartbeat
Versions: heartbeat-3.0.3 cluster-glue-1.0.1 resource-agents-1.0.3 pacemaker-1.0.10 SLES11/SP1 Is the attribute migration-threshold no longer available? TIA Nikita Michalko
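The error message is consistent with migration-threshold being a resource meta attribute rather than a cluster property, which is why crm rejects it inside cib-bootstrap-options. A hedged sketch of where it would go instead on pacemaker 1.0, with the value 3 taken from the quoted configuration:

```
# cluster-wide default for all resources:
rsc_defaults migration-threshold="3"

# or per resource, as a meta attribute:
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
    params ip="192.168.150.54" cidr_netmask="24" broadcast="192.168.150.63" \
    meta migration-threshold="3" \
    op monitor interval="60s" timeout="60s"
```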
Re: [Linux-HA] multipath'ing on debian
Hi Oliver, are you sure you are on the right list: [Linux-HA]? Cheers! Nikita Michalko On Monday, 6 December 2010 11:31, Linux Cook wrote: Hi pluggers, I've just configured multipathing on my debian boxes (Server A and Server B) using HP StorageWorks with dual FCs on each server, and I can now mount the path alias I defined in my multipath configuration. But every time I write data on Server A, it is not reflected on Server B. Any help? Oliver Cook
Re: [Linux-HA] Resource appears to be active on two nodes
Hi, On Thursday, 2 December 2010 10:30, bharat khandelwal wrote: Preeti Jain Preeti_8644 at yahoo.com writes: Andrew Beekhof andrew at beekhof.net writes: but still, which version should I move to? Can I do it with heartbeat 2.1.4 without pacemaker? no, 2.1.4 was never supported. Which version should I use now, if 2.1.4 is not supported, and how do I get pacemaker? But that version should support SuSE 10 (x86_64). - We use HA V.2.1.4 on SLES 10 SP3. It's fairly old but still working for us ... As posted by Lars Marowsky-Bree: The 2.1.4 release is for those users who, for some reason, cannot yet upgrade to the Pacemaker version. It's intended to be the last final release in the 2.1.4 branch. Please use Pacemaker, if you can. So google for "Announcing: heartbeat 2.1.4" a little... On SLES11 we use Heartbeat 3.0.2 + Pacemaker - you need to compile the sources from the repository, though: http://clusterlabs.org/rpm/opensuse-11.1/clusterlabs.repo HTH Nikita Michalko thanks bharat
Re: [Linux-HA] Resource appears to be active on two nodes
Hi, On Thursday, 2 December 2010 12:23, Preeti Jain wrote: Nikita Michalko michalko.system at a-i-p.com writes: Which version should I use now, if 2.1.4 is not supported, and how do I get pacemaker? But that version should support SuSE 10 (x86_64). - We use HA V.2.1.4 on SLES 10 SP3. It's fairly old but still working for us ... As posted by Lars Marowsky-Bree: The 2.1.4 release is for those users who, for some reason, cannot yet upgrade to the Pacemaker version. It's intended to be the last final release in the 2.1.4 branch. Please use Pacemaker, if you can. So google for "Announcing: heartbeat 2.1.4" a little... On SLES11 we use Heartbeat 3.0.2 + Pacemaker - you need to compile the sources from the repository, though: http://clusterlabs.org/rpm/opensuse-11.1/clusterlabs.repo HTH Nikita Michalko So that means there is no option on SLES 10? But I saw many versions there - which one should I use? Is there any pacemaker version available with SLES 10? ... - Only HA V.2.0.7, but that was very old and buggy! Why do you stick with that (also very old) SLES10? Any chance to upgrade at least to SP3? Nikita Michalko Preeti Jain
Re: [Linux-HA] How to monitor the nic link status
Hi Pavlos! Am Dienstag, 30. November 2010 22:28 schrieb Pavlos Parissis: Hi Nikita, On 30 November 2010 08:42, Nikita Michalko michalko.sys...@a-i-p.com wrote: Hi Pavlos, Am Dienstag, 30. November 2010 05:59 schrieb Pavlos Parissis: On 29 November 2010 23:43, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Mon, Nov 29, 2010 at 10:24:17PM +0800, Mia Lueng wrote: Hi: I have configured a cluster with two nodes. Lan setting is A eth0: 192.168.10.110 eth1: 172.16.0.1 B eth0:192.168.10.111 eth1: 172.16.0.2 I have configured a resource ip_0 192.168.10.100 on eth0. But when I unplug the eth0 link on A, the resource can not be taken over to B and no any log output. I've checked the /usr/lib/ocf/resource.d/heartbeat/IPaddr2 script and found there are no codes for nic link status checking. How can i monitor the nic link status to protect the virtual ip address? Thanks. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Expl ain ed/ch09s03s03.html I am sorry this is not the expected behaviour, at least to me. I expect from the IPaddr2 to report a failure in a case the interface is not available. What is the point to maintain an IPaddr2 resource on interface which is not up? Furthermore, using the ping resource adds more complexity and basically utilizes Layer 3 protocol in order to monitor a layer 2 device ( the NIC). Using ip link show on the device should a very easy way to check link status on the NIC, ip tool supports also LOWER_UP. - what about configure monitor operation of IP in cib.xml - sth. like this: resources primitive id=IPaddr_194_37_40_42 class=ocf provider=heartbeat type=IPaddr meta_attributes id=primitive-IPaddr_194_37_40_42meta/ operations op name=monitor interval=60s id=IPaddr_194_37_40_42_mon timeout=60s/ /operations - it works for me very well ;-) Mia has reported on this thread that having monitor enabled is not enough for reporting problems on the link. What do you mean works for you? 
Have you tried to pull out the network cable from an interface on which you have an IPaddr2 resource running? - I use the IPaddr RA, not IPaddr2. And of course I tested it thoroughly - successfully ;-) You can also use ifconfig ethX down to simulate a cable pull-out ... Does that action cause a failure on the IPaddr2 resource? - On IPaddr, YES! Cheers, Pavlos -- AIP - Dr. Nikita Michalko | Main office at: | Working hours: 8:00-16:30 Tel: +43 1 408 35 57-14 | A-1160 Wien | except on Fri: 7:30-14:30 Fax: +43 1 408 35 57-26 | Grundsteing. 40 | michalko.sys...@a-i-p.com
Re: [Linux-HA] How to monitor the nic link status
Hi Pavlos, On Tuesday, 30 November 2010 05:59, Pavlos Parissis wrote: On 29 November 2010 23:43, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Mon, Nov 29, 2010 at 10:24:17PM +0800, Mia Lueng wrote: Hi: I have configured a cluster with two nodes. The LAN setting is A eth0: 192.168.10.110 eth1: 172.16.0.1 B eth0: 192.168.10.111 eth1: 172.16.0.2 I have configured a resource ip_0 192.168.10.100 on eth0. But when I unplug the eth0 link on A, the resource cannot be taken over to B, and there is no log output at all. I've checked the /usr/lib/ocf/resource.d/heartbeat/IPaddr2 script and found there is no code for NIC link status checking. How can I monitor the NIC link status to protect the virtual IP address? Thanks. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html I am sorry, this is not the expected behaviour, at least to me. I expect the IPaddr2 RA to report a failure in case the interface is not available. What is the point of maintaining an IPaddr2 resource on an interface which is not up? Furthermore, using the ping resource adds more complexity and basically uses a layer 3 protocol to monitor a layer 2 device (the NIC). Using ip link show on the device should be a very easy way to check link status on the NIC; the ip tool also supports LOWER_UP. - What about configuring a monitor operation for the IP in cib.xml - something like this:
<resources>
  <primitive id="IPaddr_194_37_40_42" class="ocf" provider="heartbeat" type="IPaddr">
    <meta_attributes id="primitive-IPaddr_194_37_40_42meta"/>
    <operations>
      <op name="monitor" interval="60s" id="IPaddr_194_37_40_42_mon" timeout="60s"/>
    </operations>
  </primitive>
</resources>
- It works very well for me ;-) Nikita Michalko Cheers, Pavlos
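For completeness, the ping-based approach from the linked chapter would look roughly like this in crm syntax - a hedged sketch (the gateway address and the scoring are placeholders; the resource name ip_0 is from Mia's setup): a cloned ocf:pacemaker:ping resource maintains a pingd node attribute, and a location rule moves the VIP off any node that loses connectivity.

```
primitive gw-ping ocf:pacemaker:ping \
    params host_list="192.168.10.1" multiplier="1000" \
    op monitor interval="10s"
clone ping-clone gw-ping
location ip_0-on-connected-node ip_0 \
    rule -inf: not_defined pingd or pingd lte 0
```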
Re: [Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
Hi Lars, thank you for your reply! Am Mittwoch, 27. Oktober 2010 11:59 schrieb Lars Ellenberg: On Fri, Oct 22, 2010 at 02:57:48PM +0200, Nikita Michalko wrote: Hi all! I know itt's annoying to do today something on the 32-bit server (SLES11/SP1), but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with: ./autogen.sh ./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var it seems to be an error on make: /usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output make clean ? - didn't help ;-( - only small diference at the end: ... /bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o ipctest ipctest.o libplumb.la ../../replace/libreplace.la ../../lib/pils/libpils.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl libtool: link: gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o .libs/ipctest ipctest.o ./.libs/libplumb.so 
/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pils/.libs/libpils.so ../../replace/.libs/libreplace.a ../../lib/pils/.libs/libpils.so -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl ./.libs/libplumb.so: undefined reference to `uuid_parse' ./.libs/libplumb.so: undefined reference to `uuid_generate' ./.libs/libplumb.so: undefined reference to `uuid_copy' ./.libs/libplumb.so: undefined reference to `uuid_is_null' ./.libs/libplumb.so: undefined reference to `uuid_unparse' ./.libs/libplumb.so: undefined reference to `uuid_clear' ./.libs/libplumb.so: undefined reference to `uuid_compare' collect2: ld returned 1 exit status gmake[2]: *** [ipctest] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/clplumbing' gmake[1]: *** [all-recursive] Fehler 1 Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
On Wednesday, 27 October 2010 at 12:52, Lars Ellenberg wrote:
On Wed, Oct 27, 2010 at 12:15:34PM +0200, Nikita Michalko wrote:

Hi Lars, thank you for your reply!

On Wednesday, 27 October 2010 at 11:59, Lars Ellenberg wrote:
On Fri, Oct 22, 2010 at 02:57:48PM +0200, Nikita Michalko wrote:

Hi all! I know it's annoying to do something on a 32-bit server (SLES11/SP1) these days, but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with:

./autogen.sh
./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var

there seems to be an error on make:

/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output

make clean? - didn't help ;-( - only a small difference at the end:

Oh, it did. You now just need to install the build dependencies ;-) You could, of course, just use pre-built packages. ...
/bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o ipctest ipctest.o libplumb.la ../../replace/libreplace.la ../../lib/pils/libpils.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl libtool: link: gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -o .libs/ipctest ipctest.o ./.libs/libplumb.so /root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pi ls/.libs/libpils.so ../../replace/.libs/libreplace.a ../../lib/pils/.libs/libpils.so -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl ./.libs/libplumb.so: undefined reference to `uuid_parse' How about you google that, and follow the first hit? - Ahh, very well - I forgot already my first attempt on 64-bit box with all that dependancies ... Thank you! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
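The undefined uuid_* references in the thread point to missing libuuid development files. A quick probe sketch; the function name is my own, and the package names suggested in the message are assumptions that vary by distribution:

```shell
# Check whether libuuid's header is installed; without it, linking libplumb
# fails with undefined uuid_parse/uuid_generate/... references.
uuid_dev_hint() {
    # $1 = include root to probe (parameterized so the check is testable);
    # defaults to /usr/include
    root="${1:-/usr/include}"
    if [ -e "$root/uuid/uuid.h" ]; then
        echo "present"
    else
        echo "missing: install the libuuid development package (e.g. libuuid-devel on SUSE or uuid-dev on Debian - package names are assumptions)"
    fi
}
```

Running `uuid_dev_hint` on the affected build host and installing the indicated package, then re-running `make clean && make`, matches the resolution reached in the thread.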
[Linux-HA] Reusable-Cluster-Components-glue: make error on 32-bit box
Hi all! I know it's annoying to do something on a 32-bit server (SLES11/SP1) these days, but I need it for testing purposes. After configuring the package Reusable-Cluster-Components-glue-1.0.6 with:

./autogen.sh
./configure --enable-fatal-warnings=no --prefix=/usr --sysconfdir=/etc --sharedstatedir=/var/lib/heartbeat/com --localstatedir=/var

there seems to be an error on make:

...
/bin/sh ../../libtool --tag=CC --tag=CC --mode=link gcc -std=gnu99 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -version-info 2:0:0 -o libpils.la -rpath /usr/lib pils.lo ../../replace/libreplace.la -lbz2 -lxml2 -lc -lrt -ldl -lglib-2.0 -lltdl
libtool: link: gcc -std=gnu99 -shared .libs/pils.o -Wl,--whole-archive ../../replace/.libs/libreplace.a -Wl,--no-whole-archive -lbz2 /usr/lib/libxml2.so -lz -lm -lc -lrt -lglib-2.0 /usr/lib/libltdl.so -ldl -Wl,-soname -Wl,libpils.so.2 -o .libs/libpils.so.2.0.0
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: i386:x86-64 architecture of input file `../../replace/.libs/libreplace.a(NoSuchFunctionName.o)' is incompatible with i386 output
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: final link failed: Invalid operation collect2: ld returned 1 exit status gmake[2]: *** [libpils.la] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib/pils' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/Reusable-Cluster-Components-glue-1.0.6/lib' make: *** [all-recursive] Fehler 1 Configuring with --build=x86 didn't help Many thanks in advance for the reply! ... Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
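The "i386:x86-64 architecture ... incompatible with i386 output" error typically means objects left over from an earlier 64-bit build are being linked into a 32-bit one. One way to spot such leftovers is to compare the architectures `file` reports for each object; a sketch that works on pre-captured `file` output (the function name and usage are my own, not from the build system):

```shell
# Reads "path: description" lines (as produced by `file *.o`) on stdin and
# prints paths whose architecture differs from the first one seen.
# Exits 1 if the object set is mixed 32/64-bit.
find_arch_mismatch() {
    awk -F': ' '
        {
            arch = ($0 ~ /x86-64/) ? "x86-64" : "i386"
            if (ref == "") ref = arch            # first object sets the reference
            else if (arch != ref) { print $1; mixed = 1 }
        }
        END { exit mixed ? 1 : 0 }
    '
}
```

Typical use after a failed link: `file replace/.libs/*.o lib/pils/.libs/*.o | find_arch_mismatch`. As the thread concludes, `make clean` (or a fresh source tree) plus the correct build dependencies resolves it.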
Re: [Linux-HA] superfluous dependency in heartbeat spec file
Hi Vadym, can I apply this also on SLES10/SLES11?

TIA Nikita Michalko

On Tuesday, 12 October 2010 at 14:29, Vadym Chepkov wrote:

It was brought up on the pacemaker mailing list, but this applies to heartbeat rpm packaging as well. Libraries do not depend on the base package; they are independent. This is how one can install several versions of the same library (compat-packages). Also, it is possible to use the heartbeat libraries without using the heartbeat daemon itself (if one uses pacemaker with corosync, for instance).

Vadym

# HG changeset patch
# User Vadym Chepkov vchep...@gmail.com
# Date 1286886305 14400
# Node ID f1aea427d2c01756e06b4b917787c21ee440f24b
# Parent 82fc843fbcf9733e50bbc169c95e51b6c7f97c54
Fix package inter-dependencies

diff -r 82fc843fbcf9 -r f1aea427d2c0 heartbeat-fedora.spec
--- a/heartbeat-fedora.spec Mon Oct 04 22:12:37 2010 +0200
+++ b/heartbeat-fedora.spec Tue Oct 12 08:25:05 2010 -0400
@@ -40,6 +40,7 @@
 BuildRequires: which
 BuildRequires: cluster-glue-libs-devel
 BuildRequires: libxslt docbook-dtds docbook-style-xsl
+Requires: heartbeat-libs = %{version}-%{release}
 Requires: PyXML
 Requires: resource-agents
 Requires: cluster-glue-libs
@@ -81,7 +82,6 @@
 %package libs
 Summary: Heartbeat libraries
 Group: System Environment/Daemons
-Requires: heartbeat = %{version}-%{release}
 %description libs
 Heartbeat library package
@@ -89,7 +89,7 @@
 %package devel
 Summary: Heartbeat development package
 Group: System Environment/Daemons
-Requires: heartbeat = %{version}-%{release}
+Requires: heartbeat-libs = %{version}-%{release}
 %description devel
 Headers and shared libraries for writing programs for Heartbeat

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
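When rebuilding such packages locally, one can lint the spec for the circular dependency the patch removes (a -libs subpackage that Requires the base package). A small sketch; the function name is my own and the check is deliberately narrow:

```shell
# Print any "Requires: heartbeat ..." line found inside the %package libs
# section of a spec file; exit 1 if one exists (i.e. the libs subpackage
# still depends on the base package).  "Requires: heartbeat-libs" is allowed.
libs_requires_base() {
    awk '
        /^%package[[:space:]]+libs/ { in_libs = 1; next }
        /^%package[[:space:]]/      { in_libs = 0 }
        in_libs && /^Requires:[[:space:]]*heartbeat[[:space:]=]/ { print; bad = 1 }
        END { exit bad ? 1 : 0 }
    ' "$1"
}
```

With the patch above applied, `libs_requires_base heartbeat-fedora.spec` should find nothing and exit 0.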
Re: [Linux-HA] Emergency reboot by stonith-enabled=false
Thank you Dejan - it works now (with crm respawn)!

Cheers! Nikita Michalko

On Friday, 8 October 2010 at 18:38, Dejan Muhamedagic wrote:

Hi,

On Fri, Oct 08, 2010 at 03:35:13PM +0200, Nikita Michalko wrote:

Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions:

cluster-glue: 1.0.6
resource-agents: 1.0.3
Heartbeat STABLE: 3.0.3
pacemaker: 1.1.3 (all from wiki sources)
OS: SLES11/SP1

After successfully starting Heartbeat on the first node opter (the other node was intentionally down for the test) with stonith disabled (see my configuration below), the first node rebooted. Why my own node? Do I need stonith on a symmetric cluster?

Yes.

HB_Report attached ...

stonith-ng failed to connect to the cluster:

Oct 08 13:03:27 opteron heartbeat: [10872]: WARN: Client [stonith-ng] pid 10898 failed authorization [no default client auth]
Oct 08 13:03:27 opteron heartbeat: [10872]: ERROR: api_process_registration_msg: cannot add client(stonith-ng)
...
Oct 08 13:03:27 opteron stonith-ng: [10898]: CRIT: main: Cannot sign in to the cluster... terminating

which made heartbeat reboot. I guess that you can add something like this to ha.cf:

apiauth stonith-ng uid=root

If you want to prevent reboots, use crm respawn.
Thanks, Dejan

My configuration:
--
crm(live)# configure show
node $id=5ac2b85d-802f-40a6-ad0f-38660c4a6fb0 opter
node $id=caca825d-2fd9-426d-9ed7-8ff9845bc08f aipsles11
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_19X_XX_XX_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=19X.XX.XX.54 cidr_netmask=26 broadcast=19X.XX.XX.63
primitive ubis_udbmain_3 lsb:ubis_udbmain \
        op monitor interval=120s timeout=110s
group group_1 IPaddr_19X_XX_XX_54 IPaddr_192_168_150_54 ubis_udbmain_3
location rsc_location_group_1 group_1 \
        rule $id=prefered_location_group_1 1: #uname eq opter
property $id=cib-bootstrap-options \
        symmetric-cluster=true \
        no-quorum-policy=ignore \
        migration-threshold=3 \
        stonith-enabled=false \
        stonith-action=reboot \
        startup-fencing=false \
        stop-orphan-resources=true \
        stop-orphan-actions=true \
        remove-after-stop=false \
        short-resource-names=true \
        transition-idle-timeout=3min \
        default-action-timeout=110s \
        is-managed-default=true \
        cluster-delay=60s \
        pe-error-series-max=-1 \
        pe-warn-series-max=-1 \
        pe-input-series-max=-1 \
        dc-version=1.1.3-7e4c0424e331aa2a51cb1efb69e80b5c8e1f8701 \
        cluster-infrastructure=Heartbeat \
        last-lrm-refresh=1284125385

Any ideas/comments? TIA!

Nikita Michalko

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
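The two suggestions from the thread (an apiauth entry for stonith-ng, plus respawning the CRM) can be combined in ha.cf. A sketch of the relevant fragment, with directive spellings as given in the thread - verify them against the documentation of your Heartbeat version:

```
# Allow the pacemaker stonith daemon to sign in to the Heartbeat API layer,
# avoiding "failed authorization [no default client auth]"
apiauth stonith-ng uid=root

# Run the CRM as a respawned client, so a crashed or rejected daemon is
# restarted instead of triggering an emergency reboot of the node
crm respawn
```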
Re: [Linux-HA] Emergency reboot by stonith-enabled=false
On Sunday, 10 October 2010 at 20:35, Lars Marowsky-Bree wrote:
On 2010-10-08T15:35:13, Nikita Michalko michalko.sys...@a-i-p.com wrote:

Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions: cluster-glue: 1.0.6, resource-agents: 1.0.3, Heartbeat STABLE: 3.0.3, pacemaker: 1.1.3 (all from wiki sources), OS: SLES11/SP1

Is there any specific reason why you're using heartbeat?

- only customer support costs ;-)

Regards Nikita Michalko

openais/corosync/pacemaker are available as supported packages on the SLE 11 SP1 platform ...

Regards, Lars

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Emergency reboot by stonith-enabled=false
Hi all! My very simple two-node test cluster with Pacemaker/Heartbeat is giving me some headaches. Here are my versions:

cluster-glue: 1.0.6
resource-agents: 1.0.3
Heartbeat STABLE: 3.0.3
pacemaker: 1.1.3 (all from wiki sources)
OS: SLES11/SP1

After successfully starting Heartbeat on the first node opter (the other node was intentionally down for the test) with stonith disabled (see my configuration below), the first node rebooted. Why my own node? Do I need stonith on a symmetric cluster? HB_Report attached ...

My configuration:
--
crm(live)# configure show
node $id=5ac2b85d-802f-40a6-ad0f-38660c4a6fb0 opter
node $id=caca825d-2fd9-426d-9ed7-8ff9845bc08f aipsles11
primitive IPaddr_192_168_150_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=192.168.150.54 cidr_netmask=24 broadcast=192.168.150.63
primitive IPaddr_19X_XX_XX_54 ocf:heartbeat:IPaddr \
        op monitor interval=60s timeout=60s \
        params ip=19X.XX.XX.54 cidr_netmask=26 broadcast=19X.XX.XX.63
primitive ubis_udbmain_3 lsb:ubis_udbmain \
        op monitor interval=120s timeout=110s
group group_1 IPaddr_19X_XX_XX_54 IPaddr_192_168_150_54 ubis_udbmain_3
location rsc_location_group_1 group_1 \
        rule $id=prefered_location_group_1 1: #uname eq opter
property $id=cib-bootstrap-options \
        symmetric-cluster=true \
        no-quorum-policy=ignore \
        migration-threshold=3 \
        stonith-enabled=false \
        stonith-action=reboot \
        startup-fencing=false \
        stop-orphan-resources=true \
        stop-orphan-actions=true \
        remove-after-stop=false \
        short-resource-names=true \
        transition-idle-timeout=3min \
        default-action-timeout=110s \
        is-managed-default=true \
        cluster-delay=60s \
        pe-error-series-max=-1 \
        pe-warn-series-max=-1 \
        pe-input-series-max=-1 \
        dc-version=1.1.3-7e4c0424e331aa2a51cb1efb69e80b5c8e1f8701 \
        cluster-infrastructure=Heartbeat \
        last-lrm-refresh=1284125385

Any ideas/comments? TIA!
Nikita Michalko

[Attachment: hb-report_3.tar.bz2, application/tbz]

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Compile error Reusable-Cluster-Components-glue-1.0.6
Thank you Dejan, with the option --enable-fatal-warnings=no it works! Cheers! Nikita Michalko Am Mittwoch, 6. Oktober 2010 16:57 schrieb Dejan Muhamedagic: Hi, On Wed, Oct 06, 2010 at 02:30:34PM +0200, Nikita Michalko wrote: Hi all! After downloading the sources Reusable-Cluster-Components-glue-1.0.6.tar.bz2 from http://www.linux-ha.org/wiki/Download and configuring with: ./configure --with-heartbeat I have a small problem by compiling source with make: snip--- ... libtool: link: ranlib .libs/libstonith.a libtool: link: rm -fr .libs/libstonith.lax libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:64: error: function declaration isn't a prototype main.c:78: error: function declaration isn't a prototype gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/ lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/ lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1, 64bit box Maybe some libraries missing? TIA! 
Hmpf, no libraries missing, it's something I missed and which got fixed a week after the release. The changeset is 8286b46c91e3. At any rate you can try: ./configure --enable_fatal_warnings=no --with-heartbeat I believe that that's the right incantation. Thanks, Dejan Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Compile error Reusable-Cluster-Components-glue-1.0.6
Hi all! After downloading the sources Reusable-Cluster-Components-glue-1.0.6.tar.bz2 from http://www.linux-ha.org/wiki/Download and configuring with: ./configure --with-heartbeat I have a small problem by compiling source with make: snip--- ... libtool: link: ranlib .libs/libstonith.a libtool: link: rm -fr .libs/libstonith.lax libtool: link: ( cd .libs rm -f libstonith.la ln -s ../libstonith.la libstonith.la ) gcc -std=gnu99 -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../linux-ha -I../../linux-ha -I../../libltdl -I../../libltdl -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libxml2 -g -O2 -ggdb3 -O0 -fgnu89-inline -fstack-protector-all -Wall -Waggregate-return -Wbad-function-cast -Wcast-qual -Wcast-align -Wdeclaration-after-statement -Wendif-labels -Wfloat-equal -Wformat=2 -Wformat-security -Wformat-nonliteral -Winline -Wmissing-prototypes -Wmissing-declarations -Wmissing-format-attribute -Wnested-externs -Wno-long-long -Wno-strict-aliasing -Wpointer-arith -Wstrict-prototypes -Wwrite-strings -ansi -D_GNU_SOURCE -DANSI_ONLY -Werror -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c cc1: warnings being treated as errors main.c:64: error: function declaration isn't a prototype main.c:78: error: function declaration isn't a prototype gmake[2]: *** [main.o] Fehler 1 gmake[2]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/lib/stonith' gmake[1]: *** [all-recursive] Fehler 1 gmake[1]: Leaving directory `/root/neueRPMs/ha/303/sourc/wiki/Reusable-Cluster-Components-glue-1.0.6/lib' make: *** [all-recursive] Fehler 1 OS: SLES11/SP1, 64bit box Maybe some libraries missing? TIA! Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmd : CCM Connection failed
Hi,

On Friday, 24 September 2010 at 17:28, sunitha kumar wrote:

Hi Nikita, Thanks for your response. The permissions on cib.xml look right.

-rw------- 1 hacluster haclient 1487 Sep 23 16:32 /var/lib/heartbeat/crm/cib.xml

Is this issue fixed in heartbeat-3.0.3?

I don't know if that was an issue, but I had a similar problem upgrading from v2.1.3. After installing the new version AND upgrading the CIB, the problem disappeared. Look at: http://www.linux-ha.org/wiki/Releases.

Cheers! Nikita Michalko

thanks, -sunitha

On Thu, Sep 23, 2010 at 11:02 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:

Hi,

On Friday, 24 September 2010 at 03:31, sunitha kumar wrote:

service heartbeat status
heartbeat OK [pid 23886 et al] is running

On service heartbeat restart, the logs show that the CCM connection failed. Any pointers? thnx -sunitha

cib: [3301]: WARN: ccm_connect: CCM Activation failed
cib: [3301]: WARN: ccm_connect: CCM Connection failed 21 times (30 max)
crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed
crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 21 times... pause and retry
cib: [3301]: info: ccm_connect: Registering with CCM...
cib: [3301]: WARN: ccm_connect: CCM Activation failed
cib: [3301]: WARN: ccm_connect: CCM Connection failed 22 times (30 max)
crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed
crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 22 times... pause and retry
..
crmd: [23896]: WARN: do_ccm_control: CCM Activation failed
crmd: [23896]: WARN: do_ccm_control: CCM Connection failed 29 times (30 max)
crmd: [23896]: info: crm_timer_popped: Wait Timer (I_NULL) just popped!
crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: ERROR: do_ccm_control: CCM Activation failed 30 (max) times This is on: pacemaker-mgmt-client-1.99.2-6.1 pacemaker-libs-1.0.5-4.1 pacemaker-1.0.5-4.1 pacemaker-mgmt-1.99.2-6.1 pacemaker-libs-devel-1.0.5-4.1 pacemaker-mgmt-devel-1.99.2-6.1 heartbeat-3.0.0-33.2 heartbeat-devel-3.0.0-33.2 - any chance to upgrade to the latest versions (heartbeat-3.0..3)? Are the permissions to the /var/lib/heartbeat/crm/cib.xml OK? HTH Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmd : CCM Connection failed
Hi, Am Freitag, 24. September 2010 03:31 schrieb sunitha kumar: service heartbeat status heartbeat OK [pid 23886 et al] is running On : service heartbeat restart, the logs show that CCM Connection failed. Any pointers? thnx -sunitha cib: [3301]: WARN: ccm_connect: CCM Activation failed cib: [3301]: WARN: ccm_connect: CCM Connection failed 21 times (30 max) crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 21 times... pause and retry cib: [3301]: info: ccm_connect: Registering with CCM... cib: [3301]: WARN: ccm_connect: CCM Activation failed cib: [3301]: WARN: ccm_connect: CCM Connection failed 22 times (30 max) crmd: [3305]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [3305]: info: do_cib_control: Could not connect to the CIB service: connection failed crmd: [3305]: WARN: do_cib_control: Couldn't complete CIB registration 22 times... pause and retry .. crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: WARN: do_ccm_control: CCM Connection failed 29 times (30 max) crmd: [23896]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! crmd: [23896]: WARN: do_ccm_control: CCM Activation failed crmd: [23896]: ERROR: do_ccm_control: CCM Activation failed 30 (max) times This is on: pacemaker-mgmt-client-1.99.2-6.1 pacemaker-libs-1.0.5-4.1 pacemaker-1.0.5-4.1 pacemaker-mgmt-1.99.2-6.1 pacemaker-libs-devel-1.0.5-4.1 pacemaker-mgmt-devel-1.99.2-6.1 heartbeat-3.0.0-33.2 heartbeat-devel-3.0.0-33.2 - any chance to upgrade to the latest versions (heartbeat-3.0..3)? Are the permissions to the /var/lib/heartbeat/crm/cib.xml OK? 
HTH Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
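The permissions question above can be made into an explicit check. A sketch; the function name is my own, the expected owner/group/mode follow the values quoted in the thread (hacluster:haclient, mode 600), and `stat -c` assumes GNU coreutils:

```shell
# Verify a CIB file has the ownership and mode the cib daemon expects.
# $1 = path to cib.xml
# $2 = expected "user:group mode" (defaults to "hacluster:haclient 600")
cib_perms_ok() {
    want="${2:-hacluster:haclient 600}"
    info=$(stat -c '%U:%G %a' "$1" 2>/dev/null) || { echo "BAD (cannot stat)"; return 1; }
    if [ "$info" = "$want" ]; then
        echo "OK"
    else
        echo "BAD ($info)"   # e.g. wrong owner after a copy done as root
        return 1
    fi
}
```

Usage: `cib_perms_ok /var/lib/heartbeat/crm/cib.xml || chown hacluster:haclient /var/lib/heartbeat/crm/cib.xml`.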
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
On Tuesday, 14 September 2010 at 12:16, Florian Haas wrote:
On 2010-09-14 12:13, Nikita Michalko wrote:

Hi Florian, thank you very much for the link to the webinar - very good work! I have tried that on SLES in the meantime, but am facing the following issue: the download of heartbeat 3.0.3_STABLE from http://www.linux-ha.org/wiki/Download starts, and then immediately this error comes: "Verbindung zu Rechner hg.linux-ha.org ist unterbrochen" (connection to host hg.linux-ha.org was interrupted)

Works just fine for me. Some upstream proxy getting in the way? In case anyone else is having this issue, please speak up now.

Really - it was Konqueror: with Firefox it's working fine. Sorry for the noise ...

Cheers, Nikita

Cheers, Florian

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hi Florian, thank you very much for the link to the webinar - very good work! I have tried that on SLES in the meantime, but am facing the following issue: the download of heartbeat 3.0.3_STABLE from http://www.linux-ha.org/wiki/Download starts, and then immediately this error comes: "Verbindung zu Rechner hg.linux-ha.org ist unterbrochen" (connection to host hg.linux-ha.org was interrupted). Other possibilities?

Regards Nikita Michalko

On Friday, 10 September 2010 at 13:17, Florian Haas wrote:
On 2010-09-10 09:05, Nikita Michalko wrote:
On Thursday, 9 September 2010 at 13:31, Tim Serong wrote:
On 9/9/2010 at 04:38 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:
On Thursday, 9 September 2010 at 07:09, Tim Serong wrote:

It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does it mean I should build the new RPMs from sources?

Maybe.. But if you don't want to have to do that, you might try the RPMs from http://www.clusterlabs.org/rpm/ - you may find the openSUSE 11.1 or 11.2 RPMs are installable on SLES (although I haven't tried this myself),

I tried it already, of course - installed with zypper, but then couldn't start it: problem with pacemaker (Signon to CIB failed: connection failed ...). In the meantime I installed pacemaker + heartbeat from scratch and am testing it just now ...

Even though you're not on Debian, this webinar may still be of help wrt the Heartbeat and Pacemaker upgrade process: http://www.linbit.com/en/training/on-demand-webinars/upgrading-to-pacemaker-on-debian-squeeze/

Cheers, Florian

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
On Thursday, 9 September 2010 at 13:31, Tim Serong wrote:
On 9/9/2010 at 04:38 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:
On Thursday, 9 September 2010 at 07:09, Tim Serong wrote:

It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does it mean I should build the new RPMs from sources?

Maybe.. But if you don't want to have to do that, you might try the RPMs from http://www.clusterlabs.org/rpm/ - you may find the openSUSE 11.1 or 11.2 RPMs are installable on SLES (although I haven't tried this myself),

I tried it already, of course - installed with zypper, but then couldn't start it: problem with pacemaker (Signon to CIB failed: connection failed ...). In the meantime I installed pacemaker + heartbeat from scratch and am testing it just now ...

and I would still generally encourage people on SLES to use SLE HAE, although I understand you want to continue to use Heartbeat, which makes that a problem :))

Another possibility: if you (or anyone else) is in a position to get a current version of heartbeat building on build.opensuse.org, I can help to get it included in the network:ha-clustering repo (I'm just not really

That would be great !!

able to do any packaging or testing of it myself).

The SLE HAE product replaced Heartbeat with openAIS when SLES 11 was

Yes, I know it, we want to stay with Heartbeat/Pacemaker in production though ...

released (this is now corosync+openais on SLE 11 SP1), so I'm curious to know what OS you're upgrading from, if you previously had heartbeat 2.1.3 running.

That was SLES10 SP2, but with the HA version of heartbeat-resources-2.1.3-23.1 (pacemaker-heartbeat-0.6.5-8.2)

OK, understood.
Regards, Tim Regards, Nikita ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hi Tim, thank you for reply. See my answer below ... Am Donnerstag, 9. September 2010 07:09 schrieb Tim Serong: On 9/8/2010 at 10:40 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote: Helo list! I am trying now to upgrade heartbeat 2.1.3 (pacemaker 0.6) to 3.0.3 on SLES11/SP1. After installing the new RPM's from http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_S P1/x 86_64/ I see the following errors in the ha-log: ... WARN: do_cib_control: Couldn't complete CIB registration 30 times... pause and retry ERROR: do_cib_control: Could not complete CIB registration 30 times... hard error ... The cib.xml has proper rights (I think): -rw--- 1 hacluster haclient 3474 2010-09-08 12:25 /var/lib/heartbeat/crm/cib.xml Verifying CIB: crm_verify -VVV -x cib.xml crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading (null)-style configuration to pacemaker-0.6 with no-op crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemake r/upgrade06.xsl crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading pacemaker-1.1-style configuration to pacemaker-1.2 with no-op crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgraded from none to pacemaker-1.2 validation crm_verify[24558]: 2010/09/08_13:20:17 WARN: cluster_status: We do not have quorum - fencing and resource management disabled With crm I can not change anything in cib.xml: crm configure Signon to CIB failed: connection failed Init failed, could not perform requested operations ERROR: cannot parse xml: no element found: line 1, column 0 Installed SW/versions: heartbeat-3.0.3-2.14 Where did that version of Heartbeat come from? It's not present in the openSUSE network:ha-clustering repo (actually, there is no version of heartbeat present in that repo). 
>> libgssglue1-0.1-6.22
>> libglue2-1.0.6-2.1
>> cluster-glue-1.0.6-2.1
>> resource-agents-1.0.3-4.2
>> pacemaker-1.1.2.1-5.1
>> My cib.xml and ha-log are attached. I suppose my CIB is wrong. How can I update the old cib.xml? Could someone point me pls to the right upgrade sequence/documentation?
>
> It looks like Pacemaker in network:ha-clustering builds without Heartbeat support (there's no Heartbeat in that repo, so no current source for heartbeat-devel). That being said, I suspect Pacemaker in the openSUSE repos hasn't built with Heartbeat support for some time (it seems to be disabled in the spec file for Pacemaker 1.0.x from the openSUSE:11.1 repo, for example).

Does that mean I should build the new RPMs from sources?

> The SLE HAE product replaced Heartbeat with openAIS when SLES 11 was released (this is now corosync+openais on SLE 11 SP1), so I'm curious to know what OS you're upgrading from, if you previously had heartbeat 2.1.3 running.

Yes, I know it - we want to stay with Heartbeat + Pacemaker in production though ...
That was SLES10 SP2, but with the HA version of heartbeat-resources-2.1.3-23.1 (pacemaker-heartbeat-0.6.5-8.2)

> Regards, Tim

Best regards
Nikita
[Linux-HA] Upgrade heartbeat 2.1.3 to 3.0.3
Hello list!
I am now trying to upgrade heartbeat 2.1.3 (pacemaker 0.6) to 3.0.3 on SLES11/SP1. After installing the new RPMs from
http://download.opensuse.org/repositories/network:/ha-clustering/SLE_11_SP1/x86_64/
I see the following errors in the ha-log:
...
WARN: do_cib_control: Couldn't complete CIB registration 30 times... pause and retry
ERROR: do_cib_control: Could not complete CIB registration 30 times... hard error
...
The cib.xml has proper rights (I think):
-rw------- 1 hacluster haclient 3474 2010-09-08 12:25 /var/lib/heartbeat/crm/cib.xml
Verifying CIB: crm_verify -VVV -x cib.xml
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading (null)-style configuration to pacemaker-0.6 with no-op
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgrading pacemaker-1.1-style configuration to pacemaker-1.2 with no-op
crm_verify[24558]: 2010/09/08_13:20:17 notice: update_validation: Upgraded from none to pacemaker-1.2 validation
crm_verify[24558]: 2010/09/08_13:20:17 WARN: cluster_status: We do not have quorum - fencing and resource management disabled
With crm I cannot change anything in cib.xml:
crm configure
Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0
Installed SW/versions:
heartbeat-3.0.3-2.14
libgssglue1-0.1-6.22
libglue2-1.0.6-2.1
cluster-glue-1.0.6-2.1
resource-agents-1.0.3-4.2
pacemaker-1.1.2.1-5.1
My cib.xml and ha-log are attached. I suppose my CIB is wrong. How can I update the old cib.xml? Could someone point me pls to the right upgrade sequence/documentation?
Best regards
Nikita Michalko

<?xml version="1.0" ?>
<cib admin_epoch="0" epoch="0" num_updates="0">
 <configuration>
  <crm_config>
   <cluster_property_set id="cib-bootstrap-options">
    <attributes>
     <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
     <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
     <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="2"/>
     <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-6"/>
     <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
     <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
     <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
     <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
     <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
     <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
     <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
     <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="3min"/>
     <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="110s"/>
     <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
     <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
     <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
     <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
     <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
    </attributes>
   </cluster_property_set>
  </crm_config>
  <nodes/>
  <resources>
   <group id="group_1">
    <primitive class="ocf" id="IPaddr_193_27_40_54" provider="heartbeat" type="IPaddr">
     <operations>
      <op id="IPaddr_193_27_40_54_mon" interval="60s" name="monitor" timeout="60s"/>
     </operations>
     <instance_attributes id="IPaddr_193_27_40_54_inst_attr">
      <attributes>
       <nvpair id="IPaddr_193_27_40_54_attr_0" name="ip" value="193.27.40.54"/>
       <nvpair id="IPaddr_193_27_40_54_attr_1" name="cidr_netmask" value="26"/>
       <nvpair id="IPaddr_193_27_40_54_attr_3" name="broadcast" value="193.27.40.63"/>
      </attributes>
     </instance_attributes>
    </primitive>
    <primitive class="ocf" id="IPaddr_192_168_163_54" provider="heartbeat" type="IPaddr">
     <operations>
      <op id="IPaddr_192_168_163_54_mon" interval="60s" name="monitor" timeout="60s"/>
     </operations>
     <instance_attributes id="IPaddr_192_168_163_54_inst_attr">
      <attributes>
       <nvpair id="IPaddr_192_168_163_54_attr_0" name="ip" value="192.168.163.54"/>
       <nvpair id="IPaddr_192_168_163_54_attr_1" name="cidr_netmask" value="26"/>
       <nvpair id="IPaddr_192_168_163_54_attr_3" name="broadcast" value="192.168.163.63"/>
      </attributes>
     </instance_attributes>
    </primitive>
    <primitive class="lsb" id="ubis_udbmain_3" provider="heartbeat" type="ubis_udbmain"
Re: [Linux-HA] Read only filesystem with Heart beat configuration
Hi Jayesh !

Am Dienstag, 13. Juli 2010 12:43 schrieb jayesh shinde:
> HI, Thanks for your reply.
> 1) Which is the stable version of HA on RHEL 5.2? Can you please give me the link.

Sorry - not for RHEL (I'm using SLES10), but look at
http://www.linux-ha.org/wiki/Download and
http://clusterlabs.org/wiki/Install#From_Source

> 2) Do you think that read only is a problem of HA, OR is this a problem of the EXT3 filesystem etc?

I was facing a similar problem on an AMD/64bit server, but without XEN and SAN, and it was ONLY a problem of the OS HD's failure.

> 3) How do I avoid the master-master condition in case of n/w failover?

Maybe by using+configuring DRBD and Stonith? Look on the mailing list for the above threads ...

HTH
Nikita Michalko

> Regards Jayesh Shinde
>
> --- On Tue, 7/13/10, Nikita Michalko michalko.sys...@a-i-p.com wrote:
> From: Nikita Michalko michalko.sys...@a-i-p.com
> Subject: Re: [Linux-HA] Read only filesystem with Heart beat configuration
> To: General Linux-HA mailing list linux-ha@lists.linux-ha.org
> Date: Tuesday, July 13, 2010, 1:17 PM
> Hi, any chance to upgrade to the latest version of HA? 2.1.4 is very old and buggy!
> HTH Nikita Michalko
>
> Am Dienstag, 13. Juli 2010 09:39 schrieb jayesh shinde:
>> Hi, Can any one please guide me with my below problem? I am using this setup with an IBM DS 8300 SAN + HBA + multipathing. Your inputs will be valuable for me.
>> Regards Jayesh Shinde
>> --- On Mon, 7/12/10, jayesh shinde jayesha_shi...@yahoo.com wrote:
>> From: jayesh shinde jayesha_shi...@yahoo.com
>> Subject: Read only filesystem with Heart beat configuration
>> To: linux-ha@lists.linux-ha.org
>> Date: Monday, July 12, 2010, 5:24 PM
>> Dear all,
>> I am facing the problem of a read-only filesystem with my Heartbeat configuration. Here are my setup details:
>> =
>> I have installed 2 physical servers with RHEL 5.2 64 bit with Xen virtualization and an IBM DS 8300 SAN. The XEN kernel version is 2.6.18-92.el5xen. The filesystem is EXT3.
>> I am using this setup for a heavy mail server where the POP3 and IMAP traffic is very high (1.2 lakh emails per day). I am using cyrus, postfix and ldap.
>> Each physical server contains a XEN virtual OS; of these, one is master and the other slave for me.
>> The physical server1 IP: 192.168.1.1
>> The physical server2 IP: 192.168.1.2
>> The Xen VM machine IP under physical server1: 192.168.1.10 (Master)
>> The Xen VM machine IP under physical server2: 192.168.1.20 (Slave)
>> The HA floating IP between the 2 VMs is 192.168.1.30
>> Both the VMs and physical machines communicate with each other via a switch and not by cross cable. Inside both Xen VMs I am using the 4 SAN partitions (for email boxes), which are accessible and can be mounted from 192.168.1.10 and 192.168.1.20. I am managing the mounting and unmounting of the SAN partitions and the stopping and starting of the services via a script, which is mentioned in /etc/ha.d/haresources
>> My problem:
>> 1) When 192.168.1.10 is master with floating IP 192.168.1.30, at that time everything works properly. But sometimes, due to a n/w problem, the floating IP 192.168.1.30 gets switched to the slave server, i.e. 192.168.1.20. At that time, while mounting the SAN partition, the SAN partition goes into read-only mode, and to correct this I have to run fsck -y on the device driver - i.e. failover causes the filesystem to go into read-only mode. I observed that this read-only problem does not come every time, but when it happens it messes up everything.
>> 2) When both the servers lose their n/w connection, both assume that their respective slave/master is down, and both servers act as master-master and also mount the SAN partition on each VM server.
>> So,
>> 1) How to avoid the read-only filesystem problem?
>> 2) How to avoid the master-master problem in case of n/w failure?
>> 3) What kind of precautions should I take while mounting and unmounting the SAN partition?
>> I tried ipfail and ping in ha.cf but no luck.
>> I am using Heartbeat version 2.1:
>> heartbeat-2.1.4-9.el5.x86_64.rpm
>> heartbeat-pils-2.1.4-9.el5.x86_64.rpm
>> heartbeat-stonith-2.1.4-9.el5.x86_64.rpm
>> Here is my ha.cf (same on both active and passive server):
>> ===
>> cat /etc/ha.d/ha.cf
>> debugfile /var/halogs/ha-debug
>> logfile /var/halogs/ha-log
>> logfacility local0
>> keepalive 2
>> deadtime 15
>> warntime 10
>> initdead 30
>> udpport 694
>> bcast eth0
>> auto_failback off
>> node activeimap1
>> node passiveimap1
>> #ping 192.168.2.8
>> #respawn hacluster /usr/lib/heartbeat/ipfail
>> debug 0
>> Here is my haresources (same on both active and passive server):
>> cat /etc/ha.d/haresources
>> activeimap1 IPaddr::192.168.1.30 ms4-services
>> Here is my ms4-services (same on both active and passive server
Re: [Linux-HA] Read only filesystem with Heart beat configuration
Hi, any chance to upgrade to the latest version of HA? 2.1.4 is very old and buggy!
HTH
Nikita Michalko

Am Dienstag, 13. Juli 2010 09:39 schrieb jayesh shinde:
> Hi, Can any one please guide me with my below problem? I am using this setup with an IBM DS 8300 SAN + HBA + multipathing. Your inputs will be valuable for me.
> Regards Jayesh Shinde
> --- On Mon, 7/12/10, jayesh shinde jayesha_shi...@yahoo.com wrote:
> From: jayesh shinde jayesha_shi...@yahoo.com
> Subject: Read only filesystem with Heart beat configuration
> To: linux-ha@lists.linux-ha.org
> Date: Monday, July 12, 2010, 5:24 PM
> Dear all,
> I am facing the problem of a read-only filesystem with my Heartbeat configuration. Here are my setup details:
> =
> I have installed 2 physical servers with RHEL 5.2 64 bit with Xen virtualization and an IBM DS 8300 SAN. The XEN kernel version is 2.6.18-92.el5xen. The filesystem is EXT3. I am using this setup for a heavy mail server where the POP3 and IMAP traffic is very high (1.2 lakh emails per day). I am using cyrus, postfix and ldap.
> Each physical server contains a XEN virtual OS; of these, one is master and the other slave for me.
> The physical server1 IP: 192.168.1.1
> The physical server2 IP: 192.168.1.2
> The Xen VM machine IP under physical server1: 192.168.1.10 (Master)
> The Xen VM machine IP under physical server2: 192.168.1.20 (Slave)
> The HA floating IP between the 2 VMs is 192.168.1.30
> Both the VMs and physical machines communicate with each other via a switch and not by cross cable. Inside both Xen VMs I am using the 4 SAN partitions (for email boxes), which are accessible and can be mounted from 192.168.1.10 and 192.168.1.20. I am managing the mounting and unmounting of the SAN partitions and the stopping and starting of the services via a script, which is mentioned in /etc/ha.d/haresources
> My problem:
> 1) When 192.168.1.10 is master with floating IP 192.168.1.30, at that time everything works properly.
> But sometimes, due to a n/w problem, the floating IP 192.168.1.30 gets switched to the slave server, i.e. 192.168.1.20. At that time, while mounting the SAN partition, the SAN partition goes into read-only mode, and to correct this I have to run fsck -y on the device driver - i.e. failover causes the filesystem to go into read-only mode. I observed that this read-only problem does not come every time, but when it happens it messes up everything.
> 2) When both the servers lose their n/w connection, both assume that their respective slave/master is down, and both servers act as master-master and also mount the SAN partition on each VM server.
> So,
> 1) How to avoid the read-only filesystem problem?
> 2) How to avoid the master-master problem in case of n/w failure?
> 3) What kind of precautions should I take while mounting and unmounting the SAN partition?
> I tried ipfail and ping in ha.cf but no luck.
> I am using Heartbeat version 2.1:
> heartbeat-2.1.4-9.el5.x86_64.rpm
> heartbeat-pils-2.1.4-9.el5.x86_64.rpm
> heartbeat-stonith-2.1.4-9.el5.x86_64.rpm
> Here is my ha.cf (same on both active and passive server):
> ===
> cat /etc/ha.d/ha.cf
> debugfile /var/halogs/ha-debug
> logfile /var/halogs/ha-log
> logfacility local0
> keepalive 2
> deadtime 15
> warntime 10
> initdead 30
> udpport 694
> bcast eth0
> auto_failback off
> node activeimap1
> node passiveimap1
> #ping 192.168.2.8
> #respawn hacluster /usr/lib/heartbeat/ipfail
> debug 0
> Here is my haresources (same on both active and passive server):
> cat /etc/ha.d/haresources
> activeimap1 IPaddr::192.168.1.30 ms4-services
> Here is my ms4-services (same on both active and passive server):
> ==
> cat /etc/ha.d/resource.d/ms4-services
> #!/bin/bash
> set -x
> mylist="syslog crond ldap postfix saslauthd cyrus-imapd httpd"
> stop() {
>     for i in $mylist; do
>         /etc/init.d/$i stop
>         /etc/init.d/$i stop
>         /etc/init.d/$i stop
>     done
>     /sbin/ifdown eth0:2
>     /sbin/ifdown eth0:3
>     /sbin/ifdown eth0:4
>     /sbin/ifdown eth0:5
>     if cd /tmp/ && /bin/mount | grep -e "on /usr/local" > /dev/null
>     then
>         # Kill all processes open on filesystem
>         /sbin/fuser -muk /usr/local
>         /sbin/fuser -muk /imap
>         /sbin/fuser -muk /imap1
>         /sbin/fuser -muk /imap2
>         sleep 3
>         /bin/umount /dev/xvdb
>         sleep 3
>         /bin/umount /dev/xvdd
>         sleep 3
>         /bin/umount /dev/xvdc
>         sleep 3
>         /bin/umount /dev/xvdf
>     fi
> }
> start() {
>     sleep 150
>     /bin/mount /dev/xvdf /usr/local
>     sleep 3
>     mount /dev/xvdc /imap1
>     sleep 3
>     mount /dev/xvdd /imap
>     sleep 3
>     mount /dev/xvdb /imap2
>     sleep 3
>     /sbin/ifup eth0:2
>     /sbin/ifup eth0:3
>     /sbin/ifup eth0:4
>     /sbin/ifup eth0:5
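[Editor's note: a hedged diagnostic sketch for the read-only symptom described above. ext3 typically remounts itself read-only when the kernel detects journal or I/O errors, and two nodes mounting the same non-cluster ext3 filesystem at once (the master-master case above) is exactly the kind of thing that causes such errors. Commands below assume the device names from the post.]

```
# Why did the filesystem go read-only? Ask the kernel and the superblock
# (hedged sketch; /dev/xvdb stands in for any of the SAN partitions):
dmesg | grep -i -e ext3 -e 'I/O error'    # kernel's reason for the remount
tune2fs -l /dev/xvdb | grep -i errors     # configured error behaviour
# Concurrent mounts from both nodes will corrupt a non-cluster
# filesystem; fencing (STONITH) is the usual way to rule that out.
```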
Re: [Linux-HA] IPsrcaddr and IPaddr2
Hi, Ilo!
I'd say - IIRC - you should configure the netmask in your cib (instance_attributes) for all IP addresses. Sth. like:
<nvpair id="IPaddr_192_168_1_2_attr_1" name="cidr_netmask" value="24"/>
HTH
Nikita Michalko

Am Mittwoch, 30. Juni 2010 23:20 schrieb Ilo Lorusso:
> Hi everyone.. I have a server with the following resources, which start up and are running fine:
> ClusterIP (ocf::heartbeat:IPaddr2): Started saamailin0p01.ipnetwork.co.za
> postfix (ocf::heartbeat:postfix): Started saamailin0p01.ipnetwork.co.za
> For the ClusterIP I have assigned the IP address 57.24.98.55, which as I said is working fine. Now what I want to add to this mix is IPsrcaddr, so any traffic that originates from the server will leave with the IP address 57.24.98.55. I can't seem to get it working; I get a whole bunch of errors in the ha-log. Below is a snippet, with as much information as I could provide. Could someone try to shed some light on why the IPsrcaddr resource won't start up? Thanks...
> Regards
> Ilo
>
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="57.24.98.55" cidr_netmask="27" \
>     op monitor interval="7s"
> primitive IPsrcaddr ocf:heartbeat:IPsrcaddr \
>     params ipaddress="57.24.98.55"
> primitive postfix ocf:heartbeat:postfix \
>     op monitor interval="60s"
> location cli-prefer-ClusterIP ClusterIP \
>     rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq saamailin0p01.ipnetwork.co.za
> colocation postfix-with-ClusterIP inf: postfix ClusterIP
> order start-IPsrcaddr-after-postfix inf: postfix IPsrcaddr
> property $id="cib-bootstrap-options" \
>     dc-version="1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782" \
>     cluster-infrastructure="Heartbeat" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"
>
> I get these errors in crm status:
> Failed actions:
>     IPsrcaddr_start_0 (node=saamailin0s01.ipnetwork.co.za, call=11, rc=1, status=complete): unknown error
>     IPsrcaddr_start_0 (node=saamailin0p01.ipnetwork.co.za, call=8, rc=1, status=complete): unknown error
>
> Below are the errors I get in my ha-log:
> Jun 30 23:08:34 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:probe:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
> Jun 30 23:08:38 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:start:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
> IPsrcaddr[2516]: 2010/06/30_23:08:38 ERROR: command 'ip route replace dev src 57.24.98.55' failed
> Jun 30 23:08:39 SaaMailIN0p01.ipnetwork.co.za lrmd: [2187]: info: RA output: (IPsrcaddr:stop:stderr) ERROR: Cannot use default route w/o netmask [57.24.98.55]
>
> Below is a snippet of: ip route show
> 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qdisc pfifo_fast qlen 1000
>     link/ether 00:0c:29:78:77:1e brd ff:ff:ff:ff:ff:ff
>     inet 57.24.98.50/27 brd 57.24.98.63 scope global eth0
>     inet 57.24.98.55/27 brd 57.24.98.63 scope global secondary eth0
>     inet6 fe80::20c:29ff:fe78:771e/64 scope link
>        valid_lft forever preferred_lft forever
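[Editor's note: Nikita's hint maps directly onto the RA error above ("Cannot use default route w/o netmask"): the IPsrcaddr agent needs a netmask to identify the route whose source address it rewrites. A hedged sketch follows - the cidr_netmask parameter name and /27 value mirror the ClusterIP definition; verify against your resource-agents version.]

```
# Hedged sketch: redefine IPsrcaddr with the netmask it needs
# (same /27 as ClusterIP), so 'ip route replace ... src 57.24.98.55'
# can find the matching route.
crm configure primitive IPsrcaddr ocf:heartbeat:IPsrcaddr \
    params ipaddress="57.24.98.55" cidr_netmask="27"
```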
Re: [Linux-HA] explain the difference between servers?
Hi mike, it seems to be no HA-problem anymore though, but:

Am Montag, 31. Mai 2010 01:29 schrieb mike:
> So I've got ldirectord up and running just fine, providing ldap high availability to 2 backend real servers on port 389. Here is the output of netstat on both real servers:
> tcp   0   0 0.0.0.0:389   0.0.0.0:*   LISTEN
> tcp   0   0 :::389        :::*        LISTEN
> So I used the same director server to create another highly available application, jboss, running on port 8080. If I look at the director server I see that the output of ipvsadm shows both real servers alive and well:
> [r...@lvsuat1a ha.d]# ipvsadm
> IP Virtual Server version 1.2.1 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port Forward Weight ActiveConn InActConn
> TCP  esbuat1.vip.intranet.mydom lc
>   -> gasayul9300602.intranet.mydom Tunnel 1 0 0
>   -> gasayul9300601.intranet.mydom Tunnel 1 1 0
> Looks good so far. Now the problem is that I cannot telnet to the VIP on port 8080; I get connection refused. If I change the ldirectord.cf to listen on port 22, it works perfectly, so this would seem to indicate that I have things set up appropriately on the director server. So I started poking around on the backend real servers and netstat looks like this:
> [supp...@esbuat1b ~]$ netstat -an | grep 8080
> tcp   0   0 172.28.185.13:8080   0.0.0.0:*   LISTEN

- which process is running on this port - i.e. lsof -i :8080 ?

> So comparing this to the netstat above that listens on port 389, I see that perhaps there is an entry missing, perhaps this:
> tcp   0   0 :::8080   :::*   LISTEN
> So I don't claim to be a networking expert, and maybe I've missed something in my setup and this is why port 8080 is having issues. Can anyone provide me with any pointers or where to go next? After getting the ldap servers working, I figured this would be easy, but I'm struggling with this one.
HTH
Nikita Michalko
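[Editor's note: the netstat output above already contains the likely answer - jboss is bound only to 172.28.185.13, so packets tunneled to the real server for the VIP are refused, whereas the ldap daemon listens on all addresses (0.0.0.0 / :::). A hedged sketch; the `run.sh -b` flag assumes a JBoss 4/5-style start-up script, so verify against your version.]

```
# Hedged sketch: bind JBoss to all addresses so connections arriving
# for the LVS VIP are accepted, not only those for 172.28.185.13.
./run.sh -b 0.0.0.0
# Then on the real server:
netstat -an | grep 8080    # should now show 0.0.0.0:8080 ... LISTEN
# Note: with ipvsadm in Tunnel mode the real server additionally needs
# the VIP configured on a local tunnel interface with ARP suppressed.
```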
Re: [Linux-HA] HB Troubles
Any chance to update at least to V. 2.1.4? 2.1.3 is very old and buggy!
Nikita Michalko

Am Mittwoch, 5. Mai 2010 01:05 schrieb Baird, Josh:
> Hi, I have a 2 node HB 2.1.3 cluster running on CentOS 5. I just upgraded the passive node to CentOS 5.4, but the heartbeat packages did not change:
> heartbeat-stonith-2.1.3-3.el5.centos
> heartbeat-2.1.3-3.el5.centos
> heartbeat-pils-2.1.3-3.el5.centos
> Now, when I try to start HB on the node, it reports that it is starting, but the daemons never actually start:
> r...@fc-fmcln02:~$ service heartbeat start
> logd is already running
> Starting High-Availability services:
> 2010/05/04_18:02:53 INFO: Resource is stopped
> 2010/05/04_18:02:53 INFO: Resource is stopped
> [ OK ]
> r...@fc-fmcln02:~$ ps aux | grep heartbeat
> root 6117 0.0 0.0 3916 696 pts/0 S+ 18:02 0:00 grep heartbeat
> Logs say:
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: Version 2 support: false
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: WARN: logd is enabled but logfile/debugfile is still configured in ha.cf
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: **
> May 4 18:02:53 fc-fmcln02 heartbeat: [6112]: info: Configuration validated. Starting heartbeat 2.1.3
> May 4 18:02:53 fc-fmcln02 heartbeat: [6113]: info: heartbeat: version 2.1.3
> May 4 18:02:53 fc-fmcln02 heartbeat: [6113]: info: Heartbeat generation: 1208455483
> Running /usr/lib/heartbeat/heartbeat -d 1000 shows:
> heartbeat[6122]: 2010/05/04_18:04:00 ERROR: Cannot shmget for process status: Invalid argument
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(keepalive,1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(deadtime,10)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(warntime,5)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(initdead,120)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(udpport,694)
> heartbeat: udpport setting must precede media statements
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(bcast,eth1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(auto_failback,off)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(node,fc-fmcln01.corp.follett.com)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(node,fc-fmcln02.corp.follett.com)
> heartbeat[6122]: 2010/05/04_18:04:00 info: respawn directive: hacluster /usr/lib/heartbeat/ipfail
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(use_logd,yes)
> heartbeat[6122]: 2010/05/04_18:04:00 info: Enabling logging daemon
> heartbeat[6122]: 2010/05/04_18:04:00 info: logfile and debug file are those specified in logd config file (default /etc/logd.cf)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(logfile,/var/log/hb.log)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(debugfile,/var/log/heartbeat-debug.log)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=hacluster, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=hacluster, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=null, gid=haclient
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=root, gid=null
> heartbeat[6122]: 2010/05/04_18:04:00 debug: uid=null, gid=haclient
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Beginning authentication parsing
> heartbeat[6122]: 2010/05/04_18:04:00 debug: 16 max authentication methods
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Keyfile opened
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Keyfile perms OK
> heartbeat[6122]: 2010/05/04_18:04:00 debug: 16 max authentication methods
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Found authentication method [sha1]
> heartbeat[6122]: 2010/05/04_18:04:00 info: AUTH: i=1: key = 0x8c52d78, auth=0x5c6228, authname=sha1
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Outbound signing method is 1
> heartbeat[6122]: 2010/05/04_18:04:00 debug: Authentication parsing complete [1]
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(cluster,linux-ha)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(hopfudge,1)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(baud,19200)
> heartbeat: baudrate setting must precede media statements
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(hbgenmethod,file)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(realtime,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(msgfmt,classic)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(conn_logd_time,60)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(log_badpack,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(syslogmsgfmt,false)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(coredumps,true)
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(crm,false)
> heartbeat[6122]: 2010/05/04_18:04:00 info: Version 2 support: false
> heartbeat[6122]: 2010/05/04_18:04:00 debug: add_option(autojoin,none
Re: [Linux-HA] HA Stats
Hi, Hari, did you already search the General Linux-HA mailing list? You'll find sth. about HA at:
http://www.clusterlabs.org/wiki/Documentation
and at:
http://www.clusterlabs.org/doc
HTH
Nikita Michalko

Am Montag, 19. April 2010 09:11 schrieb Hari:
> Hi, I recently joined this group. I am working on a HA project for a wireless controller. We are looking at different options. I just wanted to know how linux HA is implemented, and some statistical information ...
> 1) How is the heartbeat implemented?
> 2) What is the retry mechanism used?
> 3) What are the different timer values used in this solution?
> 4) What are the performance stats (like switchover time in different scenarios)?
> Please guide me on this. Let me know if there are any documents/links related to this.
> Thanks Regards,
> Srihari.
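[Editor's note: question 3 above (timer values) maps directly onto a handful of ha.cf directives; the example values below are a hedged illustration in line with the configs quoted elsewhere in this archive, and need tuning per site.]

```
# Hedged example ha.cf timing section (Heartbeat 2.x/3.x directives):
keepalive 2      # interval between heartbeat packets, in seconds
warntime 10      # log a "late heartbeat" warning after this long
deadtime 15      # declare the peer dead (trigger failover) after this
initdead 60      # grace period at start-up; usually >= 2 * deadtime
```

A missed heartbeat is simply retried at the next keepalive interval; failover fires only once deadtime elapses with no packet at all.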
Re: [Linux-HA] heartbeat unplug ethernet cable
Hi, as already mentioned x-times, we don't have a crystal ball ;-) - version, configuration, logs ...???
Nikita Michalko

Am Montag, 15. März 2010 10:01 schrieb Liang Xiao Zhu:
> Hi all, I have done everything, but heartbeat works only when I use "service heartbeat stop"; when I unplug the ethernet cable from node 1 it doesn't work. What's wrong? Do I have to add something in ha.cf?
> Thanks in advance
Re: [Linux-HA] drbd with Linux-HA
Hi Muhammad, I know it will not help you to solve the problem, but anyway: where did you install SLES 10 SP 3 from? I didn't find it ...
TIA
Nikita Michalko

Am Mittwoch, 3. Februar 2010 07:26 schrieb Muhammad Sharfuddin:
> OS: SLES 10 SP 3
> I am running a two node (node1, node2) active/passive (standby) Oracle cluster via Linux-HA. Oracle is installed on /oracle, and /oracle is an 'ext3' filesystem on a SAN/LUN. At any given time, all of the resources (IP, Filesystem, and Oracle) are on either node1 or node2.
> Now, to make a DR, I want to implement 'drbd', but in a way that the two cluster nodes (node1 and node2) keep mounting the same disk/device (SAN disk), while another machine (node3), which should not be part of the Linux-HA cluster, will be the standby oracle machine with its own separate disk. So my drbd configuration should be: /oracle on SAN (/dev/sdb1, mounted by either node1 or node2) and /oracle (/dev/sdc1) on node3 as the drbd devices.
> Is that possible? Any help/document/url will be highly appreciated.
> Regards,
> --ms
Re: [Linux-HA] Failover during server freeze
Hi Lars, we have the same HA-version, without mysql though. Could you send me pls the forkbomb, to test it on our cluster? TIA!
Nikita Michalko

Am Donnerstag, 26. November 2009 14:37 schrieb Lars Johansen:
> Hi, I have a 2 node cluster setup running heartbeat 2.1.3. I'm running an active/passive setup, where I have mysql in master-master replication, stored on each node's filesystem, and I have a DRBD drive with NFS. Pulling cables, shutting down a server etc. works very well - the surviving server takes over. However, I've tried to simulate a server freeze by creating a forkbomb on the active server. After doing that, I cannot log in with ssh, or access nfs or mysql; ping works, and I guess since the kernel responds, heartbeat won't switch over. How can I set up heartbeat so it detects if a server is frozen?
> Greetings,
> Lars
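[Editor's note: the usual answer to "frozen but still answers ping" is a kernel watchdog: Heartbeat's ha.cf supports a `watchdog` directive, so a node that can no longer schedule heartbeat's processes reboots itself. A hedged sketch; the boot-time module line is Debian-style and should be adjusted per distro.]

```
# Hedged sketch: let a frozen node reboot itself via a kernel watchdog.
# softdog provides /dev/watchdog where no hardware watchdog exists.
modprobe softdog
echo softdog >> /etc/modules                      # load at boot (adjust per distro)
echo "watchdog /dev/watchdog" >> /etc/ha.d/ha.cf  # heartbeat pets the watchdog
# For failures the watchdog cannot catch, STONITH on the peer remains
# the reliable way to recover resources from a frozen server.
```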
Re: [Linux-HA] Failover during server freeze
Hi Johan, thanks for that - so easily can one disable the server ;-)
Nikita Michalko

Am Montag, 30. November 2009 14:11 schrieb Johan Verrept:
> On Mon, 2009-11-30 at 13:11 +0100, Nikita Michalko wrote:
>> Hi Lars, we have the same HA-version, without mysql though. Could you send me pls the forkbomb to test it on our cluster?
>
> #!/bin/bash
> while (true); do bash $0; done;
Re: [Linux-HA] how to Install Oralce10gR2 in Linux HA Environment
Hi Muhammad, your problem is a little bit off-topic here, but anyway: you must change your listener address in /etc/hosts and in listener.ora for use with HA, so that it also listens on the common HA-IP-address, e.g.:
( address = ( protocol = tcp ) ( host = HA-VIP ) ( port = 1526 ) )
HTH
Nikita Michalko

Am Samstag, 21. November 2009 13:25 schrieb Muhammad Sharfuddin:
> Hi Guys, the heartbeat package also provides an OCF resource agent for Oracle (/usr/lib/ocf/resource.d/heartbeat/oracle). I want to create a cluster for oracle.
> # cat /etc/hosts
> dbnode1 192.168.0.236 ## hostname and physical IP of server1
> dbnode2 192.168.0.238 ## hostname and physical IP of server2
> dbserver 192.168.0.245 ## virtual hostname and IP for the cluster
> I ran the Oracle (oracle10gR2) installer on 'dbnode1' and installed/placed everything (db, oracle binaries) on the filesystem /oracle, which is on the SAN. After the installation completed, I unmounted /oracle (SAN) and then mounted /oracle on 'dbnode2', but oracle gives an error and does not start. To start Oracle on 'dbnode2', I have to change the hostname from dbnode2 to dbnode1.
> Is there any specific method to install oracle (any special option I have to provide to 'runInstaller') for Linux HA?
> Regards
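[Editor's note: Nikita's advice, written out as a listener.ora fragment. The hostnames come from the /etc/hosts in the post; the listener name and port 1526 (from Nikita's snippet) are assumptions to check against your installation.]

```
# Hedged listener.ora sketch: bind the listener to the cluster's
# virtual hostname (dbserver / 192.168.0.245) rather than the node's
# own name, so the same file works on whichever node holds the VIP.
LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbserver)(PORT = 1526))
  )
```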
Re: [Linux-HA] CIB not supported?
Ok, thank you both for the info! Greetings Nikita Michalko On Monday, 29 June 2009 14:42, Dejan Muhamedagic wrote: Hi, On Fri, Jun 26, 2009 at 01:56:29PM +0200, Nikita Michalko wrote: Hi Michael, On Thursday, 25 June 2009 18:40, Michael Hutchins wrote: Ok, I got this one fixed, too. :) - I wonder how? Can you please be more specific? See below. Thanks! Nikita Michalko -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Michael Hutchins Sent: Thursday, June 25, 2009 8:49 AM To: General Linux-HA mailing list Subject: [Linux-HA] CIB not supported? So I am still mucking along trying to figure out this stuff. And as I am following along the how-to I previously linked to, I am at the crm part. I have crm up and running (thanks for the help, all) and now I get this as soon as I enter the crm shell: crm(live)# configure ERROR: CIB not supported: validator 'transitional-0.6', release '3.0.1' ERROR: You may try the upgrade command crm configure upgrade (which is cibadmin --upgrade) should do the trick. The crm shell doesn't support the old XML. Pacemaker 1.0 can still work with the old XML in a compatibility mode, but it's strongly recommended to upgrade as soon as you can. Thanks, Dejan I didn't get that when I didn't have crm on in my /etc/ha.d/ha.cf config file. If I google it, it comes back with nothing meaningful.
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] CIB not supported?
Hi Michael, On Thursday, 25 June 2009 18:40, Michael Hutchins wrote: Ok, I got this one fixed, too. :) - I wonder how? Can you please be more specific? Thanks! Nikita Michalko -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Michael Hutchins Sent: Thursday, June 25, 2009 8:49 AM To: General Linux-HA mailing list Subject: [Linux-HA] CIB not supported? So I am still mucking along trying to figure out this stuff. And as I am following along the how-to I previously linked to, I am at the crm part. I have crm up and running (thanks for the help, all) and now I get this as soon as I enter the crm shell: crm(live)# configure ERROR: CIB not supported: validator 'transitional-0.6', release '3.0.1' ERROR: You may try the upgrade command I didn't get that when I didn't have crm on in my /etc/ha.d/ha.cf config file. If I google it, it comes back with nothing meaningful. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Please Help me
class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes /instance_attributes meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id=prefered_location_ score=0 expression attribute=#uname id=b958f92c-839a-4666-8698-e0c96d04719b operation=eq value=server148/ /rule /rsc_location rsc_location id=location_2 rsc=IP_5 rule id=prefered_location_2 score=0 expression attribute=#uname id=540476fc-3a35-4dd8-87dc-3be82adb6592 operation=eq value=server140/ /rule /rsc_location rsc_colocation id=colocation_ from=IP_5 to=IP_5 score=INFINITY/ rsc_order id=order_ from=IP_5 type=after to=IP_5/ /constraints /configuration /cib [r...@server140 ~]# vi /var/lib/heartbeat/crm/cib.xml [r...@server140 ~]# cat /etc/ha.d/ha.cf #use_logd on #crm yes ## Allow to dynamically add a new node to the cluster ##autojoin any udpport 694 bcast eth0 auto_failback on mcast eth0 225.0.0.1 694 1 0 node server140 node server148 crm on [r...@server140 ~]# [r...@server140 ~]# cat /etc/ha.d/ha.cf auto_failback on mcast eth0 225.0.0.1 694 1 0 node server140 node server148 crm on Please help me. 2009/6/17 Nikita Michalko michalko.sys...@a-i-p.com Hi Bui Manh, On Wednesday, 17 June 2009 12:40, Bui Manh Nam wrote: thank you very much, I don't understand the following cib.xml file - I don't understand what you didn't understand exactly: the whole cib.xml? Please be more exact/informative ... Which OS do you use? Any logs? Which version of HA? If this is 2.1, then I strongly advise upgrading at least to V.2.1.4! Nikita Michalko nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes /instance_attributes /node node uname=server140 type=normal id=c7f5251a-3bab-489f-a18f-c1a04ffa1591 instance_attributes id=nodes-c7f5251a-3bab-489f-a18f-c1a04ffa1591 attributes nvpair name=standby id=standby-c7f5251a-3bab-489f-a18f-c1a04ffa1591 value=off/ /attributes /instance_attributes /node /nodes resources clone id=IP_5 meta_attributes id=IP_5_meta_attrs attributes nvpair id=IP_5_metaattr_target_role name=target_role value=stopped/ nvpair id=IP_5_metaattr_clone_max name=clone_max value=2/ nvpair id=IP_5_metaattr_clone_node_max name=clone_node_max value=2/ nvpair id=IP_5_metaattr_resource_stickiness name=resource_stickiness value=0/ /attributes /meta_attributes primitive id=resource_IP_5 class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes /instance_attributes
meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id
Re: [Linux-HA] 2.0.4 / question about Clone stonith Resource
Hi Alain, I use a customized version of external/ipmi with an extra configuration file (ipmi.cfg) containing the appropriate parameters for ipmitool - see attached files ;-) HTH Nikita Michalko On Tuesday, 16 June 2009 13:00, Alain Moulle wrote: Hi, I read somewhere that it is possible to have different parameters depending on the node where the clone is started, but with regard to the GUI, I can't find how to do so. My goal: I would like to declare a stonith resource of type external/ipmi so that one instance is started on each node of the HA cluster and will not be migrated in case of failover. But there are 5 parameters with the external/ipmi resource, notably the IPMI address of the adjacent node we would like to be fenced ... so this address is different for each node, and moreover, what about a cluster with, let's say, 4 nodes? Do we have to declare 3 different stonith resources so that any node could fence any other one in the HA cluster? Thanks Regards Alain ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ipmi.cfg Description: application/shellscript ipmi Description: application/shellscript
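One common answer to Alain's question is not a clone at all, but one external/ipmi primitive per node to be fenced, each pinned away from the node it fences. A hedged cib.xml-style sketch in the heartbeat-2.x dialect; the node name, IPMI address, credentials, and all ids below are invented for illustration, not taken from the thread:

```xml
<!-- Sketch only: one stonith resource per fenced node. Repeat this pair
     of elements once per node, adjusting hostname/ipaddr. -->
<primitive id="stonith-node1" class="stonith" type="external/ipmi">
  <instance_attributes id="stonith-node1-ia">
    <attributes>
      <nvpair id="stonith-node1-hostname" name="hostname" value="node1"/>
      <nvpair id="stonith-node1-ipaddr" name="ipaddr" value="10.0.0.11"/>
      <nvpair id="stonith-node1-userid" name="userid" value="admin"/>
      <nvpair id="stonith-node1-passwd" name="passwd" value="secret"/>
    </attributes>
  </instance_attributes>
</primitive>
<!-- Keep the resource off the node it is meant to fence. -->
<rsc_location id="stonith-node1-not-on-node1" rsc="stonith-node1">
  <rule id="stonith-node1-rule" score="-INFINITY">
    <expression attribute="#uname" id="stonith-node1-expr"
                operation="eq" value="node1"/>
  </rule>
</rsc_location>
```

For a 4-node cluster this pattern does mean several stonith resources (one per fenced node), which matches the concern in the question; Nikita's ipmi.cfg wrapper is a way to keep the per-node parameters out of the CIB itself.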
Re: [Linux-HA] cannot start heartbeat resource stopped
Hi Miguel, as already mentioned many times here on the list: PLEASE upgrade ASAP to at least V.2.1.4 or V2.99! V2.0.8 is buggy! HTH Nikita Michalko On Thursday, 18 June 2009 16:03, Miguel Olivares wrote: Hi everybody, I tried to configure heartbeat but without success, because when I try to start heartbeat I cannot get the virtual IP, and I got the message Resource is stopped. I followed different procedures in order to find a mistake or something, and I looked on the internet. But I don't know why I get this message, because in the log files everything seems ok; even when I try to stop it, it doesn't work. [r...@sun2 ~]# rpm -qa |grep heartbeat heartbeat-2.0.8-2.el4.centos heartbeat-gui-2.0.8-2.el4.centos heartbeat-pils-2.0.8-2.el4.centos heartbeat-stonith-2.0.8-2.el4.centos [r...@sun2 ~]# uname -a Linux sun2 2.6.9-55.ELsmp #1 SMP Fri Apr 20 17:03:35 EDT 2007 i686 athlon i386 GNU/Linux Can anybody help me? Thanks [r...@sun2 ~]# /etc/init.d/heartbeat start logd is already running Starting High-Availability services: 2009/06/18_15:42:35 INFO: Resource is stopped [ OK ] [r...@sun2 ~]# /etc/init.d/heartbeat stop Stopping High-Availability services: [ha-log] heartbeat[6142]: 2009/06/18_15:42:35 info: Configuration validated. Starting heartbeat 2.0.8 heartbeat[6143]: 2009/06/18_15:42:35 info: heartbeat: version 2.0.8 heartbeat[6143]: 2009/06/18_15:42:35 info: Heartbeat generation: 19 heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_TriggerHandler: Added signal manual handler heartbeat[6143]: 2009/06/18_15:42:35 info: Removing /var/run/heartbeat/rsctmp failed, recreating. heartbeat[6143]: 2009/06/18_15:42:35 info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0 heartbeat[6143]: 2009/06/18_15:42:35 info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1 heartbeat[6143]: 2009/06/18_15:42:35 info: glib: ping heartbeat started.
heartbeat[6143]: 2009/06/18_15:42:35 info: G_main_add_SignalHandler: Added signal handler for signal 17 heartbeat[6143]: 2009/06/18_15:42:35 info: Local status now set to: 'up' heartbeat[6143]: 2009/06/18_15:42:37 info: Link 192.168.1.98:192.168.1.98 up. heartbeat[6143]: 2009/06/18_15:42:37 info: Status update for node 192.168.1.98: status ping heartbeat[6143]: 2009/06/18_15:42:37 info: Comm_now_up(): updating status to active heartbeat[6143]: 2009/06/18_15:42:37 info: Local status now set to: 'active' heartbeat[6143]: 2009/06/18_15:52:50 WARN: Shutdown delayed until current resource activity finishes. My configuration files: [ha.cf] debugfile /var/log/ha-debug logfile /var/log/ha-log keepalive 2 deadtime 30 warntime 10 initdead 120 udpport 694 bcast eth0 auto_failback off node sun2 ping 192.168.1.98 [haresources] sun2 192.168.1.249 [authkeys] auth 1 1 crc ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Please Help me
Hi Bui Manh, On Wednesday, 17 June 2009 12:40, Bui Manh Nam wrote: thank you very much, I don't understand the following cib.xml file - I don't understand what you didn't understand exactly: the whole cib.xml? Please be more exact/informative ... Which OS do you use? Any logs? Which version of HA? If this is 2.1, then I strongly advise upgrading at least to V.2.1.4! Nikita Michalko nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes /instance_attributes /node node uname=server140 type=normal id=c7f5251a-3bab-489f-a18f-c1a04ffa1591 instance_attributes id=nodes-c7f5251a-3bab-489f-a18f-c1a04ffa1591 attributes nvpair name=standby id=standby-c7f5251a-3bab-489f-a18f-c1a04ffa1591 value=off/ /attributes /instance_attributes /node /nodes resources clone id=IP_5 meta_attributes id=IP_5_meta_attrs attributes nvpair id=IP_5_metaattr_target_role name=target_role value=stopped/ nvpair id=IP_5_metaattr_clone_max name=clone_max value=2/ nvpair id=IP_5_metaattr_clone_node_max name=clone_node_max value=2/ nvpair id=IP_5_metaattr_resource_stickiness name=resource_stickiness value=0/ /attributes /meta_attributes primitive id=resource_IP_5 class=ocf type=IPaddr2 provider=heartbeat instance_attributes id=resource_IP_5_instance_attrs attributes nvpair id=19cdeb1d-b7c4-4851-99b8-6c62a2a8de39 name=ip value=192.168.29.156/ nvpair id=cd13c341-e4bc-43f6-90af-00f97e3a5800 name=nic value=eth0/ nvpair id=06d597be-7df9-4546-ae5a-2a0ef088afbb name=cidr_netmask value=255.255.255.0/ nvpair id=78eb447b-12b0-418d-a5ee-c484c24e959e name=mac value=00:01:02:03:04:05:06/ nvpair id=5b417cd5-5a6f-4ba1-bb35-3db4dc839803 name=clusterip_hash value=sourceip-sourceport-destport/ /attributes
/instance_attributes meta_attributes id=resource_IP_5:0_meta_attrs attributes nvpair id=resource_IP_5:0_metaattr_target_role name=target_role value=started/ /attributes /meta_attributes /primitive /clone /resources constraints rsc_location id=location_ rsc=IP_5 rule id=prefered_location_ score=0 expression attribute=#uname id=b958f92c-839a-4666-8698-e0c96d04719b operation=eq value=server148/ /rule /rsc_location rsc_location id=location_2 rsc=IP_5 rule id=prefered_location_2 score=0 expression attribute=#uname id=540476fc-3a35-4dd8-87dc-3be82adb6592 operation=eq value=server140/ /rule /rsc_location rsc_colocation id=colocation_ from=IP_5 to=IP_5 score=INFINITY/ rsc_order id=order_ from=IP_5 type=after to=IP_5/ /constraints /configuration /cib [r...@server140 ~]# cat /var/lib/heartbeat/crm/cib.xml cib admin_epoch=0 have_quorum=true ignore_dtd=false num_peers=2 cib_feature_revision=2.0 generated=true num_updates=1 ccm_transition=2 dc_uuid=a07fb162-9071-474c-9a9a-ea4b4ef526e7 epoch=1306 cib-last-written=Mon Jun 15 12:56:56 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=true/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=stop/ nvpair name=last-lrm-refresh id=cib-bootstrap-options-last-lrm-refresh value=1245040474/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=INFINYTY/ nvpair id=cib-bootstrap-options-dc_deadtime name=dc_deadtime value=0/ /attributes /cluster_property_set /crm_config nodes node uname=server148 type=normal id=a07fb162-9071-474c-9a9a-ea4b4ef526e7 instance_attributes id=nodes-a07fb162-9071-474c-9a9a-ea4b4ef526e7 attributes nvpair name=standby id=standby-a07fb162-9071-474c-9a9a-ea4b4ef526e7 value=off/ /attributes
/instance_attributes /node node uname=server140 type
Re: [Linux-HA] Please Help me
Hi bui manh, On Wednesday, 17 June 2009 05:31, nambuim...@vccorp.vn wrote: Hi all, my name is Bui Manh Nam; I have a problem as follows: I have already installed heartbeat* on server140, whose IP address is 192.168.29.140, and on the other server, server148, with the address 192.168.29.148. VIP: 192.168.29.156 on server140: 192.168.29.140 create file eth0:0 DEVICE=eth0:0 BOOTPROTO=static HWADDR=01:02:03:04:05:06 IPADDR=192.168.29.156 NETMASK=255.255.255.0 ONBOOT=yes - you don't need to configure the file eth0:0; heartbeat does that for you automatically, if correctly configured ;-) #wget -O /etc/yum.repos.d/CentOS-Testing.repo http://dev.centos.org/centos/5/CentOS-Testing.repo #yum --enablerepo c5-testing install iptables-mod-CLUSTERIP #modprobe ipt_conntrack #modprobe ipt_CLUSTERIP #iptables -A INPUT -p tcp -d 192.168.29.156 -j CLUSTERIP --new --hashmode sourceip --clustermac 01:02:03:04:05:06 --total-nodes 2 --local-node 2 --hashmode sourceip #service iptables save #yum install lighttpd -y '== I have tested it and it runs ok on both servers of the subnet 192.168.29.0/24, but when I test from another subnet, it does not work. I use another server with the IP address 192.168.30.107; I cannot ping 192.168.29.156 and access to http://192.168.29.156/ is impossible. I can ping 192.168.29.140 and 192.168.29.148 ok, and http://192.168.29.140/ and http://192.168.29.148/ serve the web ok. Please help me! I hope to hear from you soon! Many thanks! - it would help: HA version, configuration (at least cib.xml and ha.cf) ... Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
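For reference, the CLUSTERIP rule posted above is the one for node 2 of 2. In a two-node CLUSTERIP setup, the rule on the peer would normally be identical except for --local-node; this sketch is inferred from the posted rule, not quoted from the thread:

```
# Sketch: CLUSTERIP firewall rule for the peer node (server140 here);
# only --local-node differs from the rule posted for server148.
# Requires root and the ipt_CLUSTERIP kernel module.
iptables -A INPUT -p tcp -d 192.168.29.156 -j CLUSTERIP --new \
  --hashmode sourceip --clustermac 01:02:03:04:05:06 \
  --total-nodes 2 --local-node 1
```

Note that CLUSTERIP answers ARP with the shared multicast-style MAC (01:02:03:04:05:06 above), which matters for reachability from other subnets via a router.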
Re: [Linux-HA] getting started with ha2: 404
You are right, Malte - there are still old clusters running in production ... ;-) Nikita Michalko On Friday, 5 June 2009 09:26, Malte Geierhos wrote: hi No, we would recommend you to advance to version 3, or pacemaker, as the central component is called now. See: www.clusterlabs.org and the docs there. Besides your recommendation, I personally think that it's a bad idea to remove all the documentation about previous versions! Please undo that. What about a small box on top of every page's content telling you that documentation for newer versions can now be found at clusterlabs.org ... Especially because you tell everyone on clusterlabs.org: Welcome to the home of Pacemaker - The scalable High-Availability cluster resource manager formerly part of Heartbeat. Pacemaker makes use of your cluster infrastructure (either OpenAIS or Heartbeat) to stop, start and monitor the health of the services (aka. resources) you want the cluster to provide. A PART of Heartbeat - so why remove the COMPLETE V2 DOCUMENTATION? As you wrote, Pacemaker makes use of a cluster infrastructure, whether it is OpenAIS or Heartbeat. So there are folks who need to have access to examples and documentation. kind regards, Malte Geierhos ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat 2.1.4 and 2.9.9 together?
Hi Andrew, what is the difference if I go for a crm cluster with 0.6? TIA Nikita Michalko On Monday, 4 May 2009 08:58, Andrew Beekhof wrote: haresources clusters should be fine. For crm clusters it depends on whether you go for 1.0 or 0.6. On Fri, May 1, 2009 at 10:32 PM, Mike Sweetser - Adhost mik...@adhost.com wrote: Hello: I'm looking to migrate an existing Heartbeat 2.1.4 installation to 2.9.9. Would it be possible to upgrade the servers one at a time, which would require running one server with 2.1.4 and one server with 2.9.9 for a short period? Would there be any incompatibility issues in doing so? Thank You, Mike Sweetser ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] which action first
Hi, On Thursday, 5 February 2009 05:28, lakshmipadmaja maddali wrote: Hi All, I want to know: when heartbeat is started, which action does it call first? Is it monitor or start? - it is monitor - look in the ha-log ... Waiting for a reply, Regards, Padmaja Regards Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Resource restarting constantly while setting up heartbeat2
Hi akshat, on this list it is quite useful to send logs, configuration files etc. - hb_report! Regards Nikita Michalko On Thursday, 29 January 2009 18:28, akshat kansal wrote: Hi all, I am facing an issue while setting up heartbeat version 2.0 using cib.xml. Issue: The heartbeat resources are starting and stopping constantly, becoming stable only after a certain amount of time. Please help me out with this issue. Regards Akshat ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Resource restart after switching to stand-by mode
Hi Jakub, I don't know too much about stonith/IPMI, but I noticed the following in your cib.xml: ... nvpair id=ipmistonith0_userid name=userid value=stoop/ ... nvpair id=ipmistonith1_userid name=userid value=stoop/ - look at this: ^^^ HTH Nikita Michalko On Tuesday, 16 December 2008 13:58, Jakub Kuźniar wrote: Thank you very much for the help. So the problem is not that they migrated, but that this caused a restart of the resources already on the second node? Yes, that's right. There is an unnecessary restart of resources on the second node. If so, add the interleave meta attribute to the clones and set it to true. I have added this meta attribute for the clone resources Xenconfig_cloneset and Xendata_cloneset. I also attached a new version of the CIB with this attribute added. But there was no change. Still, resources on the second node are restarted whenever the first node is switched into stand-by mode. I have erased the CIB content and added the resource configuration again. Still the same result. Jakub On Tuesday, 16 December 2008 11:48:07, Andrew Beekhof wrote: On Tue, Dec 16, 2008 at 00:33, Jakub Kuźniar jakub.kuzn...@s4.pl wrote: Hi everybody, I have recently updated heartbeat 2.0.8 to heartbeat 2.1.4. I am running a two-node Xen cluster using OCFS2. With heartbeat 2.0.8 my configuration worked fine, but after the upgrade strange things started to happen. When one of the nodes was switched to stand-by mode, the virtual machines were migrated (live) to the second node, forcing the restart of the virtual machines running on the second node. So the problem is not that they migrated, but that this caused a restart of the resources already on the second node? If so, add the interleave meta attribute to the clones and set it to true. When the first node was then switched back online, the failed-over VMs were migrated back to the first node, once again causing the restart of VMs running on the second node. This behaviour seems strange to me. I am probably making some mistake in the configuration.
I would be very grateful for any help. I also attach the configuration part of the CIB and the ha.cf file. Thank you for the response. Jakub ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
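The interleave setting Andrew recommends is a clone meta attribute. In the heartbeat-2.1.x XML dialect used elsewhere in this digest it would look roughly like the sketch below; the clone id matches Jakub's Xenconfig_cloneset, while the other ids are invented for illustration:

```xml
<!-- Sketch: interleave=true lets each clone instance depend only on the
     peer instance running on its own node, so putting one node in
     standby need not restart instances on the surviving node. -->
<clone id="Xenconfig_cloneset">
  <meta_attributes id="Xenconfig_cloneset_meta_attrs">
    <attributes>
      <nvpair id="Xenconfig_interleave" name="interleave" value="true"/>
    </attributes>
  </meta_attributes>
  <!-- existing primitive definition stays unchanged here -->
</clone>
```

If the attribute is set but restarts persist, as Jakub reports, it is worth verifying with a CIB query that the value actually landed in the live CIB rather than only in the file that was edited.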
Re: [Linux-HA] How to configure network timeout
Hi Christian, have a look at operations in cib.xml - you can set there, for example: ... op id=IPaddrX_mon interval=600s name=monitor timeout=600s / HTH regards Nikita Michalko On Monday, 15 December 2008 13:57, Christian Ratzlaff wrote: Hi everyone, we had a network problem last week. There were small network outages and both HA clusters went crazy... This was not necessary, as I see it. There must be a way to set the timeout to about 5 minutes globally. I just found the timeout for starting resources, but this is not the same, right? I want to make it possible to have a timeout within heartbeat itself. Where can I configure it? kind Christian ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
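The op timeout above covers resource monitoring; the tolerance of the heartbeat layer itself to lost packets lives in ha.cf. A sketch with illustrative values (the directives are standard ha.cf ones, but the numbers are examples, not recommendations from the thread):

```
# ha.cf fragment (sketch): how long heartbeat tolerates silence from a
# peer before reacting. All values below are illustrative.
keepalive 2     # seconds between heartbeat packets
warntime 30     # log a warning after 30 seconds of silence
deadtime 60     # declare the peer dead only after 60 seconds
initdead 120    # extra tolerance while nodes are still booting
```

Raising warntime/deadtime lets the cluster ride out short network outages, at the cost of a correspondingly slower reaction to a real node failure.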
Re: [Linux-HA] HA gets stuck at: Stopping high available services...
Hi Mario, which platform/version are you on? Nikita Michalko On Tuesday, 9 December 2008 10:48, Darren Mansell wrote: On Tue, 2008-12-09 at 08:53 +0100, m...@bortal.de wrote: Hello List, currently my 2nd node is down for hardware maintenance. Now I tried to reboot my 1st node, but it gets stuck at: Stopping high available services... (I only waited 5 mins, but that's too long anyway) Here is my config file: logfile /var/log/ha-log debugfile /var/log/ha-debug keepalive 500ms deadtime 5 warntime 3 initdead 20 ucast eth2 10.0.0.1 ucast eth2 10.0.0.2 ucast eth0 10.11.12.1 ucast eth0 10.11.12.2 auto_failback off node node01 node node02 debug 1 Any idea why it's getting stuck? Is it trying to resolve something? Thanks, Mario Mine does it a lot too. I have to kill -9 the heartbeat master control process. Darren ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat 2.1.3 avoid resource failover
Hi Fulvio, I don't see any attachment on your mail ... Forgotten? Nikita Michalko On Monday, 3 November 2008 11:15, fulvio fabiani wrote: Hi all, we have a clustered installation of Heartbeat 2.1.3 that manages an Apache/VIP resource in an Active/Standby configuration. What we observe at machine reboot is the failover of the Apache/VIP resource group to a preferred destination (node_02). How can we avoid this behavior and prevent the resource from failing back automatically? We already configured the auto_failback=off option in ha.cf and tried the default_resource_stickiness param in cib.xml, but the behavior is the same. Attached is the cib.xml used. Thanks a lot, Fulvio Fabiani ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Michele, thank you for your answer, but: I also tried the second form of the mount command - i.e.: mount -n -o remount -t reiserfs /dev/sda1 / without success - I only got the error message: Input/output error. There was a SCSI error in dmesg, and it was not possible to read or write anything on the system then ... Regards Nikita Michalko On Saturday, 1 November 2008 15:39, Michele Mase' wrote: Hoping this is useful ... If the root filesystem is read-only, the option mount -o rw,remount / shouldn't work, because /etc is read-only too and the mtab can't be written. According to http://www.unixguide.net/linux/faq/04.15.shtml (and man mount too): Root File System Is Read-Only: Remount it. If /etc/fstab is correct, you can simply type: mount -n -o remount / If /etc/fstab is wrong, you must give the device name and possibly the type, too: e.g. mount -n -o remount -t ext2 /dev/hda2 / To understand how you got into this state, see (EXT2-fs: warning: mounting unchecked file system.) Have you tried it in this manner? Michele On Fri, Oct 31, 2008 at 11:48 AM, Nikita Michalko [EMAIL PROTECTED] wrote: Hi Jonas, On Thursday, 30 October 2008 11:07, Jonas Andradas wrote: Hello, have you tried remounting the file system in read-write mode? Something like: mount -o rw,remount / Does forcing a node switch also fail, due to the read-only state? - we have tried almost everything - but with no success. It was impossible to run anything (not shutdown, reboot - NOTHING!) - almost always only with the error: Input/output error Regards Nikita Michalko On Mon, Oct 13, 2008 at 03:31, Teh Tze Siong [EMAIL PROTECTED] wrote: I have been playing with HA+DRBD on 2 servers, each with one hard disk installed. The server is hosting the database for other application servers. Recently the application failed and the problem lay in the database server, the primary node.
I've checked the server; the network is still up. I can still read files and access the database, but I cannot update anything, and I am not even allowed to run shutdown -h now or turn off the network interface - the file system has become read-only and heartbeat is still alive. I had no choice but to ask the datacenter guy to unplug the network cables at the primary node, wait until I confirmed the secondary node had taken over, and only then power down the primary node. HA+DRBD works well for a network failure or power failure on the primary node, but when it comes to this half-dead situation, it is driving me crazy. As the server is installed in a remote datacenter, is there a better way for me to counter this half-dead situation? Thanks, Zev Disclaimer --- This e-mail and any files transmitted with it are intended only for the use of the addressee. If it contains confidential information, the addressee is expressly prohibited from distributing the e-mail and legal action may be taken against you. Access by any other persons to this e-mail is unauthorized. If you are not the intended recipient, any disclosure, copying, use, or distribution is prohibited and may be unlawful. This e-mail and any files and/or attachments shall not constitute a binding legal obligation with the company. The company shall also not be responsible for any computer virus transmitted with this e-mail.
--- MCSB Systems (M) Berhad ([EMAIL PROTECTED]) ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
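In a half-dead situation like Zev's, some external monitor has to notice the read-only root on its own, since heartbeat keeps answering. A minimal, hypothetical sketch (not from the thread) that checks the mount options in /proc/mounts on Linux:

```shell
#!/bin/sh
# Sketch: report whether the root filesystem is currently mounted
# read-only, by reading its mount options from /proc/mounts.
opts=$(awk '$2 == "/" { print $4; exit }' /proc/mounts)
case ",$opts," in
  *,ro,*) echo "root is read-only" ;;
  *)      echo "root is read-write" ;;
esac
```

A check like this could feed a watchdog or monitoring job; in the situation described, though, where even shutdown fails with I/O errors, only out-of-band power control (IPMI/STONITH) reliably removes the node.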
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Jonas,

On Thursday, 30 October 2008 at 11:07, Jonas Andradas wrote: Hello, have you tried remounting the file system in read-write mode? Something like: mount -o rw,remount / Does forcing a node switch also fail due to the read-only state?

We have tried almost everything - but with no success. It was impossible to run anything (not shutdown, not reboot - NOTHING!) - almost everything failed with an Input/Output error. Regards Nikita Michalko

On Mon, Oct 13, 2008 at 03:31, Teh Tze Siong [EMAIL PROTECTED] wrote: [original message quoted in full above - snipped]
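As a footnote to the remount suggestion above, a monitoring script could detect the read-only state before heartbeat considers the node healthy. A minimal sketch, not from the thread: the check_ro helper and the sample /proc/mounts lines are assumptions for illustration only.

```shell
#!/bin/sh
# check_ro: given one /proc/mounts-style line, succeed (exit 0) if the
# mount is flagged read-only. Field 4 is the comma-separated option list,
# so we look for "ro" as a whole element, not as a substring.
check_ro() {
  opts=$(printf '%s\n' "$1" | awk '{print $4}')
  case ",$opts," in
    *,ro,*) return 0 ;;
    *)      return 1 ;;
  esac
}

# In practice one would iterate over the live mount table, e.g.:
#   while read line; do check_ro "$line" && echo "read-only: $line"; done < /proc/mounts
# and, if the root fs turns out read-only, attempt:  mount -o rw,remount /
```

Whether the remount succeeds depends on why the kernel demoted the mount; on a real SCSI error (as reported later in this thread) the attempt usually fails with the same I/O errors.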
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Teh, we had a similar error: all file systems on the system HD were set to read-only due to a SCSI error. HA was still alive, but no action was possible - no logs etc. due to the read-only state ... The only possibility was to turn the server off and on. In this situation even Stonith is unusable ;-) Best regards Nikita Michalko

On Monday, 13 October 2008 at 04:31, Teh Tze Siong wrote: [original message quoted in full above - snipped]
Re: [Linux-HA] What to do when file-system becomes read-only due to harddisk error and the heartbeat still alive?
Hi Dejan, it was a long time ago and I can't find the logs and configuration files of my colleague (no longer here) any more, but I think it was something on the motherboard of the server (IPMI?). Sorry - can't say exactly ;-( Regards Nikita Michalko

On Thursday, 30 October 2008 at 13:53, Dejan Muhamedagic wrote: Hi, On Thu, Oct 30, 2008 at 10:58:54AM +0100, Nikita Michalko wrote: [read-only SCSI error report quoted above - snipped] Hmm, what kind of stonith device do you have? Unless it's a stonith device which depends on the operating system and the other node is fine, it should work. Thanks, Dejan
Re: [Linux-HA] Weird issue with Heartbeat
Hi Daren, which version of HA do you use?

On Monday, 13 October 2008 at 12:30, Daren Tay wrote: Hi guys, I have heartbeat running on 2 machines, say db1 (master) and db2. I realize the VIP is activated on both sides at the same time! To recreate the problem, I just have to stop heartbeat on db1, with db2 taking over the VIP. Then when I start db1 again, the VIP appears on db1, but db2 still has the VIP. The logs show nothing weird. Any idea what areas I could troubleshoot? My ha.cf is as follows:

# Time between heartbeats in seconds
keepalive 1
# Node is pronounced dead after 15 seconds
deadtime 15
# Prevents the master node from re-acquiring cluster resources after a failover
auto_failback on
#auto_failback off
# Port for udp (default)
udpport 694
# Use a udp heartbeat over the eth2 interface. Old versions use udp.
bcast eth2
use_logd yes
debugfile /var/log/ha/ha.debug
logfile /var/log/ha/ha.log
# First node of the cluster
node db1.domain.com
# Second node of the cluster
node db2.domain.com

Thanks! Daren

Did uname -n return the same name as configured here on the db1 node? And likewise on db2?
-- Nikita Michalko
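Nikita's uname -n question can be checked mechanically: heartbeat matches the output of uname -n against the node directives in ha.cf. A sketch of such a check; the node_in_hacf helper and the file path are illustrative assumptions, not from the thread.

```shell
#!/bin/sh
# node_in_hacf: succeed if NAME appears as an argument of a "node"
# directive in the given ha.cf-style file.
node_in_hacf() {
  name=$1; conf=$2
  awk -v n="$name" '
    $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END          { exit found ? 0 : 1 }' "$conf"
}

# Typical use on a cluster node:
#   node_in_hacf "$(uname -n)" /etc/ha.d/ha.cf \
#     || echo "warning: uname -n is not listed as a node in ha.cf"
```

A mismatch here is a classic cause of both nodes believing they own the resources, which matches the dual-VIP symptom described above.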
Re: [Linux-HA] How can a group of Resource be Migrated After one resource in the group fail ?
On Friday, 19 September 2008 at 09:14, heartbeat wrote: Hi all, I found that if one of the resources in the group fails, none of the resources of the group can be migrated to the other node. My cib's key configuration is as follows:

..
<configuration>
  <crm_config>
    <cluster_property_set id="cib-bootstrap-options">
      <attributes>
        <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
        <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-INFINITY"/>
      </attributes>
    </cluster_property_set>
  </crm_config>
..
  <group id="group1">
    <primitive id="ap_1_ip_217" class="ocf" type="IPaddr2" provider="heartbeat">
      <instance_attributes id="ap_1_ip_attr_1">
        <attributes>
          <nvpair id="ap_1_ipaddr_1" name="ip" value="10.1.41.217"/>
          <nvpair id="ap_1_nic_1" name="nic" value="eth1:1"/>
          <nvpair id="ap_1_mask_1" name="netmask" value="25"/>
        </attributes>
      </instance_attributes>
    </primitive>
    <primitive id="ap_1_ip_217_agent" class="ocf" type="myagent" provider="heartbeat">
      <operations>
        <op id="1" name="monitor" interval="5s" timeout="4s" on_fail="restart"/>
      </operations>
    </primitive>
  </group>
..

How can a group of resources be migrated after one resource in the group fails? Thanks in advance!

Hi heartbeat! Did you already read the following:
http://clusterlabs.org/mw/Image:Configuration_Explained.pdf
http://www.linux-ha.org/ClusterInformationBase/ResourceGroups
* don't set default-resource-stickiness to INFINITY
* grab this script: http://hg.clusterlabs.org/pacemaker/dev/raw-file/tip/contrib/showscores.sh and use it to determine SCORE
* set default-resource-failure-stickiness = SCORE / 3
Try to set the values for default-resource-failure-stickiness and default-resource-stickiness to something reasonable - not -/+INFINITY and not 0!

Nikita Michalko
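To make that advice concrete, here is a hypothetical pair of bootstrap options with non-extreme values. The numbers 100 and -30 are assumptions standing in for the SCORE/3 rule of thumb; the real values must come from showscores.sh output on your own cluster.

```xml
<cluster_property_set id="cib-bootstrap-options">
  <attributes>
    <!-- moderate preference to stay put: NOT INFINITY and NOT 0 -->
    <nvpair id="cib-bootstrap-options-default-resource-stickiness"
            name="default-resource-stickiness" value="100"/>
    <!-- each failure subtracts 30, i.e. roughly SCORE/3 for an assumed SCORE of 100 -->
    <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness"
            name="default-resource-failure-stickiness" value="-30"/>
  </attributes>
</cluster_property_set>
```

How many failures it takes before the accumulated penalty outweighs the stickiness depends on the actual scores, so re-run showscores.sh after each induced failure rather than relying on the arithmetic alone.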
Re: [Linux-HA] Active node gives up resources when passive fails - not good
On Tuesday, 16 September 2008 at 10:36, Wayne Gemmell wrote: Hi Wayne, just a dumb question: Hi all, I'm running 2.1.3 on Ubuntu hardy. My ha.cf looks like below. This is the first time I'm using unicast connections and so far there is weirdness going on: shutdown by IPMI because of high ambient temp. They've been having aircon problems in our hosting center, and when the ambient temp gets to around 38 deg C my passive server shuts down (no, I don't know how to stop this from happening). - is this maybe configured in the BIOS? The active server then follows suit and gives up all its resources. This happened again at 2am this morning and I really need to get to the bottom of this. I'm trying broadcast comms to see if it makes a difference, but I thought I'd post here for ideas.

udpport 694
ucast eth5 10.40.0.1
ucast eth5 10.40.0.2
debug 1
use_logd on
keepalive 2
warntime 15
deadtime 30
initdead 45
auto_failback off
node nikki
node asia
crm off

HTH Nikita Michalko
Re: AW: AW: [Linux-HA] Heartbeat 2.1.3-23.1 (where is the stable version HA/pacemaker)
Hi Maddin,

On Wednesday, 27 August 2008 at 14:29, Mega Mailingliste wrote: Hi Andreas, hi all, this is exactly what I mean: where are the 2.1.3-23.1 heartbeat/pacemaker packages? - I have the RPMs for SLES10, if you want ... HTH Nikita Michalko

I need these packages urgently for a production cluster and won't try 2.4.1 or 2.99, which is still beta. Thanks a lot Maddin

Original message - From: [EMAIL PROTECTED] [mailto:linux-ha-[EMAIL PROTECTED]] On behalf of Andreas Mock Sent: Wednesday, 27 August 2008 13:31 To: General Linux-HA mailing list Subject: Re: AW: [Linux-HA] Heartbeat 2.1.3-23.1 (where is the stable version HA/pacemaker)

Hi Michael, hi all, but that does not answer the very first question: which version is the current stable version of HA/pacemaker? What I understood so far:
a) 2.1.3 with 0.6.6 is stable but cannot be found packaged in the known location http://download.opensuse.org/repositories/server:/ha-clustering/
b) In that location is the combination of HA 2.99.0/pacemaker 0.6.6, which is declared beta.
c) HA version 2.1.4 is a full HA stack as we were used to before the pacemaker split-off. This was done especially for SLES customers.
Have I understood it right? Where is the stable HA/pacemaker combination? Best regards Andreas Mock

Original message - From: Michael Schwartzkopff [EMAIL PROTECTED] Sent: 27.08.08 11:48:20 To: General Linux-HA mailing list linux-ha@lists.linux-ha.org Subject: Re: AW: [Linux-HA] Heartbeat 2.1.3-23.1

On Wednesday, 27 August 2008 at 10:55, Mega Mailingliste wrote: Is there nobody who knows where I can get this?? Is 2.99 beta or not? Yes, it is declared beta. -- Dr.
Michael Schwartzkopff MultiNET Services GmbH Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany Tel: +49 - 89 - 45 69 11 0 Fax: +49 - 89 - 45 69 11 21 mob: +49 - 174 - 343 28 75 mail: [EMAIL PROTECTED] web: www.multinet.de Sitz der Gesellschaft: 85630 Grasbrunn Registergericht: Amtsgericht München HRB 114375 Geschäftsführer: Günter Jurgeneit, Hubert Martens --- PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B Skype: misch42
Re: [Linux-HA] installing on ubuntu 6.06
Hello Sander, I really don't want to bother you, but it would be great if you could send me the complete documentation of what you've done on Ubuntu Server 8.04 LTS. I'll give it a try ... Thanks! Nikita Michalko

On Tuesday, 12 August 2008 at 14:58, Sander van Vugt wrote: Just to encourage you to do an upgrade: I've recently successfully implemented DRBD+iSCSI+Heartbeat on Ubuntu Server 8.04 LTS. If it helps, I can send you the complete documentation of what I've done. And my two cents about the stability of 8.04: I didn't like the (in)stability of 7.04 Server, but after having done a lot of things with 8.04 Server, I have to admit that I'm starting to like it. :-) Best regards, Sander

On Tue, 2008-08-12 at 11:40 +0100, Matthew Macdonald-Wallace wrote: On Mon, 11 Aug 2008 21:59:33 -0600 (MDT), RYAN vAN GINNEKEN [EMAIL PROTECTED] wrote: sorry to bump an old post, but I still have not solved this problem yet. Seeing as 8.04 is now an LTS version of Ubuntu in the same way that 6.06 was, is upgrading to a newer version of Ubuntu an option? If you are still having issues, I would seriously consider a re-install. I don't want to get into a distro war here; I would just like to say that I consider Ubuntu to be an excellent (if not the best when it comes to user-friendly aspects) desktop distribution, and that I use Debian (or Gentoo if severe customisation is required!) as a server distribution, as I believe they are more stable. Kind regards, Matt
Re: [Linux-HA] Linux-ha for firewalls
Hi Christof, could you send me your last hb_report and a copy of your whitepaper in DOC format? I will take a look ... Thanks! Nikita Michalko

On Thursday, 7 August 2008 at 17:26, [EMAIL PROTECTED] wrote: I apologize in advance for the top posting and the horrible web-based e-mail - I'm on the road. I wrote a whitepaper/book about building Internet firewalls using Linux-based systems, and have been keeping it up to date until relatively recently. It includes a chapter on using Heartbeat to manage an active/passive firewall setup. The book itself is centered around RHEL/CentOS, but the majority of it would work for pretty much any Linux distribution. The main reason I haven't been keeping it up to date is that I am working on the Second Edition. The original was based around the 4.x version of RHEL/CentOS; the new version will be based around 5.x. Another important note is that the old version uses Heartbeat 2.0.8. The new version will use 2.1.3, but the config files, at least as far as a firewall is concerned, look like they will be the same. I'd be more than happy to send you a copy, either the PDF or the DOC version.

Dear list members, at the moment I am trying to set up a Linux cluster of 2 firewalls; both should be online, and only one should run the virtual IP addresses of all network segments. My configuration looks like the following: the master fw is linux (uname) and the slave is idefix. I generated a resource group called grp_vips that contains all virtual IP resources (rsc_int_vip and rsc_ext_vip). If I reboot the master (linux), idefix takes over all resources and everything is OK, but if I shut down a resource (rsc_int_vip) on the master, the second resource (rsc_ext_vip) migrates to the slave (idefix) and the first resource (rsc_int_vip) stays on the master (linux) as unmanaged. Attached are the ha.cf and cib.xml files of my configuration.
What I want to achieve is:
- one dedicated master (linux); only if there are problems switch to the slave (idefix)
- if the master comes back (or only the interface that was gone), the whole group should migrate back to the primary master (linux)
- if one resource of the group goes down, the whole group should be migrated to the slave (collocated = true is already set on the group)
- if possible, the slave should become master (to always have the master where the resources are running)
One problem I also noticed with my init scripts on openSUSE 10.3 is that heartbeat sometimes (80%) does not start because the network is not ready. I downloaded the heartbeat RPMs from the linux-ha download site and I'm using heartbeat 2.1.3. Any hints on how I can achieve what I want are highly appreciated. Thank you for your help. Best regards Christof
Re: [Linux-HA] Linux-HA Leadership Announcement
Hi Alan, THANK YOU VERY MUCH FOR YOUR GREAT WORK! GOOD LUCK! Nikita Michalko

On Tuesday, 24 June 2008 at 15:26, Alan Robertson wrote: After more than 10 years as the Linux-HA project leader, I've decided to create a new leadership structure. One of my original success criteria for the project was that it would eventually not need me. In the last few years, it has seemed more and more likely that we'd reached this plateau of success - and the time has come to put that supposition to the test. Effective today, I am appointing a team of three people to lead and govern the project going forward. These three outstanding people have proved themselves key contributors to the project, and are ready and willing to take over the reins of leadership - and lead the project into the future. These people are:
Keisuke MORI [EMAIL PROTECTED]
Dave Blaschke [EMAIL PROTECTED]
Lars Marowsky-Bree [EMAIL PROTECTED]
As for me, my current assignment at IBM doesn't permit me to spend full time on the project, but I will continue to promote and contribute to the project as time permits. Should future circumstances permit it, I expect that I will increase my efforts on the project again. Congratulations to Mori-san, Dave and Lars! They're working out their new roles, scheduling releases, and so on. Expect to hear from them soon! -- Alan Robertson [EMAIL PROTECTED] Linux-HA founder, Linux-HA project leader emeritus
Re: [Linux-HA] need help
On Wednesday, 16 April 2008 at 17:29, Philip Michael D Vargas wrote: Good day to all, I'm new here; I need some help regarding HA for Linux. Do you have any howto articles for this? I'm using Fedora, FreeBSD and Ubuntu. Your help is much appreciated. --- Linux Registered User # 413558 [EMAIL PROTECTED] (philip at comclark dot com)

Hi Philip, you can start at http://wiki.linux-ha.org/ and also at http://www.linux-ha.org//HeartbeatTutorials HTH Regards -- Nikita Michalko
Re: [Linux-HA] Heartbeat emergency shutdown
Hi Manas, can you check whether your local firewall is blocking access to port 694 (or another port used by HA)? Maybe some networking problem? HTH Nikita Michalko

On Monday, 17 March 2008 at 14:15, Manas Garg wrote: Hi, we have a two-node setup running heartbeat version 2.0.8-1. On one node, heartbeat exited saying Emergency Shutdown. It was restarted. After the restart, the heartbeat on the other node exited giving roughly the same reason. Can someone please help us identify the issue? Are these known bugs, and have they been fixed in later releases? Any help would be greatly appreciated.

The nodes' configuration:
sh-3.00# uname -a
Linux S-FL2-PLS-NAC 2.6.17-1.2142_FC4smp #1 SMP Sat Aug 12 08:16:08 EDT 2006 i686 i686 i386 GNU/Linux

Following are the logs from the first node:
Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (197 messages in queue)
Mar 3 14:47:05 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (198 messages in queue)
Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (199 messages in queue)
Mar 3 14:47:06 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
Mar 3 14:47:10 S-FL2-PLS-NAC last message repeated 7 times
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 7 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =207, lowseq=7,ackseq=0,lastmsg=6
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Message hist queue is filling up (200 messages in queue)
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: Cannot rexmit pkt 8 for s-fl2-sls-nac.yardi.com: seqno too low
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: fromnode = s-fl2-sls-nac.yardi.com, fromnode's ackseq = 0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist information:
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hiseq =208, lowseq=8,ackseq=0,lastmsg=7
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: ERROR: lowseq cannnot be greater than ackseq
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist-ackseq =10, old_ackseq=0
Mar 3 14:47:10 S-FL2-PLS-NAC heartbeat: [5057]: info: hist-lowseq =201, hist-hiseq=208, send_cluster_msg_level=0
Mar 3 14:47:10 S-FL2-PLS-NAC ccm: [5284]: ERROR: Lost connection to heartbeat service. Need to bail out.
Mar 3 14:47:10 S-FL2-PLS-NAC cib: [5285]: ERROR: cib_ha_connection_destroy: Heartbeat connection lost! Exiting.
Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: ERROR: Disconnected with heartbeat daemon
Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: CRIT: crmd_ha_msg_dispatch: Lost connection to heartbeat service.
Mar 3 14:47:10 S-FL2-PLS-NAC mgmtd: [5290]: ERROR: Lost connection to heartbeat service.
Mar 3 14:47:10 S-FL2-PLS-NAC stonithd: [5287]: notice: /usr/lib/heartbeat/stonithd normally quit.
Mar 3 14:47:10 S-FL2-PLS-NAC crmd: [5289]: info: mem_handle_func:IPC broken, ccm is dead before the client!
Mar 3 14:47:10 S-FL2-PLS-NAC attrd: [5288]: CRIT: attrd_ha_dispatch
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Monday, 10 March 2008 at 16:12, Dominik Klein wrote: Nikita Michalko wrote: Hi all! I have some trouble with HA V2.1.3 on SLES10 SP1, a two-node cluster with 1 resource group = 2 resources. Intended is a forced failover of the group on the third failure of any resource in the group; one node is preferred over the other (see attached configuration). After start, the resources are running on the preferred node (demo) as expected, but with 1 failcount and with the following scores (script showscores):

Resource             Score      Node   Stick.  Failcount  Fail.-Stickiness
IPaddr_193_27_40_57  0          dbora  2       0          -3
IPaddr_193_27_40_57  2          demo   2       0          -3
ubis_udbmain_13      -INFINITY  dbora  2       0          -3
ubis_udbmain_13      INFINITY   demo   2       1          -3

The score of the first resource (IPaddr_193_27_40_57) is 2 as expected (group resource_stickiness=1), but the second resource has score INFINITY - why? Because of the added colocation constraint for the group?

In a colocated group (which is the default), all subsequent resources are tied to the group's first resource with a score of INFINITY. To not allow them to run on another node but the node the first resource is run on, they also get -INFINITY for any other node. Regards Dominik

Thank you very much, Dominik, for your reply - but how can I then achieve the intended behavior: group failover on the third failure? Best regards Nikita Michalko AIP
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Tuesday, 11 March 2008 at 10:39, Dominik Klein wrote: In a colocated group (which is the default), all subsequent resources are tied to the group's first resource with a score of INFINITY. To not allow them to run on another node but the node the first resource is run on, they also get -INFINITY for any other node. Thank you very much, Dominik, for your reply - but how can I then achieve the intended behavior: group failover on the third failure? Although I cannot explain it score-wise, as you can only see INFINITY for the group resources, this should work. Just let a resource in the group fail a couple of times and see what happens. Works for me. I'll have Andrew explain this when he's back from Australia :) Regards Dominik

Hi Dominik, I tried to let the resource in the group fail a couple of times, but after the second try the failcount for both resources is NOT increased. After each try (with ifconfig eth0:x down) it still shows the same:

Resource             Score      Node   Stickin.  Failc.  Fail.-Stick.
IPaddr_193_27_40_57  0          dbora  2         0       -3
IPaddr_193_27_40_57  1          demo   2         1       -3
ubis_udbmain_13      -INFINITY  dbora  2         0       -3
ubis_udbmain_13      INFINITY   demo   2         1       -3

Thank you again for your time! Regards Nikita Michalko AIP
Re: [Linux-HA] Strange behavior of the resource group on 2 nodes cluster
On Tuesday, 11 March 2008 at 13:21, Dominik Klein wrote: Hi Dominik, I tried to let the resource in the group fail a couple of times, but after the second try the failcount for both resources is NOT increased. Did you wait for the cluster to restart the resource after you produced the failure before causing another failure? Yes, of course. It shows the same after each try (with ifconfig eth0:x down):

Resource             Score      Node   Stickin.  Failc.  Fail.-Stick.
IPaddr_193_27_40_57  0          dbora  2         0       -3
IPaddr_193_27_40_57  1          demo   2         1       -3
ubis_udbmain_13      -INFINITY  dbora  2         0       -3
ubis_udbmain_13      INFINITY   demo   2         1       -3

Regards Dominik

Regards Nikita
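When repeating this kind of test, it helps to pull a single resource's score for one node out of the showscores output instead of eyeballing the whole table. A small awk sketch; the get_score helper is an assumption, and it presumes the column layout shown in the tables above (resource, score, node, ...):

```shell
#!/bin/sh
# get_score RESOURCE NODE: read showscores-style rows on stdin and print
# the score column for the matching resource/node pair.
# Assumed layout per row: resource score node stickiness failcount failure-stickiness
get_score() {
  awk -v r="$1" -v n="$2" '$1 == r && $3 == n { print $2 }'
}

# Typical use (showscores.sh path as used earlier in this archive):
#   ./showscores.sh | get_score ubis_udbmain_13 demo
```

Comparing this value before and after each induced failure makes it obvious whether the failcount is actually being applied to the score.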
Re: [Linux-HA] ha_msg_addraw_ll: illegal field
Hi Tom, only one tip from me: don't use a serial line for heartbeat - better to use network interfaces only!

On Saturday, 8 March 2008 at 02:15, Tom Brown wrote:

OS: Debian Etch 4.0r3
Kernel: vanilla kernel 2.6.24.3
DRBD: 8.2.5
Heartbeat: 2.1.3

I was testing fail-overs between two nodes: fs01 and fs02. I alternated rebooting the nodes several times. I saw the errors below show up in /var/log/ha-log, on fs01, once. There didn't seem to be any side effects after the errors showed up. Any ideas what these errors mean?

Thanks,
Tom

heartbeat[1631]: 2008/03/07_15:57:41 WARN: glib: TTY write timeout on [/dev/ttyS0] (no connection or bad cable? [see documentation])
heartbeat[1631]: 2008/03/07_15:57:41 info: glib: See http://linux-ha.org/FAQ#TTYtimeout for details
heartbeat[1567]: 2008/03/07_15:57:50 WARN: node fs02: is dead
heartbeat[1567]: 2008/03/07_15:57:50 info: Dead node fs02 gave up resources.
heartbeat[1567]: 2008/03/07_15:57:50 info: Link fs02:/dev/ttyS0 dead.
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: ha_msg_addraw_ll: illegal field
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: ha_msg_addraw(): ha_msg_addraw_ll failed
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: NV failure (string2msg_ll):
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: Input string: [ t=NS_rexmit t=status st=init dt=7d00 protocol=1 src=fs02 (1)srcuuid=XndbW/0xSR2wusgEzHcirQ== seq=1 hg=47d19b04 ts=47d1d6ac ld=0.83 0.21 0.07 1/59 1610 ttl=3 auth=1 af6ef5df ]
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: sp= t=status st=init dt=7d00 protocol=1 src=fs02 (1)srcuuid=XndbW/0xSR2wusgEzHcirQ== seq=1 hg=47d19b04 ts=47d1d6ac ld=0.83 0.21 0.07 1/59 1610 ttl=3 auth=1 af6ef5df
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: depth=0
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: MSG: Dumping message with 1 fields
heartbeat[1567]: 2008/03/07_15:58:37 ERROR: MSG[0] : [t=NS_rexmit]
HTH
Nikita Michalko
AIP
[Linux-HA] Strange behavior of the resource group on 2 nodes cluster
Hi all! I have some trouble with HA V2.1.3 on SLES10 SP1: a two-node cluster with 1 resource group = 2 resources. Intended is a forced failover of the group on the third failure of any resource in the group; one node is preferred over the other (see attached configuration). After startup the resources are running on the preferred node (demo), as expected, but with 1 failcount and with the following scores (script showscores):

Resource             Score      Node   Stick.  Failcount  Fail.-Stickiness
IPaddr_193_27_40_57  0          dbora  2       0          -3
IPaddr_193_27_40_57  2          demo   2       0          -3
ubis_udbmain_13      -INFINITY  dbora  2       0          -3
ubis_udbmain_13      INFINITY   demo   2       1          -3

The score of the first resource (IPaddr_193_27_40_57) is 2 as expected (group resource_stickiness=1), but the second resource has score INFINITY - why? Because of the added colocation constraint for the group?

Nikita Michalko
AIP

cib admin_epoch=0 epoch=2 generated=false have_quorum=false ignore_dtd=false num_peers=1 cib_feature_revision=2.0 num_updates=1 cib-last-written=Thu Mar 6 15:21:33 2008 configuration crm_config cluster_property_set id=cib-bootstrap-options attributes nvpair id=cib-bootstrap-options-symmetric-cluster name=symmetric-cluster value=true/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=stop/ nvpair id=cib-bootstrap-options-default-resource-stickiness name=default-resource-stickiness value=2/ nvpair id=cib-bootstrap-options-default-resource-failure-stickiness name=default-resource-failure-stickiness value=-3/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=false/ nvpair id=cib-bootstrap-options-stonith-action name=stonith-action value=reboot/ nvpair id=cib-bootstrap-options-startup-fencing name=startup-fencing value=true/ nvpair id=cib-bootstrap-options-stop-orphan-resources name=stop-orphan-resources value=true/ nvpair id=cib-bootstrap-options-stop-orphan-actions name=stop-orphan-actions value=true/ nvpair id=cib-bootstrap-options-remove-after-stop name=remove-after-stop value=false/ nvpair
id=cib-bootstrap-options-short-resource-names name=short-resource-names value=true/ nvpair id=cib-bootstrap-options-transition-idle-timeout name=transition-idle-timeout value=5min/ nvpair id=cib-bootstrap-options-default-action-timeout name=default-action-timeout value=110s/ nvpair id=cib-bootstrap-options-is-managed-default name=is-managed-default value=true/ nvpair id=cib-bootstrap-options-cluster-delay name=cluster-delay value=60s/ nvpair id=cib-bootstrap-options-pe-error-series-max name=pe-error-series-max value=-1/ nvpair id=cib-bootstrap-options-pe-warn-series-max name=pe-warn-series-max value=-1/ nvpair id=cib-bootstrap-options-pe-input-series-max name=pe-input-series-max value=-1/ nvpair id=cib-bootstrap-options-dc-version name=dc-version value=2.1.3-node: 552305612591183b1628baa5bc6e903e0f1e26a3/ nvpair id=cib-bootstrap-options-last-lrm-refresh name=last-lrm-refresh value=1204812151/ /attributes /cluster_property_set /crm_config nodes node id=7ce70870-4126-4bb7-b263-221a9e7efc7e uname=dbora type=normal/ node id=aa15721d-a88a-4ec8-9e01-cc7eeb780f79 uname=demo type=normal/ /nodes resources group id=group_1 meta_attributes id=ma-group1 attributes nvpair name=target_role id=ma-group1-1 value=started/ nvpair name=resource_stickiness id=ma-group1-2 value=1/ nvpair name=resource_failure_stickiness id=ma-group1-3 value=-1/ /attributes /meta_attributes primitive class=ocf id=IPaddr_193_27_40_57 provider=heartbeat type=IPaddr operations op id=IPaddr_193_27_40_57_mon interval=60s name=monitor timeout=60s/ /operations instance_attributes id=IPaddr_193_27_40_57_inst_attr attributes nvpair id=IPaddr_193_27_40_57_attr_0 name=ip value=193.27.40.57/ nvpair id=IPaddr_193_27_40_57_attr_1 name=cidr_netmask value=26/ nvpair id=IPaddr_193_27_40_57_attr_3 name=broadcast value=193.27.40.63/ /attributes /instance_attributes /primitive primitive class=lsb id=ubis_udbmain_13 provider=heartbeat type=ubis_udbmain operations op id=ubis_udbmain_13_mon interval=120s name=monitor 
timeout=110s/ /operations /primitive /group /resources constraints rsc_location id=rsc_location_group_1 rsc=group_1 rule id=prefered_location_group_1 score=1
Re: [Linux-HA] Solving a strange split-brain with drbd and ha
Hello Balabam,

On Thursday, 21 February 2008 at 09:25, Balabam wrote:

Hello, I have two nodes working in a split-brain configuration and I'm not able to solve this problem. My config is:

[EMAIL PROTECTED] ~]# crm_mon
Defaulting to one-shot mode
You need to have curses available at compile time to enable console mode

Last updated: Fri Feb 15 11:05:05 2008
Current DC: rman1c (875afc12-b88e-4940-9816-218d2a5911c3)
2 Nodes configured.
2 Resources configured.

Node: rman1a (4d7bd4ec-c121-4b13-a2d4-aec820ea36d5): online
Node: rman1c (875afc12-b88e-4940-9816-218d2a5911c3): online

Master/Slave Set: ms-drbd0
    drbd0:0 (heartbeat::ocf:drbd): Started rman1a
    drbd0:1 (heartbeat::ocf:drbd): Master rman1c
Resource Group: Oracle
    V_IP (heartbeat::ocf:IPaddr2): Started rman1c
    FS (heartbeat::ocf:Filesystem): Started rman1c
    Ora_DB (heartbeat::ocf:oracle): Started rman1c
    Ora_LSNR (heartbeat::ocf:oralsnr): Started rman1c

Failed actions:
    FS_start_0 (node=rman1a, call=13, rc=1): Error

The group Oracle is running on the node where drbd is in master mode. If I clean up FS, heartbeat stops the resources on rman1c and tries to remount on rman1a, which fails. I've attached cib.xml and the log of messages.

- REALLY?? I don't see any attachment! Check again!

Bye!
Nikita Michalko

Thanks
Stefano
Re: [Linux-HA] Heartbeat 2.1.3 error
On Thursday, 14 February 2008 at 12:51, maike wrote:

Hi people, I updated heartbeat to the latest version, but when I start heartbeat the following error is issued:

heartbeat[5612]: 2008/02/14_09:47:59 WARN: Managed /usr/lib/heartbeat/cib process 5630 exited with return code 1.
heartbeat[5612]: 2008/02/14_09:47:59 EMERG: Rebooting system. Reason: /usr/lib/heartbeat/cib

Can someone help me?

- Yes, I would, but: config/log files and cib.xml are missing :-(

--
Nikita Michalko
[Linux-HA] Documentation/help for forced resource fail-over
Hi all, can somebody point me to the right V2.1.3 documentation or help for the configuration of forced failover of resources? I want to configure a 2-node symmetric cluster with 7 resources so that after 3 failures on node A the resources stop on node A and then start on node B. How should I set default-resource-stickiness and default-resource-failure-stickiness? Any help will be much appreciated!

--
Nikita Michalko
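[Editor's aside] The usual back-of-the-envelope model for this (a simplification; the concrete numbers below are my own illustration, not values from this thread) is that a resource's score on its active node is roughly preference + stickiness + failcount * failure_stickiness, and the resource fails over once that drops below its score on the standby node. A quick sketch:

```python
# Hedged sketch of the simplified stickiness arithmetic for forced failover.
# Assumption (illustrative): score on the active node is
#   preference + stickiness + failcount * failure_stickiness
# and failover happens once it falls below the standby node's score.

def active_score(preference, stickiness, failure_stickiness, failcount):
    return preference + stickiness + failcount * failure_stickiness

def failures_until_failover(preference, stickiness, failure_stickiness,
                            standby_score=0):
    # assumes failure_stickiness < 0, otherwise the score never drops
    failcount = 0
    while active_score(preference, stickiness, failure_stickiness,
                       failcount) >= standby_score:
        failcount += 1
    return failcount

# e.g. stickiness=5, failure-stickiness=-2, no location preference:
# scores go 5, 3, 1, -1, so the third failure triggers the move.
n = failures_until_failover(preference=0, stickiness=5, failure_stickiness=-2)
```

Under this model, "failover after 3 failures" means picking stickiness S and failure-stickiness F so that S + 2F is still above the standby score but S + 3F is below it.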
[Linux-HA] Going from V.2.0.7 to V.2.1.3
Hi all, I've got a problem after installing HA V.2.1.3 on a SLES10 32-bit box. The old V.2.0.7 with the same configuration - 2 nodes active/active, with crm - runs fine (with small problems), but now, with 2.1.3, I get the following errors:

lrmd ... info: RA output: (IPaddr2_193_27_40_56:monitor:stderr) 26: unknown interface: No such device ...

/usr/lib/heartbeat/findif version 2.1.3
Copyright Alan Robertson
Usage: /usr/lib/heartbeat/findif [-C]
Options:
  -C: Output netmask as the number of bits rather than as 4 octets.
Environment variables:
  OCF_RESKEY_ip            ip address (mandatory!)
  OCF_RESKEY_cidr_netmask  netmask of interface
  OCF_RESKEY_broadcast     broadcast address for interface
  OCF_RESKEY_nic           interface to assign to

IPaddr2[24260]: ERROR: [/usr/lib/heartbeat/findif -C] failed

- see attached config and log files. The requested IP address 193.27.40.56 is free, not up. crm_verify -V -x cib.xml - OK (without errors). What kind of configuration error or something else could it be? What did I do wrong? Thanks in advance for any help!

AIP
-
Nikita Michalko

# HA-Services
dbfix 193.27.40.56/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv
demo 193.27.40.57/26/193.27.40.63 ubis_udbmain
# With private network - currently not possible in room 17!
#dbfix 193.27.40.56/26/193.27.40.63 192.168.163.56/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv
#demo 193.27.40.57/26/193.27.40.63 192.168.163.57/26/192.168.163.63 ubis_udbmain
# aipdemo 193.27.40.52/26/193.27.40.63 192.168.163.52/26/193.27.40.63 ubis_applmain - that leads to a 4-fold IP assignment: eth0,eth0:1,eth0:2,eth0:3 !!
#dbfix 193.27.40.55/26/193.27.40.63 ubis_applmain
#server1 193.27.40.181 192.168.163.181 ubis_up_mkctab ubis_nserv ubis_mserv ubis_fax
#server1 193.27.40.53/26/193.27.40.63 192.168.163.53/26/193.27.40.63 ubis_up_mkctab ubis_nserv ubis_mserv ubis_fax
#demo 193.27.40.53/26/193.27.40.63 aip_haservice
#opteron 193.27.40.182/26/193.27.40.63 192.168.163.182/26/192.168.163.63 ubis_udbmain
#opteron 193.27.40.54/26/193.27.40.63 192.168.163.54/26/192.168.163.63 ubis_udbmain

logfile /var/log/ha-log
debugfile /var/log/ha-debug
debug 0
logfacility local1
#logfacility kern
#use_logd yes
#udpport 694
udpport 708
bcast eth0
#bcast eth1
coredumps true
auto_failback on
keepalive 5
warntime 10
deadtime 15
initdead 180
node demo
node dbfix
crm yes
#stonith external/aipst /etc/ha.d/stonith.ssh
#respawn hacluster /usr/lib64/heartbeat/ccm
#respawn hacluster /usr/lib64/heartbeat/ipfail
#apiauth stonithd uid=root
#respawn root /usr/lib64/heartbeat/hbagent

pe-warn-17_NM.bz2 Description: BZip2 compressed data
ha-log-NM.gz Description: GNU Zip compressed data
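[Editor's aside] When debugging "unknown interface"/findif failures like the above, it helps to remember what findif is essentially doing: picking a local interface whose configured subnet contains OCF_RESKEY_ip. Here is a minimal, hypothetical sketch of that matching logic (not the actual findif source; the interface table is made up):

```python
# Hypothetical sketch of findif-style matching: choose the interface whose
# configured subnet contains the requested IP. Interface data is illustrative.
import ipaddress

def find_interface(requested_ip, interfaces):
    """interfaces: list of (name, cidr) pairs, e.g. ("eth0", "193.27.40.10/26")."""
    ip = ipaddress.ip_address(requested_ip)
    for name, cidr in interfaces:
        if ip in ipaddress.ip_interface(cidr).network:
            return name
    return None  # no subnet match: this is when IPaddr2 reports "findif failed"

ifaces = [("eth0", "193.27.40.10/26"), ("eth1", "192.168.163.10/26")]
nic = find_interface("193.27.40.56", ifaces)
```

If no interface's subnet contains the requested address (e.g. a wrong netmask in the CIB, or the NIC not yet configured), the lookup fails just like the error in the log.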
[Linux-HA] How to set up group's score/attributes to force failover
Hallo all!

In HA V2.1.2 (R2) I have to set up 2 nodes with 3 resource groups so that every group fails over to the other node after 3 (monitor) failures. I already looked at http://www.linux-ha.com/v2/faq/forced_failover, but that formula only applies to single resources. Where can I find the appropriate docu/formula for groups to achieve this?

Thank you for your tips!

Nikita Michalko