Re: [Linux-HA] Help: can't find IPv6addr in heartbeat 3.0.4 package

2017-03-24 Thread Dimitri Maziuk
PS. sorry, it's the resource-agents rpm, not EPEL, so keep in mind
redhat may update it and overwrite your custom build if you're not careful.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Linux-HA mailing list is closing down.
Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
___
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha

Re: [Linux-HA] Heartbeat packages for Redhat-7

2015-04-04 Thread Dimitri Maziuk

On 2015-04-03 17:03, Lars Ellenberg wrote:


 You can use heartbeat 3.0.6 (if you only use haresources mode).


You can google for the ticket #, but basically the EPEL heartbeat maintainer
replied to my RFA with "I don't use heartbeat anymore, so no". I meant to
post that here but forgot.


So there is no heartbeat rpm for el7 in the usual repos. Does 
clusterlabs have one?


Dimitri



Re: [Linux-HA] Announcing the Heartbeat 3.0.6 Release

2015-02-10 Thread Dimitri Maziuk
On 02/10/2015 03:24 PM, Lars Ellenberg wrote:
...
 After 3½ years since the last officially tagged release of Heartbeat,
 I have seen the need to do a new maintenance release.

Yay! Thank you Lars.

 - heartbeat.service file for systemd platforms

RFA submitted to EPEL-7.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat

2015-01-20 Thread Dimitri Maziuk
On 01/20/2015 01:34 PM, Ron Croonenberg wrote:
 Hello,
 
 I have an Ethernet connection that connects all hosts in a cluster, and
 the nodes also have an IB connection. I want the failover host to take
 over when an IB connection goes down on a host. Is there an example of
 how to do this? (I am using IPMI for shutting down hosts etc.)
 
 The cluster I am using has 8 nodes, and I want to do failover in pairs of
 two. In the ha.cf file, do I mention all the hosts, or just the host and
 its failover, per pair?

Do you have 4 separate active-passive pairs or a cluster of 8 nodes? If
it's the latter, I think you want pacemaker, not heartbeat. Dunno what
pacemaker might have for monitoring an IB connection; with heartbeat R1
I'd do something like grep for LinkUp in the output of ibstat.
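
A minimal sketch of that ibstat check, assuming ibstat prints a line containing "LinkUp" for a port whose physical link is up. The function name ib_link_up is made up for illustration, and what to do when the check fails (for example, handing the resources to the other node) is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the ibstat-style text on stdin
# mentions at least one port in the LinkUp physical state.
# Note: with several ports, one healthy port is enough to pass.
ib_link_up() {
    grep -q 'LinkUp'
}

# Example use on a real node (assumes ibstat is installed):
#   if ! ibstat | ib_link_up; then
#       echo "IB link is down" >&2
#   fi
```

Reading from stdin keeps the check testable without InfiniBand hardware; a monitoring tool such as mon could call it on whatever schedule suits the site.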

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Support for DRDB

2015-01-16 Thread Dimitri Maziuk
On 01/16/2015 11:19 AM, Digimer wrote:

 When RHEL 6 was released, Red Hat wanted to reduce their support
 overhead a lot. So many things that used to be supported were dropped.
 DRBD, unlike most other dropped programs, is still supported, just not
 by RH directly. They worked out an agreement with LINBIT to allow
 officially supported RHEL systems to be fully supported when they ran
 DRBD.

Ah, OK. We moved from RHEL to Centos a few years back so I'm not quite
up to date on RedHat's official offerings anymore. I do know they bought
ceph recently and now have their own shiny! cloudy! replicated iscsi
block device, so ...

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Support for DRDB

2015-01-16 Thread Dimitri Maziuk
On 01/16/2015 05:33 PM, Digimer wrote:

 1. CentOS replicates RHEL, warts and all.

Not exactly. E.g. gluster is a for-pay RHEL add-on. There's some kind of
gluster rpm in Centos but it's pretty much dysfunctional: you have to
remove that, add the upstream gluster repo and get your gluster from there.

 2. DRBD is an HA technology, ceph/gluster are cloud technologies. 
...
 from what I've gathered, ceph/gluster shine brightest when they're
 on top of many nodes. Their goal is, first, scalability and resource
 utilization. DRBD's is, first, data protection.

More or less. IRL they apparently shine on top of a 10Gb network. Or
better, three 10Gb networks. DRBD works just fine over a crossover piece
of cat-5e.

 Again, my understanding only, I could be wrong.

Oh, it's pure speculation on my part. Any resemblance to the actual
RedHat is purely coincidental and all that. ;)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: Q: ping (ocf:pacemaker:ping) from specific address?

2014-09-04 Thread Dimitri Maziuk
On 09/04/2014 09:23 AM, Ulrich Windl wrote:

 2) You can use an IP address as interface for -I
 3), but you cannot use a hostname (resolving to the same IP address as in 2)):
 ping: unknown iface hostname

Uhmm... ping -I `host hostname | awk '{print $NF;}'` ?
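
A slightly more defensive sketch of the same idea, assuming the usual "NAME has address A.B.C.D" lines in the output of host. The one-liner above works when host prints a single line; the hypothetical helper first_address below picks only the first IPv4 "has address" line, so extra lines (IPv6, mail handlers) do not end up as arguments to ping.

```shell
#!/bin/sh
# Hypothetical helper: read `host NAME` output on stdin and print the
# address from the first "has address" line only.
first_address() {
    awk '/ has address / { print $NF; exit }'
}

# Example use (assumes the host utility is installed; "hostname" and
# the target 192.0.2.1 are placeholders):
#   ping -I "$(host hostname | first_address)" -c 3 192.0.2.1
```

If the name does not resolve, the helper prints nothing, so a caller may want to test for an empty result before running ping.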

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Application level HA using heartbeat.. ??

2014-08-01 Thread Dimitri Maziuk
On 08/01/2014 03:47 AM, N, Ravikiran wrote:
 Hi,
 
 ... I had configured httpd in my haresources file and I
 manually stopped httpd using service httpd stop. Although this stops
 httpd service, Heartbeat doesn't recognize this.

Correct, that is how haresources mode works. It is well documented -- or
was until linux-ha folks removed the documentation from the web because
heartbeat's old.

 ... why I should provide a script in resources.d/ to start, stop and
 find status of the application. Also, how can I achieve application
 level HA using heartbeat

If you don't want to roll your own scripts, use new corosync/pacemaker
hotness instead of old and busted heartbeat/haresources.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] getting proper sources

2014-06-07 Thread Dimitri Maziuk
On 06/06/2014 05:47 PM, Lars Marowsky-Bree wrote:

 So I'd appreciate it if you'd not make those claims; I admit to feeling
 slighted.

The claim that prompted this was that the level of support a centos user
gets is for pacemaker: 50% chance that the Lars over there will ask if
he's a paying SuSe customer; for heartbeat: 100% chance that Digimer
will tell him to install pacemaker.

If you don't work here and you ask for help with my code, there's 50%
chance that out of the goodness of my heart I'll help you out in my
copious free time. I admit that.

If someone tries to deny that about me, please provide a counter-example
to prove (in mathematical sense) they're full of it. I promise I won't
feel slighted.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] getting proper sources

2014-06-02 Thread Dimitri Maziuk
On 06/02/2014 03:42 PM, Jay G. Scott wrote:
 On Thu, May 29, 2014 at 12:43:49PM -0500, Dimitri Maziuk wrote:

 What do you intend to run in HA mode?

 3.  bind/named/dns, possibly some fortran programs.

Uhmm... why not just run 2 nameservers?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] getting proper sources

2014-05-29 Thread Dimitri Maziuk
On 05/29/2014 12:01 PM, Jay G. Scott wrote:

 what's the answer for ...  Centos, I guess...?  And it does
 embarrass me to have to ask that.

Pacemaker/corosync -- 2+-node clusters, active-active clusters, active
development. Support for free is 50% chance Lars will ask you if you're
a paying Suse customer.

Heartbeat 'R1' (i.e. as long as you don't use 'crm' mode) -- simple,
stupid, has been rock solid (and, consequently, untouched) for years.
2-node active/passive clusters only, DIY external resource monitoring
(mon), the level of support is: Digimer will tell you to upgrade to
pacemaker.

What do you intend to run in HA mode?
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] getting proper sources

2014-05-28 Thread Dimitri Maziuk
On 05/28/2014 04:05 PM, Jay G. Scott wrote:
 
 Greetings,
 
 I'm a noob.  If this isn't the right place to ask this,
 let me know.  I took general configuration questions
 to include compiling.
 
 OS = RHEL6 (I hope) x86_64

On centos there's heartbeat in EPEL and pacemaker in standard repo. Not
sure what RH is doing with their channels this week, so... presumably
yum list \*pace\* should work?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] two node cluster with postfix - how to get system mails from both nodes

2014-01-23 Thread Dimitri Maziuk
On 01/23/2014 11:14 AM, Christian Richter wrote:
 Hello,
 
  
 
 i'm looking for the right way to integrate postfix in my 2 node cluster.

The right way is don't. Read e.g.
http://serverfault.com/questions/303554/how-to-build-a-high-availability-postfix-system

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] does heartbeat 3.0.4 use IP aliases under CentOS 6.5?

2014-01-04 Thread Dimitri Maziuk

On 1/4/2014 10:39 AM, Lars Marowsky-Bree wrote:

 On 2014-01-03T20:56:42, Digimer li...@alteeve.ca wrote:

 causing a lot of reinvention of the wheel. In the last 5~6 years, both teams
 have been working hard to unify under one common open-source HA stack.
 Pacemaker + corosync v2+ is the result of all that hard work. :)

 Yes. We now finally have one stack everywhere. Yay!


Yah, ein stack uber alles! You'd note that even the cpu market has 
settled on two kinds of music, not one...


Dima



Re: [Linux-HA] does heartbeat 3.0.4 use IP aliases under CentOS 6.5?

2014-01-03 Thread Dimitri Maziuk
On 01/03/2014 04:44 PM, Brian Reichert wrote:

 Anyway, my question is: is this behavior (not using IP aliases) a
 feature of heartbeat 3.0.x, or is this an artifact of the CentOS
 plumbing the heartbeat invokes?  I didn't see anything in the
 changelog in a quick perusal.

I think it's a side-effect of RedHat going all enterprisey with
OpenStack (RDO) and having to bite the pacemaker bullet as a result. I'm
told heartbeat resource scripts have been just wrappers around
pacemaker's resource-agents for some time -- but RedHat's packages
haven't been touched in years. Until now.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-11-28 Thread Dimitri Maziuk

On 2013-11-27 20:15, Jefferson Ogata wrote:


It's nicer, however, when Red Hat takes a conservative position with the
Tech Preview. They could have shipped a minimal set of resource agents
in the first place, so people would have a better idea what they had to
provide on their own end, instead of pulling the rug out with nary a
mention of what they were doing.


The other issue is epel packaging: unsupported obsolete heartbeat-3.0.4
should arguably not depend on bleeding edge tech preview packages in the
first place. If anything, it should conflict with pacemaker & co.
However, if the epel maintainer has no time to make a 30-second edit to
the /etc/init.d script, I don't expect him to repackage the whole thing
in my lifetime.


Dima




[Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-11-27 Thread Dimitri Maziuk
Just so you know:

RedHat's (centos, actually) latest build of resource-agents sets $HA_BIN
to /usr/libexec/heartbeat. The daemon in heartbeat-3.0.4 RPM is
/usr/lib64/heartbeat/heartbeat so $HA_BIN/heartbeat binary does not exist.

(And please hold the "upgrade to pacemaker" comments: I'm hoping if I
wait just a little bit longer I can upgrade to ceph and openstack -- or
retire, whichever comes first ;)
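
A minimal sketch for spotting the path mismatch described above before a restart does it for you. It assumes nothing beyond the two directories named in this message; the function name find_heartbeat_dir is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory among the arguments
# that contains an executable regular file named "heartbeat".
# Returns non-zero if none of them does.
find_heartbeat_dir() {
    for dir in "$@"; do
        if [ -f "$dir/heartbeat" ] && [ -x "$dir/heartbeat" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example use with the two paths mentioned in this message:
#   find_heartbeat_dir /usr/libexec/heartbeat /usr/lib64/heartbeat
```

Comparing the printed directory with the value of $HA_BIN after a resource-agents update would show whether the two packages still agree.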

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] FYI: resource-agents-3.9.2-40.el6.x86_64 kills heartbeat-3.0.4

2013-11-27 Thread Dimitri Maziuk
On 11/27/2013 06:29 PM, Jefferson Ogata wrote:
 On 2013-11-28 00:12, Dimitri Maziuk wrote:

 Hey, upgrading to pacemaker wouldn't necessarily help. Red Hat broke
 that last month by dropping most of the resource agents they'd initially
 shipped. (Don't you love Technology Previews?)

Dunno about previews -- this stuff's been running rock solid here for
years -- but I do appreciate the part where you don't find out what they
fscked up until sometime much later you need to restart the thing
because of libc or kernel upgrade... And then go figure which update of
what package broke it.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Installing Heartbeat 3.0.5 in RHEL 6.3

2013-09-18 Thread Dimitri Maziuk
On 09/18/2013 01:43 PM, Emilio López wrote:
 Hi, I'm compiling and installing Heartbeat 3.0.5 from source
 on RHEL 6.3 and I'm running into some issues I cannot solve.
... Any suggestions?

$ yum list heartbeat
...
Available Packages
heartbeat.x86_64 3.0.4-1.el6 epel

(centos 6.4)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-12 Thread Dimitri Maziuk
On 06/12/2013 04:35 PM, Andrew Beekhof wrote:
 
 On 13/06/2013, at 2:49 AM, John M john332...@gmail.com wrote:
 
 Dear All,

  I will try to setup pacemaker cluster in the coming weeks. Before that I
 have to complete the configuration using heartbeat 2.1.4.
  I would really appreciate if you could suggest the configuration for
 Master/Slave scenario mentioned in my previous mail.
 
 Given the number and type of bugs related to m/s resources that have
 been fixed in the last 5 years, I cannot in good conscience make such
 suggestions.
 Analogy: You don't build a skyscraper on quicksand.

Analogy: don't build a skyscraper, do a log cabin.

If you have to do 2.1.4, use haresources mode. You'll need to do your
own monitoring (use mon).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Master/Slave status check using crm_mon

2013-06-12 Thread Dimitri Maziuk
On 06/12/2013 05:27 PM, Andrew Beekhof wrote:

 Yes, but haresources doesn't support the master/slave concept, does it?
 So not very good advice.

You could argue master+slave is the only concept it does support.
;)
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: Q: sbd using libxslt in SLES11 SP2?

2013-05-03 Thread Dimitri Maziuk
On 05/03/2013 12:24 AM, Ulrich Windl wrote:
 I wonder how they could fly to the moon without XML ;-)

They couldn't. That's why they had to film the whole thing on a sekret
base in Arizona.

Kids these days...
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] DRBD NetworkFailure

2013-04-23 Thread Dimitri Maziuk
On 04/23/2013 02:20 PM, Greg Woods wrote:
... The two nodes are connected by a crossover
 cable, and that is the link used for DRBD replication. So it seems as
 though the only possibilities are a flaky NIC or a flaky cable, but in
 that case, wouldn't I see some sort of hardware error logged? Anybody
 else ever seen something like this?

If you pull the cable, you may get 'eth1 link down' somewhere (console,
/var/log/messages), or not. If you have a hardware error on the NIC
something should crash I think, though I don't remember ever seeing that.

I have heartbeat (as in 3.0.4 w/ haresources) pings going over the
crossover cable as well, so I don't specifically monitor that link or
drbd status. I do monitor eth0 (for 'link detected' in the output of
ethtool).

I also have nagios checking drbd for UpToDate/UpToDate, but that's not
part of the cluster.
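
A rough sketch of what such an UpToDate/UpToDate check could look like. It is not the nagios plugin used here. It assumes the DRBD 8.x /proc/drbd layout, where each resource line carries a "ds:" field; the function name drbd_all_uptodate is made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: read /proc/drbd-style text on stdin and succeed
# only if at least one line carries a disk-state field ("ds:") and
# every such line reports UpToDate/UpToDate.
drbd_all_uptodate() {
    awk '
        / ds:/ {
            seen = 1
            if ($0 !~ / ds:UpToDate\/UpToDate( |$)/) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Example use on a DRBD node:
#   drbd_all_uptodate < /proc/drbd || echo "drbd is not UpToDate/UpToDate" >&2
```

Failing when no "ds:" line is seen at all means an unloaded module or an empty status file is reported as a problem rather than silently passing.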

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Multiple instances of heartbeat

2013-03-15 Thread Dimitri Maziuk
On 3/14/2013 11:15 AM, Alberto Alonso wrote:

 That's what I thought. The emails from 2009 seemed to indicate
 that it was possible to run multiple instances.

I've always had difficulties with the concept: the way I see it if your 
hardware fails you want *all* your 200+ services moved. If you want them 
independently moved to different places, you're likely better off with a 
full cloud solution. If you want them moved while hardware's still up 
you're probably looking for load balancing, not HA.

I'm sure you can patch heartbeat to replace all hardcoded stuff with 
config file settings. Or use pacemaker's ability to manage service 
groups more or less independently. I'm not sure why you'd want to use 
either that way.

Dima



Re: [Linux-HA] Multiple instances of heartbeat

2013-03-15 Thread Dimitri Maziuk
On 03/15/2013 10:08 AM, Lars Marowsky-Bree wrote:

 You're contradicting yourself ;-) Pacemaker in fact gives you the
 management you suggest for the cloud use case - whether the services
 are handled natively or encapsulated into a VM.

Yeah, I suppose. I meant going Open/CloudStack.
(We get to write buzzword-compliant funding proposals, or I don't get to
eat. So my perspective is skewed towards the hottest shiny du jour...)

 And the concept of HA clusters predates the cloud slightly.

Relevant if you're looking at maintenance/upgrade on an existing
cluster. Patching heartbeat to manage 200 services independently sounds
like a new project.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Multiple instances of heartbeat

2013-03-15 Thread Dimitri Maziuk
On 03/15/2013 11:55 AM, Lars Marowsky-Bree wrote:
...
 Right. Thankfully, we already have that, it's called pacemaker ;-)

Which brings me back to my original problem with the concept: I can
think of only one reason to failover services (as opposed to
hardware), and that is your daemons are crashing all the time during
normal operation. If I needed a solution for that, HA would be fairly
low on my list of things to look at.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Multiple instances of heartbeat

2013-03-15 Thread Dimitri Maziuk
On 03/15/2013 12:59 PM, GGS (linux ha) wrote:

 Unfortunately I'm not at liberty to discuss the full architecture 
 or what they are doing without written permission, which would
 make it clear why we are going the path we are.

Yeah, I suspected something like that. Hopefully I won't ever need to
know. ;-)

(I'd still argue that a full vm solution should have less maintenance
overhead in the long run -- or at least it looks that way now.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Multiple instances of heartbeat

2013-03-15 Thread Dimitri Maziuk
On 03/15/2013 01:20 PM, GGS (linux ha) wrote:

 Virtualization has a huge penalty on performance, especially
 at the IO level. At another place we do Xen and KVM with up to
 40 VMs/server and when there is any kind of IO (disk especially) going
 on, things slow down to a crawl.

I've yet to find anything that can deal with i/o. I recently spent a
couple of weeks poking at ceph, it doesn't live up to the sales brochure
either... I expect if you can roll out a dedicated 10GbE network for
your iscsis you might get usable i/o speeds. :(

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Multiple instances of heartbeat

2013-03-14 Thread Dimitri Maziuk
On 3/14/2013 9:44 AM, GGS (linux ha) wrote:

 ... I always laugh when people talk
 about having hundreds or thousands of servers, because
 switching to a stack model and proper utilization
 of hardware resources can save a ton of money.

The flip side is when caps on the mobo go dry you lose 50-100 
services/stacks instead of just one.

Dima




Re: [Linux-HA] achieve failover time less then 1 sec

2013-03-06 Thread Dimitri Maziuk
On 03/05/2013 05:57 AM, Rocky Patel wrote:
 Hi,
 
 I configure heartbeat with normal IP failover and it is working fine.
 
 with below configuration, I get 3 sec failover time.

Counting from what?

From my /var/log/messages (centos 6.3 & EPEL heartbeat):

15:53:52 cuttlefish ResourceManager(default)[1960]: info: Running
/etc/ha.d/resource.d/IPaddr 144.92.167.233 start
...
15:53:52 cuttlefish
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_144.92.167.233)[2267]:
INFO:  Success

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] 'Tie-breaker' facility and quorum/membership question

2013-01-23 Thread Dimitri Maziuk
On 1/22/2013 10:31 PM, Digimer wrote:

 A stray iptables rule would knock you out without dropping the link layer.

Presumably it wasn't there when you configured, tested, and burned in 
the node.

Tie-breaker ain't gonna help with gremlins living in the cluster, 
self-modifying firewalls, or publishing root password on facebook. To 
name a few.

Dima



Re: [Linux-HA] 'Tie-breaker' facility and quorum/membership question

2013-01-23 Thread Dimitri Maziuk
On 1/22/2013 7:39 PM, David Lang wrote:

 I've also had network connectivity restored (switch got rebooted,
 someone noticed a loose cable and plugged it in, etc)

 What I would do would be to look into defining a ping node that could be
 used as the tie-breaker (but that ping node needs to be HA as well)

I mentioned nagios: have external host monitor the cluster and 
individual nodes.

 The real question that you need to answer before going down this road is
 how much damage you suffer in a split-brain situation.

Yep. If you have a read-only service you can get around that by having 
two copies of the data (each node has its own copy). If it's read-write 
you need to add drbd to the mix and start worrying about split-brain, 
stonith, and tie breakers.

Dima




Re: [Linux-HA] 'Tie-breaker' facility and quorum/membership question

2013-01-21 Thread Dimitri Maziuk
On 1/20/2013 7:09 PM, Alex Sudakar wrote:
 I'm setting up a very basic two-node active/passive cluster using
 Pacemaker 1.1.7 and Corosync 1.4.1 under Red Hat Enterprise Linux 6.3.
   The cluster is running a web application that needs to be accessed by
 our separate LAN of desktops.

 With only two nodes comprising the cluster I believe a quorum is
 impossible, so I've set no-quorum-policy to 'ignore'.  However I was
 wondering if there is a possibility of using one or more 'tie breaker'
 devices/resources to determine a proper quorum?  I _think_ I've seen
 mention of such a thing in passing in this list; I'm not sure.

I simply grep for 'link detected' in the output of `ethtool eth0` -- if 
it's not there, the node is off-line and can shoot itself. (Although I 
don't do that: it's off the net, it's not bothering anyone, and I have 
nagios elsewhere monitoring the hosts. If the link comes back on its 
own, that could cause a problem, but that hasn't happened to me yet.)
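
A minimal sketch of that link check, assuming ethtool reports "Link detected: yes" for an interface with carrier. The function name link_detected is made up for illustration, and the reaction to a lost link is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ethtool-style text on stdin
# contains "Link detected: yes" (matched case-insensitively).
link_detected() {
    grep -qi 'link detected: yes'
}

# Example use on a real node (assumes ethtool is installed and the
# interface is eth0, as in this message):
#   if ! ethtool eth0 | link_detected; then
#       echo "eth0 has no link" >&2
#   fi
```

Matching on "yes" rather than just the label matters, because ethtool prints the "Link detected" line in both the up and the down case.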

Keep It Simple Stupid.
Dima



Re: [Linux-HA] possibly silly configuration - comments please

2012-12-27 Thread Dimitri Maziuk
On 12/26/2012 10:17 PM, Miles Fidelman wrote:

 Does this make sense, or is it totally crazy?

Simple stupid is usually the best. Twisty maze of little layers of 
indirection tends to be fragile and unmaintainable.

Over here 95% of downtime is caused by maintenance reboots (kernel/libc 
upgrades) and 95% of hardware failures are dying disks -- no downtime 
there as they're raided. The other 5% is basically not worth the effort 
-- in terms of my time and hardware costs it's cheaper to let it break 
and if needed pull an overnighter picking up the pieces. (Obviously, 
partitioning your services so one server doesn't take them all out helps 
too.)

So what is it that you're trying to protect against with your HA cluster?


Dima




Re: [Linux-HA] Antw: Re: Q: NFS cross mounting

2012-12-21 Thread Dimitri Maziuk
On 12/20/2012 2:31 PM, Lars Marowsky-Bree wrote:
 On 2012-12-20T10:50:11, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:

 Well, they'll be wrong unless there's a way to nfs-export an unmounted
 filesystem. A reasonable question would be why not symlink mount point
 -> export point ;)

 That'd miss the use case of being able to switch-over the server to
 another node.

Fair enough.

(I don't run anything that uses the fs on our nfs servers. For an
occasional non-root login (nfs-mounted /home) symlinks work.

Part of the reason: I saw the switch-overs hang on "device busy" on an
nfs mount, so I Just Don't Do That(tm).)

Dima




Re: [Linux-HA] Antw: Re: Q: NFS cross mounting

2012-12-20 Thread Dimitri Maziuk
On 12/20/2012 1:55 AM, Ulrich Windl wrote:
 Lars Marowsky-Bree l...@suse.com wrote on 19.12.2012 at 18:29 in
 message 20121219172929.gg29...@suse.de:
 On 2012-12-19T10:59:12, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
 wrote:

 Unfortunately there was an update to the SLES11 SP2 release notes recently
 that forbids this kind of setup:

 That's a generic SLES question; I suggest using the SLES online forum
 (forums.suse.com) for this or support.

 I guess the typical forum user will ask me why I won't mount the
 file system directly on the node instead of using the NFS server to export
 it, and the NFS client to import (mount) it ;-)

Well, they'll be wrong unless there's a way to nfs-export an unmounted
filesystem. A reasonable question would be why not symlink mount point
-> export point ;)

Dima



Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-12-05 Thread Dimitri Maziuk
On 12/05/2012 12:05 PM, Robinson, Eric wrote:

 I believe the problem is that when I do 'crm node offline' Pacemaker
is fully stopping the drbd service. This causes drbd on the secondary to
go into a WFConnection state. It refuses to promote to primary in that
state.

Probably not relevant, but ISTR there were problems with using the same
interface for drbd and pacemaker. You're not doing that by any chance?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-12-05 Thread Dimitri Maziuk
On 12/05/2012 01:36 PM, Robinson, Eric wrote:

 I think the more relevant issue is that Pacemaker is fully stopping
 drbd, which causes the standby to go into a WFConnection state, so it
 refuses to promote.

I was thinking drbd losing packets and thus falling back to WFC rather
than pacemaker ordering a full stop. But in the latter case you could
probably find the stop action in the RA and replace it with (e.g.)
logger 'AIE ***I did not want this***' and then see what gets logged.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Some questions about High Availability

2012-12-03 Thread Dimitri Maziuk
On 12/03/2012 12:03 PM, Rodrigo Abrantes Antunes wrote:

 Simple setup: put it all on one filesystem and put that on DRBD. See
http://www.drbd.org/users-guide-8.3/s-heartbeat-r1.html

Also recommended: drbdlinks.

HTH
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

 I was reading about DRBD but it seems that it's not what I want. With DRBD
 you need to create a separate partition and synchronize this between nodes,
 but what I need is to synchronize / with all files in the disks just like
 raid1, and these disks are already using lvm with ext3 filesystem. I need
 this because I don't want to care about configuring 2 servers, I just want
 all configs from one server to be replicated in the other (like vmware HA
 where you don't need to care about the backup vm), if I need to configure
 2 servers I could just do a script copying files from one server to the

This is why I mentioned drbdlinks: at failover it can rename e.g.
/etc/httpd to /etc/httpd.drbdlinks and make a symlink /etc/httpd ->
/drbd/etc/httpd on the active node, and back on the passive one. (You run
it after drbd but before apache, obviously.)
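
For illustration only, the rename-and-symlink step amounts to something like
the sketch below. This is a toy version of the idea, not drbdlinks' actual
code: the function names are made up, and /drbd is just the example mount
point used above.

```shell
# Toy illustration of the rename-and-symlink step, not drbdlinks itself.
# Both function names are hypothetical.

# On the node that becomes active: move the local copy aside and point
# the original path at the copy on the mounted DRBD filesystem.
link_to_drbd() {
    path="$1"        # e.g. /etc/httpd
    drbd_root="$2"   # e.g. /drbd
    mv "$path" "$path.drbdlinks" &&
        ln -s "$drbd_root$path" "$path"
}

# On the node that becomes passive: drop the symlink and restore the
# local copy.
unlink_from_drbd() {
    path="$1"
    rm "$path" && mv "$path.drbdlinks" "$path"
}
```

So `link_to_drbd /etc/httpd /drbd` would leave /etc/httpd as a symlink to
/drbd/etc/httpd, and `unlink_from_drbd /etc/httpd` would undo it.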

Or make a vm and run that in ha container as digimer suggested.

You obviously can't completely share / because each node has to have
enough of the kernel+init+everything else to boot up and start your ha
stack. You'll still need to configure physical nodes to run the vm on,
too. If you don't like it, pay amazon to do physical stuff for you.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Some questions about High Availability

2012-12-03 Thread Dimitri Maziuk
On 12/03/2012 02:34 PM, Rodrigo Abrantes Antunes wrote:

 these files need to be synced. For this I can use drbdlinks, but I didn't
 understand well when to run it. If the primary node fails how will it
 rename the files?

Drbd is raid-1 over the network, everything is synced. The filesystem is
mounted on one node only -- using haresources

primary.node.dom drbddisk::share_name \
Filesystem::/dev/drbd0::/mount/point::ext4::rw,noatime \
drbdlinks \
IP.AD.DR/CIDR \
httpd \
tomcat6 \
vsftpd \
mon

This starts drbd resource share_name, mounts it on /mount/point,
then runs drbdlinks to create the symlinks, then brings up the ip
address and the apache/tomcat/ftp services, and finally starts mon, which
does service monitoring. If the primary node fails, the same sequence runs
on the secondary.

What happens on the primary and how you deal with it is another
question. Drbdlinks, specifically, comes with an init.d script that
checks its links at boot-up and restores files to the default state
(i.e. your primary will have a bunch of dangling symlinks after it fails
and until that script is run -- that does not happen during controlled
takeover).

So ideally you want to power off the failed node as part of the process,
but that takes extra hardware.

 I was thinking about rsync through heartbeat's network interface to
 synchronize them too, without drbd, what do you think?

Drbd is easier and safer.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Some questions about High Availability

2012-11-30 Thread Dimitri Maziuk
On 11/30/2012 01:08 PM, Rodrigo Abrantes Antunes wrote:
 Hi, I'm new to heartbeat, just started to read about it a week ago. I'm
 using ubuntu, I installed heartbeat 2.1.4 in my two nodes. Here is ha.cf of
 them:
 
   logfacility local0
   keepalive 2
   deadtime 5
   udpport 694
   bcast   eth5
   auto_failback on
   node    node1
   node    node2

I think you may need "crm no" as well.

What's your eth5 connected to?

   And here is haresources (I'm testing only apache, the ip address doesn't
 matter):
   node1 apache2
 
   I stopped heartbeat service in node1 and then apache started in node2,
 then I started heartbeat service in node1 and apache started in node1 but
 stayed active in node2. Shouldn't the one in node2 be stopped?

Yes it should be. Read the logs (tail -f on one console while stopping
heartbeat on another).

 Another
 question: If I stop apache service in one node it doesn't start in the other
 node, shouldn't it be started since apache is down? How does heartbeat monitor
 the process to see if it is down or up?

No, because in your configuration heartbeat doesn't monitor services.
Install mon and write a custom alert script that does
 /usr/share/heartbeat/hb_standby all
when the process dies.
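
A minimal sketch of such an alert script, under a few assumptions: as far as
I recall mon passes details to alerts as command-line options and a summary
on stdin, and this sketch only echoes the options and ignores stdin. The
function name and the HB_STANDBY override are made up for illustration; only
the hb_standby path comes from the text above.

```shell
#!/bin/sh
# Hypothetical mon alert script: hands all resources to the peer node.
# HB_STANDBY can be overridden for dry runs (an assumption of this sketch);
# the default is the path quoted above.
HB_STANDBY="${HB_STANDBY:-/usr/share/heartbeat/hb_standby}"

failover_alert() {
    # Record what mon told us, then ask heartbeat to give up everything.
    echo "mon alert ($*): requesting standby"
    "$HB_STANDBY" all
}
```

To use it as a real alert the script would end with `failover_alert "$@"`;
for a dry run, set HB_STANDBY=echo first and it only prints what it would do.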

 What I want to ask is which
 one is the best to synchronize ALL data between the 2 nodes including data
 that may be in use

Simple setup: put it all on one filesystem and put that on DRBD. See
http://www.drbd.org/users-guide-8.3/s-heartbeat-r1.html

Also recommended: drbdlinks.

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: pcs or crmsh?

2012-11-16 Thread Dimitri Maziuk
On 11/16/2012 02:56 AM, Lars Marowsky-Bree wrote:

 Sure. And I guess the only way to find out is to see how it unfolds.
 I've said my piece (for now), and I don't want to continue pissing you
 off ;-) But I also needed to get it off my chest.

Guys,

the technically sound option for going forward is to teach the init
replacement du jour, a service monitoring subsystem, and a cluster
communication subsystem to talk to each other, and arrive at a
half-properly designed cluster operating system.

GUI manglement crap is the eye candy that people dishing out funds
understand. For people actually running clusters, they'll use it once to
configure the cluster, and then (ideally) not touch any of it for years
to come. So in terms of cost/benefit to the actual admins, a clean set
of logically organized readable config files that can be simply copied
to the new node is worth 50 GUIs -- restful, xmlless,
buzzword-du-jour-compliant, and
django-asynchronous-json-script-framework-template driven.

Bickering over whether one of those things is better than another is bad
for your nerves and that's all it is. Actually believing it is bad for
your karma: you'll get reborn as a cockroach or worse: a copyright lawyer.

IMO
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Question about linux cluster choice for Nagios Mysql

2012-11-15 Thread Dimitri Maziuk
On 11/15/2012 10:11 AM, Digimer wrote:
 On 11/15/2012 02:52 AM, julien.marie...@soget.fr wrote:
 Hello,

 I have to secure a homemade monitoring solution mainly based on Nagios 2.x
 and MySQL 5.1.

 I must deploy an active / passive cluster with automated switch of
 services. 2 servers will be located on two different datacentres and
 connected by an optical fiber (which will be channeled through the lifeline
 + cluster replication data).
 
 What you are trying to do is called a stretch cluster. If you want
 automatic failover, you will have some significant challenges. Mainly,
 when a node stops responding, it needs to be put into a known state to
 ensure that the same service isn't offered twice or that shared storage
 is not happening without coordination.
 
 This is done using fencing, and fencing is only really useful when it uses
 an independent network path. So dual links are needed. Now, the
 probability of both links failing at the same time is real (someone digs
 without looking, for example), and that would break the cluster's fencing,
 leaving the nodes hung until there is human intervention.
 
 Stretch clustering requires very careful planning and rarely is worth it.

So where do nagios and mysql come into the picture?

 Tests were carried out with products DRBD (8.3.7) and Heartbeat (3.0.3) using
 the official Debian mirrors.
 
 DRBD 8.3.7 is *very* old. Heartbeat is deprecated and has no future
 development planned.

Which doesn't mean you shouldn't use heartbeat for a simple stupid
2-node active/passive 'haresources' cluster. You shouldn't use it *if* you
need more than simple stupid. The good news is it's not changing to
something not entirely dissimilar every 18 months, unlike everything
that's been developed since.

DRBD is old but our public servers have been running 8.3 for quite some
time now without problems.

(Our centos 5 servers have been running heartbeat 2.1.4 and drbd 8.3.8
for years now.)

 I wanted to get your opinion on the various security products such cluster
 (HA / Pacemaker / Corosync / keepalived / OpenSVC ...) to point me towards
 the most efficient and adapted according to my needs.

Where'd "security products" come from? Do you mean your nagios+mysql
setup is doing some sort of security monitoring? The good thing about
heartbeat is it's not being developed anymore. So what you've learned
about it remains relevant.

 The future of open source clustering is on corosync + pacemaker. I would
 start by learning more about them.

I would wait a year. They'll come up with something else and you'll have
to unlearn the old busted coronary+zapper and learn about the new
shiny+hotness instead.

But for the most part: what is it you're actually trying to do?

Using drbd for database replication is suboptimal, especially over
non-local links. You really want transactional replication and if mysql
doesn't do it, switch to the one that does.

As for nagios, why not set up two independent ones monitoring everything
and each other? I suspect you can go a long way with a few lines of perl to
make sure you don't get double the e-mail.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Question about linux cluster choice for Nagios Mysql

2012-11-15 Thread Dimitri Maziuk
On 11/15/2012 02:03 PM, Dimitri Maziuk wrote:

Apologies for bad cut-n-paste:

 Where'd security products come from? Do you mean you nagios+mysql
 setup is doing some sort of security monitoring? The good thing about
 heartbeat is it's not being developed anymore. So what you've learned
 about it remains relevant.

-- the second sentence wasn't supposed to be there.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: cib_replace failed?

2012-11-14 Thread Dimitri Maziuk
On 11/14/2012 01:46 AM, Ulrich Windl wrote:

 This recommendation is against best practices: The FQHN is usually the first
 name in /etc/hosts, aliases (short names) following. Probably it's better to
 fix the application rather than fiddling with /etc/hosts.

It is actually worse than that: for as long as I remember RH has
included a trap for young players where if you edit /etc/hosts all sorts
of interesting things may happen after next reboot. Or rpm update.
Depending on your choice of editor and phase of the moon.

None of my RH6 machines have the hostname in /etc/hosts anymore anyway,
all there is is localhost and localhost6. (And I think RHEL5 install
scripts may or may not put it there dep. on the install mode: DVD vs
netboot or something.)

*Using the hostname* is against best practices. Reading it from
/etc/hosts is an automatic F on unix network programming.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: cib_replace failed?

2012-11-14 Thread Dimitri Maziuk
On 11/14/2012 11:12 AM, Robinson, Eric wrote:
 It is actually worse than that: for as long as I remember RH 
 has included a trap for young players where if you edit 
 /etc/hosts all sorts of interesting things may happen after 
 next reboot. Or rpm update.
 Depending on your choice of editor and phase of the moon.

 None of my RH6 machines have the hostname in /etc/hosts 
 anymore anyway, all there is is localhost and localhost6. 
 (And I think RHEL5 install scripts may or may not put it 
 there dep. on the install mode: DVD vs netboot or something.)

 --
 Dimitri Maziuk
 
 I only have 30 or so RHEL servers (5.X and 6.X) but they all have the 
 hostnames in /etc/hosts using the format...
 
 a.b.c.d   thishost.domain.com thishost

I have 29 centos 6 servers and workstations and none of them has it.
They're all a) installed off netinst, b) had valid dns records for their
ips at install time, and c) I run system-config-network first thing
(after 'selinux off' + reboot) to create all the hardlinks I alluded to
in the previous e-mail.

 *Using the hostname* is against best practices. Reading it 
 from /etc/hosts is an automatic F on unix network programming.

 
 What's the point of having...
 
 hosts: files,dns
 
 ..in nsswitch.conf if it is against best practices? I might be
misunderstanding your meaning.


It's two points: 1) computers don't get names, they get numbers. If you
can't use the ip address, 2) consider the relationship between what you read
from /etc/hosts and the reality seen by your network stack when that
nsswitch.conf line reads

hosts: dns, ldap

If you have to have a single (no multiple ip or macs, no cname aliases)
unique (no shared ip or rr dns) host id for your application, you
better make your own and put it in your app's config file.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: cib_replace failed?

2012-11-14 Thread Dimitri Maziuk
On 11/14/2012 11:56 AM, Dimitri Maziuk wrote:

 I have 29 centos 6 servers and workstations and none of them has it.
 They're all a) installed off netinst, b) had valid dns records for their
 ips at install time, and c) I run system-config-network first thing
 (after 'selinux off' + reboot) to create all the hardlinks I alluded to
 in the previous e-mail.

PS. I bet it's b) -- if hostname already resolves their installer
doesn't bother writing it out to
/etc/sysconfig/networking/profiles/default/except/on/even/fridays/ifup-ethX
or whatever they use as the primary location for that stuff.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: pcs or crmsh?

2012-11-14 Thread Dimitri Maziuk
On 11/14/2012 01:17 PM, Lars Marowsky-Bree wrote:

 must bang my head against a wall must not scream MUST NOT SCREAM

Look at the bright side: at least you work with computer people. They
could've been *life scientists* (who are in some respects a huge step up
from sheet-metal production company owners/operators but in surprisingly
many respects aren't).

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] cib_replace failed?

2012-11-02 Thread Dimitri Maziuk
On 11/02/2012 10:38 AM, Robinson, Eric wrote:
 From looking at my logs, it seems that the nodes are repeatedly
 joining the cluster and then leaving. I never get a DC. Following is
 what I get when I start Pacemaker. I actually get hundreds of PAGES
 of log entries per second... 

One has to wonder if the cause of the problem is that your systems are bogged
down by iowait resulting from all that logging and are e.g. dropping
packets.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] resource monitor timeout, Killing with signal SIGTERM (15).

2012-10-24 Thread Dimitri Maziuk
On 10/23/2012 08:28 PM, Andrew Beekhof wrote:

 I'm happy you have something that works for you.
 Although even if you're using it in haresources mode, your resource
 agents are still years out of date.

It doesn't have resource agents (that's one of its pluses in my book).

 No-one *has* to run 2.1.4.

Get a real job and you'll find out that you sometimes have to run specific
things at specific versions. Or you may install epel packages but not
elrepo ones. And other stupid crap like that.

Back when I went through that exercise hand-crafted XML was the only
thing google could find for 2.1.4 crm mode. The difference now is that
haresources option is much harder to find.

So we don't know what OP's doing and why. Again the right answer here
is not "seriously?".

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] resource monitor timeout, Killing with signal SIGTERM (15).

2012-10-24 Thread Dimitri Maziuk

PS. but for the most part, like you said: you *have* people stuck on
2.1.4 and you keep supporting them much as you hate it. I don't think OP
had any indication of it being a new deployment or an old system they're
stuck with, so, yes. Seriously.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] resource monitor timeout, Killing with signal SIGTERM (15).

2012-10-24 Thread Dimitri Maziuk
On 10/24/2012 02:22 PM, Lars Marowsky-Bree wrote:
 On 2012-10-24T13:23:09, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
 
 PS. but for the most part, like you said: you *have* people stuck on
 2.1.4 and you keep supporting them much as you hate it.
 
 Yes, but on SLES10, that was an actually shipping version with full
 support.
 
 EPEL has different policies than RHEL. Those are pretty clearly
 advertised; heck, I work for SUSE, and even I know that ;-) I think
 you're being a bit unfair in the expectations here.

I think the OP said neither "I am creating a new cluster using
2.1.4" nor "I have this 2.1.4 cluster inflicted upon me", so the
ass-umption that he can just drop everything and upgrade is the
unwarranted one.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] resource monitor timeout, Killing with signal SIGTERM (15).

2012-10-19 Thread Dimitri Maziuk
On 10/19/2012 03:33 AM, Andrew Beekhof wrote:
 On Fri, Oct 19, 2012 at 6:22 PM, Thanachit Wichianchai
 thanachi...@googlemail.com wrote:
 Hello Linux-HA community,


 Current Setup:

 Linux HA Version: 2.1.4
 
 Seriously?
 Stop now and upgrade to something more recent.  I beg you.

RHEL 5 will be in production phase until 2017. People will be stuck with
2.1.4 until 2017. Get used to it already.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] OS System update in live cluster ?

2012-09-05 Thread Dimitri Maziuk
On 09/05/2012 01:26 AM, Stefan Schloesser wrote:

 my problem with the rolling upgrade is the drbd partition. If you
migrate the service its data will move too. If you then restart the
cluster and migrate back the data will not be in an upgraded state and
thus not match the binary. Hence you can't be sure your service will
start on the recently upgraded node with the old data version.

I think if your requirements are zero downtime and a database engine
that's not compatible with itself, your only option is to not upgrade.
Otherwise one or the other has to go.

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Leave Apache running on both active and passive nodes?

2012-08-22 Thread Dimitri Maziuk
On 08/22/2012 12:21 PM, Jon Heese wrote:

 I renew my call for anyone who knows of a way to leave a resource
running on all nodes at once. Are there any developers on this list that
may know of more esoteric options for the OCF and/or LSB resource types,
or do I have to join the developer list for that?

With heartbeat-r1 you simply copied /etc/init.d/httpd to
/etc/ha.d/resource.d/httpd_foo (to avoid a name conflict), changed the
"stop)" case in httpd_foo to be a noop and ran the httpd_foo resource.
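
A rough sketch of the same idea written as a thin wrapper instead of an
edited copy. This is a variation on the above, not what I actually ran: the
function name and the INIT_SCRIPT override are made up for illustration, and
I haven't checked how heartbeat reacts to a later status call.

```shell
#!/bin/sh
# Sketch of /etc/ha.d/resource.d/httpd_foo as a wrapper around the stock
# init script. INIT_SCRIPT is overridable so the wrapper can be dry-run.
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/httpd}"

httpd_foo() {
    case "$1" in
        stop)
            # Deliberate no-op: report success so the resource is
            # released while httpd keeps running on this node.
            echo "httpd_foo: stop ignored"
            return 0
            ;;
        *)
            # Everything else (start, status, ...) goes to the real script.
            "$INIT_SCRIPT" "$@"
            ;;
    esac
}
```

The installed script would end with `httpd_foo "$@"`.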

I'm sure you can do this with RA scripts, however, tracing the details
through the maze of shell includes and obscure variables in them is one
of the reasons I'm sticking to heartbeat-r1 in the first place.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Leave Apache running on both active and passive nodes?

2012-08-22 Thread Dimitri Maziuk
On 08/22/2012 04:22 PM, Jon Heese wrote:
 Can you talk them into buying a couple of SSDs?

 That won't work, and here's why:
 
 1. No (zero, zilch, nada) budget for this project.

Ah, yes: time I spend on this at my not so low hourly wage is cheaper
than any budget for the project. I see you work here too.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] IP Clone

2012-08-21 Thread Dimitri Maziuk
On 08/20/2012 06:01 PM, David Lang wrote:

 ANYCAST has severe limitations on what you can do with it, but CLUSTERIP is
 far more flexible and can work in just about any local active/active problem.

Apples have severe limitations on the amount of orange juice you can
squeeze out of them, but oranges are far more juicy.

-- in other words, that is misleading at best.

Anycast is a router hack so it works over *routed* networks. Clusterip
is *link-layer* broadcast so it works on a single ethernet segment.

One is for keeping core dns servers operational if the Internet breaks,
the other is for when ldirectord is too hard.

One is for when multiple servers won't all reply at once because only
one of them is visible to the reachable network, the other has a fixed
rule that decides which server answers which clients.

And so on.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Leave Apache running on both active and passive nodes?

2012-08-21 Thread Dimitri Maziuk
On 08/21/2012 12:48 PM, Jon Heese wrote:

 
 Everything's working properly on failover and all that, but I'd like
for the apache2 service to remain running on both nodes all the time,
but for a failover to be triggered if it goes down on the active node.

Why? Most daemons do not re-bind when another ip address is added to the
system. So once your ip's migrated you need to restart apache (kill -HUP
may do it, I'm not sure).

In general if you want to do that you also want to clone the ip --
currently being discussed in the other thread, take a look.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Leave Apache running on both active and passive nodes?

2012-08-21 Thread Dimitri Maziuk
On 08/21/2012 01:11 PM, Jon Heese wrote:
 In testing I've found that as long as Apache binds to 0.0.0.0, any IP 
 takeovers will work smoothly without Apache restarts.

Useful to know, thanks.

 1. We monitor the Apache services on these two hosts with a service
monitoring app that can't be told only one of these two must be running.

I monitor apache on cluster ip and separately monitor heartbeat on the
nodes (via net-snmp's proc extension).

 3. For some reason, it takes a good 5-10 seconds for Apache to start
after a failover occurs. The IPAddr2 and MailTo resources start within 2
seconds, but Apache takes longer.

I'd repost with an "apache takes 5-10s to failover" subject line. I get 5
sec failovers with drbd and mysql and tomcat in addition to apache, so I
think you do have an issue there. (But I run my active/passive clusters
on heartbeat-R1, so can't help with your setup.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Leave Apache running on both active and passive nodes?

2012-08-21 Thread Dimitri Maziuk
On 08/21/2012 03:06 PM, David Lang wrote:
 On Tue, 21 Aug 2012, Marcus Bointon wrote:

 I've done this for years with haproxy by allowing non-local binding:
 
 I've been doing the same thing. Failovers work much faster when all you need
 to do is to move the IP and not start/stop software
 
 I've never figured out why people use heartbeat to 'manage' the web front 
 end when it can just stay running. If haproxy is pointing at multiple web 
 servers, it can deal with monitoring, failover and balancing for them.
 
 One legitimate reason for doing this is that you can then have heartbeat 
 'monitor' the webserver and if the webserver dies, initiate a failover.

Running more than just apache would be another: I usually also have
vsftpd, and the document root on a drbd drive.

 I still have most systems using the version 1 style haresources config. It's 
 great for doing the simple failover scenario easily.

Me too, only s/most/all/.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] IP Clone

2012-08-20 Thread Dimitri Maziuk
On 08/20/2012 04:19 PM, Yount, William D wrote:
 No ideas?

You lost me at "I would like the IP address to run on both servers at
the same time" -- IME pacemaker not letting you do that is a feature.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] IP Clone

2012-08-20 Thread Dimitri Maziuk
On 08/20/2012 05:01 PM, Yount, William D wrote:
 I am trying to set up an Active/Active cluster. I have an
Active/Passive cluster up and running.

I don't remember seeing a clear explanation of when, where, and why
you'd actually want an active/active cluster. I never needed one myself,
so can't really help you there.

 I don't understand how it could be called an Active/Active cluster
 if you aren't allowed to run the IP address on two servers at once.

You are not allowed to run the IP address on two servers at once, full
stop. Complain to Rob Kahn and Vint Cerf.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] IP Clone

2012-08-20 Thread Dimitri Maziuk
On 8/20/2012 7:32 PM, Andrew Beekhof wrote:
 On Tue, Aug 21, 2012 at 8:49 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:

 You are not allowed to run the IP address on two servers at once, full
 stop. Complain to Rob Kahn and Vint Cerf.

 That's not strictly true.

In the same way it's not strictly true that every phone must have a 
unique number: if you don't mind other people getting your important 
calls then by all means.

CLUSTERIP, which you presumably mean by "fun with iptables", is basically
"Jack gets all calls from even area codes and Jill gets all calls from
odd area codes". Yeah, you could do that, I just can't imagine why.

Because the commonly given rationale for all this is load balancing and
-- well, duh -- there are load balancers for that. They don't require
the same IP address on multiple hosts either.
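
To make the phone-number analogy concrete, here is a toy sketch: every
node sees every packet, and each one decides from a hash of the source
address whether the packet is its to answer. The function name
owner_node and the octet-sum "hash" are made up for illustration; this
is not the hash the real CLUSTERIP target uses.

```shell
#!/bin/sh
# Toy illustration only: sum the octets of an IPv4 source address and take
# the result modulo the number of nodes. NOT the hash CLUSTERIP really uses.
owner_node() {
    src_ip=$1        # client address, e.g. 192.168.1.10
    total_nodes=$2   # number of nodes sharing the cluster IP
    # Split the dotted quad into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $src_ip
    IFS=$old_ifs
    # Nodes are numbered from 1.
    echo $(( ($1 + $2 + $3 + $4) % total_nodes + 1 ))
}
```

With two nodes, "owner_node 192.168.1.10 2" and "owner_node 192.168.1.11 2"
land on different nodes, which is the "even and odd area codes" split.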

Dima



Re: [Linux-HA] Antw: Re: open-iscsi won't automatically log in to set up SBD device

2012-08-09 Thread Dimitri Maziuk
On 08/09/2012 03:10 AM, alain.mou...@bull.net wrote:
 Still don't agree, but the good thing is that everyone has the choice 
 between both configuration autostart or noautostart ;-)

It depends on what you're trying to HA against:
- If your hardware spontaneously reboots every high tide and gremlins
are running inside messing with software installs and configs, then of
course you don't want autostart.

- If your main concern is hardware failure and the main reason for a
reboot is a manual OS update, then you probably want autostart.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] DRBD and automatic sync

2012-08-03 Thread Dimitri Maziuk
On 08/03/2012 02:35 AM, Elvis Altherr wrote:
 On 03.08.2012 09:32, emmanuel segura wrote:
 are you using ext3 for drbd active/active? UM
...
 yes.. would I be better off using GFS2 or OCFS2 (neither of which
 works under kernel 3.x)?
 
 Or which is the best file system for my case?

If you're on 3.x why not use ceph and ditch the whole drbd/pacemaker
thing altogether? It can't be worse than ext3 on dual-primary drbd using
haresources mode.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Fencing

2012-08-02 Thread Dimitri Maziuk
On 08/02/2012 05:28 PM, Yount, William D wrote:
 So I have been giving some thought to my fencing agent, as it seems a
 proper fencing solution is integral to any cluster. I only have access
 to basic Optiplex 990 and 960 desktops which is what I have built my
 cloud out of. I have been using the fence_pcmk agent but that doesn't
 seem to be a great solution. It can send shutdown and reboot commands to
 other nodes, but the nodes usually get hung-up in their shutdown scripts
 waiting for NFS shares to unmount or for VMs to turn off.
 
 One agent that seems to come up a lot is OpenIPMI. I have done some
 reading on it and I can't quite get an understanding of if this is a
 software solution that doesn't need any hardware implementation such as
 a special IPMI management board. Can anyone let me know if I need
 hardware that specifically supports IPMI?

Our 7xx series optiplexes don't have IPMI boards. Dunno about 9x0
series, but I highly doubt it: it's something they put in machines
that are supposed to run headless, not in desktops.

Aside from networked PDUs (last I looked TrippLite had better bang/buck
options than APC on just about anything), the other option is to
cut the node off at the switch and let it sit until someone comes in to
kick it. Assuming you don't have multiple redundant NICs, you do have
the someone, and so on and so forth.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Pacemaker and software RAID using shared storage.

2012-07-20 Thread Dimitri Maziuk
On 7/20/2012 4:42 AM, Caspar Smit wrote:
 Hi Dimitri,

 I got some test results for you.

 I built the setup as described (two servers with LSI 9200-8e cards,
 One Supermicro 847E26-RJBOD1, connected expander 1 to server 1 and
 expander 2 to server 2).

 When I plug in a SAS disk both servers see the disk at the same time :)
 I can create MD sets on server 1, do a mdadm -S /dev/mdX on server 1
 and a mdadm -A /dev/mdX on server 2 without any problem.

 When using SATA disks the setup DOESN'T work, only expander 1 (server
 1) is able to see the disk.

Hmm. I wonder if you can hot-plug it on failover somehow, e.g. via 
/proc/scsi/scsi.
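
A minimal sketch of what that could look like, assuming the legacy
/proc/scsi/scsi control interface is available on the kernel in
question. The function names and the SCSI_PROC override are made up
for illustration, and whether a re-probe actually makes a SATA disk
visible through the second expander is untested here.

```shell
#!/bin/sh
# Ask the kernel to probe or drop one SCSI address (host, channel, id, lun).
# SCSI_PROC defaults to the real control file; point it at a scratch file
# for a dry run. Writing to the real file requires root.
SCSI_PROC=${SCSI_PROC:-/proc/scsi/scsi}

scsi_add_device() {
    # $1=host $2=channel $3=id $4=lun
    echo "scsi add-single-device $1 $2 $3 $4" > "$SCSI_PROC"
}

scsi_remove_device() {
    # $1=host $2=channel $3=id $4=lun
    echo "scsi remove-single-device $1 $2 $3 $4" > "$SCSI_PROC"
}
```

A failover script could call e.g. "scsi_add_device 0 0 3 0" (placeholder
numbers) before attempting "mdadm -A" on the takeover node.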

Dima


Re: [Linux-HA] HA for NFS without DRBD

2012-07-11 Thread Dimitri Maziuk

dj mko...@gmail.com wrote:

Greetings!

I am new to setting up Linux Clusters. I am setting up a two node HA
cluster (RHEL 5.8) for Postgres. We would like to have the NFS
mount fail over from node 1 to 2.

My understanding is that DRBD needs a shared disk & we don't have that.
How can we set up HA with Heartbeat for NFS without DRBD?

NFS *mount* or NFS *export*? For the former you'll just need an lsb-like script 
that does a mount on 'start' and umount on 'stop' -- although I can't imagine 
why not just use the automounter. For the latter you need to copy the data from 
one node to the other.
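
For the *mount* case, a rough sketch of such an lsb-like script follows.
The server name, export path and mount point are hypothetical
placeholders, and the function name nfs_resource is made up; adjust the
status output to whatever your cluster manager expects.

```shell
#!/bin/sh
# Hypothetical values: replace with your own server, export and mount point.
NFS_SOURCE=${NFS_SOURCE:-nfsserver:/export/data}
NFS_TARGET=${NFS_TARGET:-/mnt/data}

# True if something is mounted on $NFS_TARGET according to /proc/mounts.
is_mounted() {
    grep -qF " $NFS_TARGET " /proc/mounts 2>/dev/null
}

nfs_resource() {
    case "$1" in
        start)
            # Already mounted counts as success, so repeated starts are harmless.
            is_mounted || mount -t nfs "$NFS_SOURCE" "$NFS_TARGET"
            ;;
        stop)
            # Not mounted counts as success for the same reason.
            if is_mounted; then
                umount "$NFS_TARGET"
            fi
            ;;
        status)
            if is_mounted; then
                echo "running"
            else
                echo "stopped"
                return 3
            fi
            ;;
        *)
            echo "Usage: {start|stop|status}" >&2
            return 1
            ;;
    esac
}
```

To use it as a standalone script, add a last line that calls
nfs_resource "$1" and exits with its return code.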

Dima




Re: [Linux-HA] DRBD/Pacemaker/Heartbeat

2012-06-06 Thread Dimitri Maziuk
On 06/05/2012 11:13 PM, Yount, William D wrote:

 The issue I am having is that if I take the network cable out of the
 primary service, thus simulating an unplanned outage, nothing
 happens. The secondary server isn't promoted to primary, it doesn't
 mount /dev/drbd0 to /Storage and the 10.89.99.30 IP address doesn't
 fail over.

Without looking at the details of your setup

1. how would your primary server know that 10.89.99.30 is unreachable
*from the outside* and
2. if your secondary server can't reach 10.89.99.30, how does it tell
whether it's the primary or itself that got cut off?

In my R1 clusters I have a mon script that greps for "Link detected:
yes" in the output of ethtool. Obviously, if both go off (e.g. the
switch loses power), that's gonna screw things up, but then I'll have
bigger problems anyway.
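
The check itself can be as small as the sketch below. This is not the
actual mon script, only an illustration of the grep; the function name
link_detected and the interface name eth0 are placeholders.

```shell
#!/bin/sh
# Reads "ethtool <iface>" output on stdin; succeeds only if the link is up.
link_detected() {
    grep -q 'Link detected: yes'
}

# Typical use from a monitoring script (eth0 is a placeholder):
#   ethtool eth0 | link_detected || echo "eth0 has no link"
```

Reading from stdin keeps the check easy to try against saved ethtool
output without touching a real interface.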

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] DRBD-8.4

2012-05-20 Thread Dimitri Maziuk
On 5/20/2012 7:54 AM, Willi Fehler wrote:
 Hi all,

 I would like to upgrade my cluster to DRBD-8.4.1. Currently my cluster
 is using DRBD-8.3.12 on CentOS-6.2.

FWIW I've been using 8.4.0 from ATrpms for a while now without problems. 
Never tried upgrading a live cluster from 8.3 to 8.4, though.

 I'm using DRBD packages from the ELRepo repository.

Does their kernel module package work across kernel updates?

What I find extremely annoying is that ATrpms' packages are named (e.g.) 
drbd-kmdl-2.6.32-220.17.1.el6.centosplus.x86_64 -- i.e. kernel arch, 
version, and repo are all part of the package *name*. You have to 
remember to yum install the new drbd-kmdl every time centos releases a 
kernel update. Sometimes ATrpms takes a few days to build it, and 
sometimes they never do (happened with some centosplus kernels).

It's annoying enough to bite the bullet & upgrade to elrepo packages 
-- provided they built their kernel module properly and it doesn't 
require a reinstall every damn time.

Dima


Re: [Linux-HA] weird problem w/ R1

2012-05-18 Thread Dimitri Maziuk
On 05/09/2012 07:07 AM, Dejan Muhamedagic wrote:
 Hi,
 
 On Wed, May 02, 2012 at 06:25:55PM -0500, Dimitri Maziuk wrote:
 
[heartbeat] doesn't seem to run that particular script: it starts
 pure_uploadscript from resource.d and mon from init.d, but not the one
 in between. What's weird is I now have it happening on 2 clusters:
 centos 5 w/ heartbeat 2.1.4, and centos 6 w/ heartbeat 3.0.4. The only
 common thing is bacula version: 5.

 Any ideas?
 
 No, but you can add set -x in some places in ResourceManager and
 see what gives.

Not very useful on 2.1: it prints out the very 1st resource (drbd fs)
and stops there. And all that goes to the console, not to the log --
it's a good thing I have these boxes hooked up to a terminal server.

However, it did (eventually) lead me to the "status)" case in my
bacula-client startup script: I forgot to change "status $prog" to
"status -p $pidfile $prog". So it was reporting bacula-fd as already
running -- correctly, since there's already an instance listening on the
standard port. It just wasn't the right instance.
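
Why the pid file matters when two instances of the same program run
side by side can be shown with a stand-in for the init-script helper.
The function below is a simplified sketch, not the distribution's own
"status" function, and its name is made up.

```shell
#!/bin/sh
# Report status for the instance that owns a given pid file, using LSB-style
# exit codes: 0 = running, 1 = dead but pid file exists, 3 = not running.
# Note: kill -0 also fails for processes owned by another user, so run this
# as root or as the daemon's user.
status_by_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 3
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1
}
```

A lookup by program name alone would find the standard bacula-fd and
report "running"; a lookup by pid file only succeeds for the instance
that wrote that file.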

Thanks Dejan for pointing me in the right direction.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




[Linux-HA] weird problem w/ R1

2012-05-02 Thread Dimitri Maziuk
Hi everyone,

I must be overlooking something obvious... I have a simple haresources
setup with

node drbddisk::sessdata Filesystem::/dev/drbd0::/raid::ext3::rw \
ip.addr httpd xinetd pure_ftpd pure_uploadscript bacula-client mon

bacula-client is in /etc/ha.d/resource.d, it's a copy of stock
/etc/init.d/bacula-fd with config, lock, and pid file changed to make it
listen on a non-standard port: this is for backing up drbd filesystem
(there's the standard bacula client running also).

bacula-client doesn't start. I added a couple of 'logger' lines and if I
manually run "/etc/ha.d/resource.d/bacula-client start ; echo $?" I get
0 and the log:
node logger: starting bacula-fd -c /etc/bacula/deposit-fd.conf
node logger: bacula-fd -c /etc/bacula/deposit-fd.conf running

Yet on failover I get this:

node ResourceManager[3734]: info: Running /etc/init.d/httpd  start
node ResourceManager[3734]: info: Running /etc/init.d/xinetd  start
node ResourceManager[3734]: info: Running /etc/ha.d/resource.d/pure_ftpd  start
node xinetd[4204]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
node xinetd[4204]: Started working: 1 available service
node ResourceManager[3734]: info: Running /etc/ha.d/resource.d/pure_uploadscript  start
node ResourceManager[3734]: info: Running /etc/init.d/mon  start

It doesn't seem to run that particular script: it starts
pure_uploadscript from resource.d and mon from init.d, but not the one
in between. What's weird is I now have it happening on 2 clusters:
centos 5 w/ heartbeat 2.1.4, and centos 6 w/ heartbeat 3.0.4. The only
common thing is bacula version: 5.

Any ideas?

TIA
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] HA samba?

2012-04-30 Thread Dimitri Maziuk
On 04/30/2012 05:04 PM, Seth Galitzer wrote:
 This was a bit trickier to get worked out, but I have made some 
 progress.  It turns out just putting the metadata on a shared disk 
 resource and symlinking wasn't quite enough.  nmbd (the netbios 
 management daemon that samba uses) complained that the symlink to its 
 working directory wasn't a real directory.

Why not use your AD controller (or whatever they call it) to be browse
master and netbios name server?

 The other new oddity is that after I've put the primary into standby and 
 everything has failed over to the secondary, as soon as I bring the 
 primary back online, the resources try to switch back, i.e. they don't 
 stay on the secondary (new primary) as expected.

As I recall "Clusters from Scratch" has a paragraph on that. (Basically,
it's configurable, it may be desirable if e.g. you're using a
low-powered back-up node.)

(I can't be more specific because I'm using R1 configs here, not crm.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] HA samba?

2012-04-25 Thread Dimitri Maziuk
On 04/25/2012 03:53 PM, Seth Galitzer wrote:
 Can anybody point me to recent docs on how to go about setting this up? 
   I've found several much older posts, but not much current with any 
 kind of helpful detail.

If you're running active/passive DRBD, it's what the wiki page calls
"mounted on one node at a time". That one's simple: use drbdlinks to
keep everything incl. /etc/samba on the drbd filesystem and fire up smbd
and nmbd after drbdlinks -- pretty much like any other daemon backed by
drbd storage.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] HA samba?

2012-04-25 Thread Dimitri Maziuk
On 04/25/2012 05:28 PM, Seth Galitzer wrote:

 I see how that will get all the locking and user data and that should be 
 easy enough to configure.  But I'm also doing ADS integration instead of 
 winbind, and that also seems to be a problem as only one node can be 
 joined to the AD at a time, even with a shared IP.  Any suggestions for 
 that?

I've user-level security, samba accounts in OpenLDAP, and no AD, so no
suggestions on that. (To me the howto reads like you need to make sure
you register the cluster ip (not node ip) in AD and then you shouldn't
need to re-join the domain on failover.)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] ocf:heartbeat:apache resource agent and timeouts

2012-04-06 Thread Dimitri Maziuk
On 04/06/2012 06:15 AM, Dejan Muhamedagic wrote:
 On Thu, Apr 05, 2012 at 11:54:44AM -0500, Dimitri Maziuk wrote:

 How well the server is performing implies that it's up and running, so 
 it's a valid test -- in the same sense that counting ice cubes in the 
 freezer compartment is a valid test to see if your fridge is working.
 
 Oh, well... and isn't it?

The converse isn't true: having zero ice cubes in the freezer does *not*
mean the fridge *isn't* working.

Shooting the node because it doesn't get to
http://127.0.0.1/server-status within some number of seconds may be
exactly what the user wants. Just as long as the user understands how
that relates to his actual webpages served on his cluster ip.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Command execution prior to resource start

2012-03-17 Thread Dimitri Maziuk
On 3/17/2012 7:06 AM, Charles Williams wrote:

 Hey all,

 I have been looking all over for a way to do this and have yet to find
 anything.

 I have a symlink resource that links an nfs share to /var/spool/cron
 (/var/spool/cron -> /mnt/imports/nvwh2.bluedotmedia.de/var/spool/cron)
 and I need to be able to check if /var/spool/cron exists and if so if
 it's a symlink prior to starting the resource. If it's an actual
 directory I will need to delete it, start the resource and restart cron.
 If not then cron will not be able to access the spool directory because
 the inode has changed.

 I have been looking at the dummy resource to maybe be able to do this
 but am not sure if it will work.

 Does anyone have an idea how this could be accomplished?

drbdlinks will rename /var/spool/cron to /var/spool/cron.drbdlink and 
create the symlink on start and revert it on stop. /var/spool/cron 
has to exist (and I think be a directory, not a symlink) and you'll need 
something else to restart cron.
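
If you end up scripting the check yourself instead, the test the
question asks for ("does it exist, and is it a symlink?") could be
sketched like this. The function name path_kind is made up, and what
to do for each answer (delete, link, restart cron) is left to the
caller.

```shell
#!/bin/sh
# Classify a path so a start script can decide what to do before linking.
# Prints one of: symlink, directory, missing, other.
path_kind() {
    # Test for a symlink first: -d follows symlinks and would report a
    # link to a directory as a directory.
    if [ -L "$1" ]; then
        echo symlink
    elif [ -d "$1" ]; then
        echo directory
    elif [ ! -e "$1" ]; then
        echo missing
    else
        echo other
    fi
}
```

A wrapper could then run e.g. case "$(path_kind /var/spool/cron)" in ...
before the symlink resource is started.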

Dima


Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes

2012-03-14 Thread Dimitri Maziuk
On 03/14/2012 05:22 PM, William Seligman wrote:

 Now consider a primary-primary cluster. Both run the same resource. One fails.
 There's no failover here; the other box still runs the resource. In my case,
 the only thing that has to work is cloned cluster IP address, and that I've
 verified to my satisfaction.

That may be true if you offer only completely stateless services over
UDP on the cluster IP address, or run some interesting network stack
on top of IP. According to my (admittedly fading) memory of networking
101, TCP-based services don't quite work that way.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Active-passive cluster, best practice question

2012-02-08 Thread Dimitri Maziuk
On 02/08/2012 12:23 PM, Jonathan Schaeffer wrote:

 Thanks for your feedbacks.
 
 The rsync method is very simple to implement on 2 nodes ...but does it
 scale with 5 nodes or so ? If a modification is done on one node, how
 would rsync propagate it on every other ?

How are you going to do active/passive on 5 nodes?

As far as rsync goes, the difference is between "rsync x rsync://foo/x"
and "for i in foo bar baz ; do rsync x rsync://${i}/x ; done" -- i.e. a
non-issue really.
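
Spelled out as a small helper, the loop might look like the sketch
below. The function name push_to_nodes and the RSYNC override are
assumptions for illustration; the node names are the same placeholders
as above.

```shell
#!/bin/sh
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the commands.
RSYNC=${RSYNC:-rsync}

# Usage: push_to_nodes SOURCE REMOTE_PATH NODE...
push_to_nodes() {
    src=$1
    dest=$2
    shift 2
    rc=0
    for node in "$@"; do
        # Keep going if one node is down, but remember the failure.
        "$RSYNC" "$src" "rsync://${node}/${dest}" || rc=1
    done
    return $rc
}
```

"push_to_nodes x x foo bar baz" runs the same three transfers as the
one-liner; the only addition is that a failed node is reported through
the return code.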
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] RA manipulation of iptables firewall?

2012-01-27 Thread Dimitri Maziuk
On 01/27/2012 02:22 PM, David Gersic wrote:
 I have an application that must simultaneously run as a non-root user
 and listen on a port below 1024. I can do this, by hand, by making some
 iptables rules forwarding the traffic from the low port on a public ip
 address to a high port on a private ip address. Now I'm trying to find a
 way to run this app in my cluster. So far, I'm not seeing an RA already
 set up to do this.

Why not make it static? I think you might even be able to exclude them
from dynamic pool via /proc/sys/net/ipv4, but don't quote me on this.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] RA manipulation of iptables firewall?

2012-01-27 Thread Dimitri Maziuk
On 01/27/2012 02:48 PM, David Gersic wrote:
 On 1/27/2012 at 02:37 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: 

 Why not make it static? 
 
 Yeah, I could, but I didn't want to. I wanted to make it part of the
 resource group so it'll even be there if I add a new cluster node and
 move the group to it.

Fair enough. Rewriting iptables rules in a script is not something I'd
recommend, though.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Services does not react after IP Takeover

2012-01-23 Thread Dimitri Maziuk
On 01/23/2012 08:35 AM, Niclas Müller wrote:
 I've tryed something like this already. The IP changed but the running 
 service react on the Virtual-IP after 10-15 sec. I've wonderd

With active/passive it's usually the database that slows things down:
with "shutdown transactional" it can take forever. If you're seeing that
with active/active, then either your other node isn't really active or
routing's messed up or something else's broken.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Single Point of Failure

2012-01-13 Thread Dimitri Maziuk
On 01/13/2012 01:57 PM, Paul O'Rorke wrote:
 he he - I already thought that might be simpler...

Part of it is what do you mean by "deliver client e-mails". SMTP is one
thing, POP/IMAP is another, direct read from mbox file is different
still (though it sits behind pop/imap as well). Another side is although
SMTP RFCs start with "reliably and efficiently", e-mail delivery has
always been "best effort" -- so I would not build a business plan on the
assumption that e-mail is or will ever be reliable.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Single Point of Failure

2012-01-12 Thread Dimitri Maziuk
On 01/12/2012 05:22 PM, Paul O'Rorke wrote:
 hmmm - it looks like I may have to re-evaluate this.
 
 Geographic redundancy is the point of this exercise, our office is in a
 location that has a less than ideal history for power reliability.  We are
 a small software company and rely on email for online sales and product
 delivery so our solution - whatever it may be - must allow for one location to
 completely lose power and still deliver client emails.

One solution is called "backup generator and a big fuel tank". Another
one is called "gmail".

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] compiling cluster glue in solaris 10 getting error

2011-12-27 Thread Dimitri Maziuk
On 12/27/2011 01:22 AM, Nikita Michalko wrote:
 Hmmm, it seems you don't know nothing about Christmas - no wonder ...
 I don't know nothing about solaris 10, but obviously  are you missing some
 packages for compiling:
 - maybe libbz2 + libbz2-devel ?

What he's missing is the answer to "which part of 'Linux-HA' spells
'Solaris'".

I actually know how to build gnuware on solaris, so if my employer told
me to do that I'd first try to talk them out of it. And if that failed,
I'd seriously consider changing jobs.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Failover from external access dosent' work

2011-12-13 Thread Dimitri Maziuk
On 12/13/2011 09:38 AM, Elvis Altherr wrote:

 Node 10.0.0.2 takes over the resources and reserves the virtual IP 10.0.0.3
 
 Now if I access the website from outside (e.g. via the office) it doesn't work
 
 I also checked the firewall, which is correctly configured, and all web
 requests will be forwarded from the public static IP 62.2.208.170 to
 the IP 10.0.0.3
 
 So what else could the problem be?

Did it work when 10.0.0.1 was up? Does it work from 10.0.0.1 now?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Failover from external access dosent' work

2011-12-13 Thread Dimitri Maziuk
On 12/13/2011 12:48 PM, Elvis Altherr wrote:
 On 13.12.2011 19:05, Dimitri Maziuk wrote:
 On 12/13/2011 09:38 AM, Elvis Altherr wrote:

 Node 10.0.0.2 takes over the resources and reserves the virtual IP 10.0.0.3

 Now if I access the website from outside (e.g. via the office) it doesn't work

 I also checked the firewall, which is correctly configured, and all web
 requests will be forwarded from the public static IP 62.2.208.170 to
 the IP 10.0.0.3

 So what else could the problem be?
 Did it work when 10.0.0.1 was up? Does it work from 10.0.0.1 now?
 Yes this works also on the secondary node 10.0.0.2
 
 means telnet 10.0.0.1  80 - works
 also 10.0.0.2 80 (if secondary node is up of course or apache started)

No, I meant "telnet 10.0.0.3 80".

Make sure you start apache *after* 10.0.0.3 is up.
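
One way to verify the "after" part by hand is to check that the
cluster address is actually configured on the node before apache is
started. The sketch below parses the one-line-per-address output of
"ip -o -4 addr show"; the function name has_address is made up for
illustration.

```shell
#!/bin/sh
# Reads "ip -o -4 addr show" output on stdin and succeeds if the given
# address is configured. Field 4 of each line is "address/prefix".
has_address() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Typical use (10.0.0.3 is the cluster IP from this thread):
#   ip -o -4 addr show | has_address 10.0.0.3 && echo "cluster IP is up"
```

The comparison is on the whole address, so 10.0.0.30 does not count as
a match for 10.0.0.3.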

 it also works within my private net 192.168.1.0/24
 
 if open the test page www.x/php/phpinfo.php it shows me the correct 
 hostname (mail2 if node one is up, and disthost2 if node two is up)
 
 Also the firewall logs shows the correct forwarding to the Cluster IP 
 10.0.0.3

This sounds like http://62.2.208.170/ is inaccessible from your office.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Failover from external access dosent' work

2011-12-13 Thread Dimitri Maziuk
On 12/13/2011 01:45 PM, Elvis Altherr wrote:

 OK, I will check.. but I think this isn't the problem, because if I do a
 port forwarding (NAT) to the webserver (mail2 = Node1) it works fine

Try this, then:
http://lists.linux-ha.org/pipermail/linux-ha/2008-March/031612.html

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-30 Thread Dimitri Maziuk
On 11/30/2011 2:01 AM, Ulrich Windl wrote:
 Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote on 29.11.2011 at 19:36 in
 message 4ed52637.9080...@bmrb.wisc.edu:
 On 11/29/2011 07:49 AM, Lars Marowsky-Bree wrote:

 (But the mdadm operations the RA does also shouldn't cause data
 corruption. That strikes me as an MD bug.)

 If you repeatedly try to re-sync with a dying disk, with each resync
 interrupted by i/o error, you will get data corruption sooner or later.
 It's only MD bug in a sense that MD can't actually stop you from
 shooting yourself.

 I'd like to know more details: Which disk has an I/O error: source or
 destination of the sync. How is data corruption created?

Well that's the point: if you have 2 disks, and neither has failed yet, 
how do you pick the one that isn't failing?

The specific failure mode I'm talking about is a disk busy relocating bad
sectors. Until the SMART counter hits the threshold value it's not
"failed", but you'll see sata timeouts/resets in /var/log/messages with
spiking i/o wait and all those sorts of hangs Lars mentioned. If mdadm
decides to use that disk as the source, you have a race: either SMART
will fail the disk before it starts dropping bits or develops an
unrelocatable bad sector, or said bad sectors will get copied to the
mirror disk.

Granted, I've only seen data corruption on sata raid-1 once so far. But 
once is enough.

(Rumour has it, it's worse with raid-5 since that only protects from 
data loss if all chunks are committed to disk at once and not stuck in a 
write cache waiting for the elevator.)
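
Since the disk is not yet marked failed in that state, about the only
early hint is the raw relocation counter itself. A small sketch for
pulling it out of "smartctl -A" output follows; it assumes the usual
ATA attribute table layout with the raw value in the 10th column, and
the function name is made up.

```shell
#!/bin/sh
# Reads "smartctl -A <device>" output on stdin and prints the raw
# Reallocated_Sector_Ct value (10th column of the ATA attribute table).
# Prints nothing if the attribute is not present.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical use (/dev/sda is a placeholder):
#   smartctl -A /dev/sda | reallocated_sectors
```

A count that keeps rising between runs is a reason to be suspicious of
that disk as a resync source, though it is a hint rather than proof.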

Dima


Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-29 Thread Dimitri Maziuk
On 11/29/2011 07:49 AM, Lars Marowsky-Bree wrote:

 (But the mdadm operations the RA does also shouldn't cause data
 corruption. That strikes me as an MD bug.)

If you repeatedly try to re-sync with a dying disk, with each resync
interrupted by i/o error, you will get data corruption sooner or later.
It's only an MD bug in the sense that MD can't actually stop you from
shooting yourself.

HTH
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Antw: Re: Q: unmanaged MD-RAID auto-recovery

2011-11-28 Thread Dimitri Maziuk
On 11/28/2011 02:37 PM, Andrew Beekhof wrote:
 On Mon, Nov 28, 2011 at 7:16 PM, Ulrich Windl
 ulrich.wi...@rz.uni-regensburg.de wrote:

 And therefore you need to monitor the _unmanaged_ resource? Strange.
 
 Now is the point where you explain how the cluster is going to know the
 state of the unmanaged resource, /without/ monitoring.

I'd of thunk the "un" in "unmanaged" means "none of its business".
(Would the cluster also like to know what my grandmother's maiden name
was between 1939 and 1945?)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Does ANYTHING Work on RHEL6?

2011-11-01 Thread Dimitri Maziuk
On 11/01/2011 07:17 AM, Nick Khamis wrote:
 We will need to turn SELINUX back on soon.

You know you won't be able to ssh in as root if you do that, right?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Does ANYTHING Work on RHEL6?

2011-11-01 Thread Dimitri Maziuk
On 11/01/2011 01:45 PM, Nick Khamis wrote:
> Should be possible to sudo?

As long as you don't need xauth.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Does ANYTHING Work on RHEL6?

2011-11-01 Thread Dimitri Maziuk
On 11/01/2011 01:56 PM, Nick Khamis wrote:
> H... Thanks Dima! I really need to shut the doors for this
> project, but I do need to ssh as root. Any workarounds?

http://www.mail-archive.com/linux-390@vm.marist.edu/msg58510.html

(pay attention to the "breaks all kinds of traditional unix tools and
methods" bit)

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] what if brain split happens

2011-10-31 Thread Dimitri Maziuk
On 10/31/2011 7:14 AM, Lars Ellenberg wrote:

> Yes, basically that is what happens: it kills everything it
> spawned, makes sure it sleeps for at least deadtime,
> then re-execs itself as the new master control process.

Re-exec itself? Cute. I've never seen that -- but then again, the only time
I lost the link between the nodes was when I used a crossover cable with a
broken-off plastic tab on one end and managed to pull it slightly loose
while installing another server in the rack.

Dima


Re: [Linux-HA] what if brain split happens

2011-10-28 Thread Dimitri Maziuk
On 10/28/2011 6:18 AM, Lars Ellenberg wrote:
> On Tue, Oct 25, 2011 at 02:29:55PM -0500, Dimitri Maziuk wrote:
>
>> I guess he's confused or trolling because in v1 there's no way heartbeat
>> can restart itself.
>
> No, absolutely not.
...
> Which is supposed to result in all started resources to be stopped,
> and then will do a clean restart without resources on any node,
> starting the resource group(s) on the preferred nodes.

My comment was based on my understanding of "restart" as "the heartbeat
master process kills itself, and then it's not there anymore to start
itself back up". I see that I was mistaken.

Dima


Re: [Linux-HA] what if brain split happens

2011-10-25 Thread Dimitri Maziuk
On 10/24/2011 10:49 PM, Hai Tao wrote:

> In case heartbeat communication is lost, brain split then happened, both
> nodes (a two nodes cluster for a simple example) are having the vip and other
> resources.
>
> When the heartbeat communication comes back, what will happen?
>
> 1. both nodes will still having the vip and resources forever?
> 2. both nodes realize that brain split has happened, and will restart
> heartbeat?

In theory -- #2, except they shouldn't restart heartbeat: one of them
should stop the resources. In practice, one of the interesting things
that happen when the comms come back is that you have a duplicate IP
address (the vip) on your network. That's not something you want to
happen, so you'd better make sure one of the nodes is down before you
restore the comms.
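
One way to check for that duplicate before bringing a node back is an ARP
duplicate-address probe. What follows is only a rough sketch, not anything
heartbeat does itself: it assumes the iputils version of arping (where -D is
duplicate address detection and the exit status is 0 when nobody answers),
and the address and interface name are made up for illustration.

```python
import subprocess

def dad_command(vip, interface, probes=2):
    """Build an iputils arping call in duplicate-address-detection mode."""
    return ["arping", "-D", "-I", interface, "-c", str(probes), vip]

def vip_is_free(vip, interface):
    """True if no other host answered for the vip.

    Assumes iputils arping, which exits 0 in -D mode when no reply came back.
    It usually needs root, and raises FileNotFoundError if arping is missing.
    """
    result = subprocess.run(dad_command(vip, interface), capture_output=True)
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical values: replace with the real vip and the interface it lives on.
    vip, interface = "192.0.2.10", "eth0"
    if vip_is_free(vip, interface):
        print(f"nobody else answers for {vip}")
    else:
        print(f"{vip} is already held by another host")
```

Run it from the node that does not hold the vip. A reply means the other node
still has the address, so it is not safe to restore the comms yet.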

Dima


Re: [Linux-HA] what if brain split happens

2011-10-25 Thread Dimitri Maziuk
On 10/25/2011 02:00 PM, Andreas Kurz wrote:
> On 10/25/2011 07:51 PM, Hai Tao wrote:
>
>> actually what I saw is that both nodes shut down heartbeat, and then
>> restarted heartbeat
>
> I guess you are using Heartbeat v1 without crm but haresources
> config file?

I guess he's confused or trolling because in v1 there's no way heartbeat
can restart itself.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] HA-Linux for Exim

2011-10-19 Thread Dimitri Maziuk
On 10/19/2011 11:00 AM, Paul O'Rorke wrote:
> after thinking this through and doing a little reading it occurs to me that
> this will only help cache incoming mail until the primary MX mail server is
> back up.  What I want is servers mirrored in realtime, with failover.  I
> would have expected HA-Linux to be a good candidate for that.

No, you can run the 2nd MX with the same priority and same config as the
1st one. Its intended use is load balancing, but as a side effect, if
one MX goes down the other gets all the mail.
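
To illustrate why equal priority gives you failover for free, here is a
minimal sketch of how a sending MTA is expected to walk the MX list: lowest
preference first, hosts with the same preference tried in random order, next
host if one doesn't answer. The host names and the is_up callback are
hypothetical; a real MTA does this internally.

```python
import random

def delivery_order(mx_records, rng=random):
    """Order MX hosts as a sender would try them.

    mx_records is a list of (preference, host) pairs. Lower preference
    goes first; hosts sharing a preference are shuffled.
    """
    by_pref = {}
    for pref, host in mx_records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        order.extend(hosts)
    return order

def deliver(mx_records, is_up, rng=random):
    """Return the first reachable host, or None if every MX is down."""
    for host in delivery_order(mx_records, rng):
        if is_up(host):
            return host
    return None

if __name__ == "__main__":
    # Two MXes with the same preference and mx1 down: all mail goes to mx2.
    records = [(10, "mx1.example.org"), (10, "mx2.example.org")]
    print(deliver(records, lambda host: host != "mx1.example.org"))
```

With both hosts up, the shuffle spreads the load; with one down, the sender
simply falls through to the other.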

The problem is shared resources: mbox'es in /var/spool/mail or MLM software.

If you are delivering to ~/Maildir's and don't have mailing lists, it
should work.

(If you're delivering to mbox'es in /var/spool/mail, don't. Consider
switching to maildirs as part of the upgrade.)

If you insist on mbox'es you'll need drbd for mail spool and list
archives (if any) and probably also for config files to make your life
easier -- pretty much what Florian said. What he didn't say is it'll
take you a week of frustration to get a working configuration.

The other thing he didn't say is that when you put those files/dirs on
a drbd filesystem, you typically make symlinks from their original
location: e.g. /var/spool/mail -> /drbd/mailspool. When the drbd filesystem
is not mounted (on the passive node), all of them are broken symlinks. I
don't know about dpkg, but rpm does "fix" those whenever you update the
relevant packages. So you have to remember to triple-check and re-create
those broken symlinks after every software update on the passive node.
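
The triple-checking can be scripted. This is only a sketch under my own
assumptions: the EXPECTED_LINKS table is hypothetical (only the
/var/spool/mail entry comes from the example above), and anything rpm
replaced with a real file or directory is reported rather than deleted.

```python
import os

# Hypothetical table: original location -> target on the drbd filesystem.
EXPECTED_LINKS = {
    "/var/spool/mail": "/drbd/mailspool",
}

def check_links(expected):
    """Return the paths that are not symlinks pointing at the expected target.

    A dangling symlink with the right target counts as correct, since that
    is the normal state on the passive node.
    """
    wrong = []
    for path, target in expected.items():
        if not os.path.islink(path) or os.readlink(path) != target:
            wrong.append(path)
    return wrong

def restore_link(path, target):
    """Re-create the symlink; refuse to remove a real file or directory."""
    if os.path.islink(path):
        os.unlink(path)
    elif os.path.lexists(path):
        raise RuntimeError(f"{path} exists and is not a symlink; fix it by hand")
    os.symlink(target, path)

if __name__ == "__main__":
    for path in check_links(EXPECTED_LINKS):
        print(f"{path}: not a symlink to {EXPECTED_LINKS[path]}")
```

Running the check after every update on the passive node only prints what
needs attention. Calling restore_link is a separate, deliberate step.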

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] HA-Linux for Exim

2011-10-19 Thread Dimitri Maziuk
Straw poll:

1. how long did it take you to get your first corosync/pacemaker/drbd
cluster up and running the way you wanted?

2. are you, or have you ever been a developer working on linux-ha projects?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



