On Mon, 2009-02-02 at 12:05 +0100, Michael Schwartzkopff wrote:
> Am Montag, 2. Februar 2009 11:15:35 schrieb Tobias Appel:
> > Well, I've given up on pingd; I just can't get it to work with DRBD and a
> > resource group. The constraints have no effect and the cluster does
> > nothing if I pull the Ethernet cable.
> > Can I do anything else to get it to work? Maybe use STONITH to kill the
> > node whose Ethernet cable has been pulled out? Or have it kill
> > itself.
> >
> > I'm really going crazy over this. My last resort would be to use another
> > software which monitors the network interface and which then would stop
> > the heartbeat service, but maybe, just maybe there is a simple solution
> > somewhere to be found within heartbeat.
> >
> > Best Regards,
> > Tobi
> 
> hi,
> 
> how long did you wait for the cluster to react?
I gave it a couple of minutes and I saw in the log files that it tried
to make the other node DRBD master but it said there can only be one
master (of course, that's how the M/S resource has been configured)
> 
> The deadping option of ha.cf is 20 secs by default. Add 5 secs for the
> dampening and you get in the range of 30 secs. Did you wait that long? If
> you want a faster reaction -> change deadping in ha.cf
> 
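
The timing described above can be checked against the local configuration. A rough sketch, assuming ha.cf is readable, that deadping is given there in whole seconds, and that pingd runs with a 5 second dampen; the function name and the fallback of 20 (the default quoted in this thread) are illustrative only:

```shell
#!/bin/sh
# Estimate how long pingd needs before it reports a lost ping node:
# deadping (read from ha.cf) plus the pingd dampen delay.
# Assumptions: whole-second values; 20 is used when deadping is not set.
estimate_pingd_delay() {
    ha_cf=$1
    dampen=${2:-5}
    # Take the last "deadping" directive in the file, if there is one.
    deadping=$(awk '$1 == "deadping" { v = $2 } END { print v }' "$ha_cf" 2>/dev/null)
    echo $(( ${deadping:-20} + $dampen ))
}
```

Called as `estimate_pingd_delay /etc/ha.d/ha.cf 5`, this only gives a lower bound: the time the cluster then needs to stop, demote and promote resources comes on top.
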
> Does the logfile show the changes? What does
> cibadmin -Q -o status | grep pingd
> say on BOTH nodes?
> 
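
To compare the two nodes more easily, the grep above can be narrowed to the attribute id and value. A small sketch, assuming the status section stores pingd as an nvpair with id, name and value attributes in that order; the function name is made up for illustration:

```shell
#!/bin/sh
# Print "<nvpair id> <value>" for every pingd attribute in CIB status XML
# read from stdin. Assumes each nvpair sits on one line, with id before value.
pingd_values() {
    grep 'name="pingd"' |
        sed -n 's/.*id="\([^"]*\)".*value="\([^"]*\)".*/\1 \2/p'
}
```

Run it on each node as `cibadmin -Q -o status | pingd_values`. The node whose cable was pulled should show a lower value, or none at all, once deadping and the dampen time have passed.
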
It registers that the ping group is dead. Now I changed my constraint
because Andrew gave a hint (he has some nice examples in the pacemaker
configuration which should be working for heartbeat 2.1.4 as well).
I know that my constraint is wrong but I just can't get it to work! I
have tried all possible combinations now, every example from your
book and from Andrew's documentation - of course none of these scenarios
is similar to mine - maybe it's just not possible to have it working in
my scenario.

> In my setups I saw that DRBD sometimes reacts quite slowly. Just configure
> everything with the CLI. That prevents the errors people make by clicking
> too fast in GUIs ;-)

It's 90% configured by hand through xml files. I would need a constraint
that stops the resource group, demotes DRBD, promotes the other node
to DRBD master and then starts the resource group over there - but this
does not seem to be possible with heartbeat 2.1.4. I really tried every
combination of constraints for pingd but it just does not work the way it
has to.
Either it tries to promote the other node to DRBD master while the
current node still runs all the resources and returns an error, or it does
nothing at all! It does not even stop the resource group.
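
For reference, one shape such a constraint can take is a location rule that forbids the Master role on any node without connectivity. This is an untested sketch and not a confirmed fix for this setup. It assumes a master/slave resource with the hypothetical id "ms-drbd0", a pingd attribute named "pingd", and that the resource group is already colocated with, and ordered after, the Master role. The spelling of the role value should be checked against the DTD shipped with heartbeat 2.1.4.

```shell
#!/bin/sh
# Write a pingd location rule for the Master role to a file for review.
# All ids and the resource name "ms-drbd0" are placeholders (assumptions).
cat > pingd-constraint.xml <<'EOF'
<rsc_location id="ms-drbd0-connected" rsc="ms-drbd0">
  <rule id="ms-drbd0-connected-rule" role="Master" score="-INFINITY" boolean_op="or">
    <expression id="ms-drbd0-connected-undef" attribute="pingd" operation="not_defined"/>
    <expression id="ms-drbd0-connected-zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
EOF
# Load into the running CIB only after reviewing the file:
# cibadmin -C -o constraints -x pingd-constraint.xml
```

The intent is that the node that lost its ping nodes may no longer hold the Master role, so the cluster has to demote DRBD there before promoting the peer, and the colocated group follows the new master.
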

I've put a year of work into this project and now I fail on the last and
final step of making this HA cluster. I can't tell my boss that it will
fail over in 90% of the scenarios but not when the Ethernet connection is
down.

bye,
tobi

_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems