Hoy list people! :)
I have a strange problem: every time a node fails, the peer (which is
supposed to take over the services) reboots instead of simply taking
them over.
I presume this is something rather silly I've declared in my
configuration, but I can't tell what, or why.
I'm using the v2 (CRM) configuration, with a rather simple
Master/Slave DRBD resource plus a resource group (and almost no
constraints at all). This is the relevant crm_config part of my
two-host configuration:
name="symmetric-cluster" value="TRUE"
name="stonith-enabled" value="FALSE"
name="no-quorum-policy" value="ignore"
name="default-resource-stickiness" value="100"
name="stop-orphan-resources" value="TRUE"
name="stop-orphan-actions" value="TRUE"
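In the CIB those properties sit inside crm_config as nvpair entries; for
context, they look roughly like this (the id values here are placeholders
I've made up, not copied from my actual cib.xml):

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <nvpair id="opt-symmetric-cluster" name="symmetric-cluster" value="TRUE"/>
      <nvpair id="opt-stonith-enabled" name="stonith-enabled" value="FALSE"/>
      <nvpair id="opt-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
      <nvpair id="opt-default-resource-stickiness" name="default-resource-stickiness" value="100"/>
      <nvpair id="opt-stop-orphan-resources" name="stop-orphan-resources" value="TRUE"/>
      <nvpair id="opt-stop-orphan-actions" name="stop-orphan-actions" value="TRUE"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```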
And this is my /etc/ha.d/ha.cf config file:
keepalive 2 # one heartbeat every 2 seconds
warntime 10 # log warning "late heartbeat" after 10s
deadtime 30 # node is pronounced dead after 30s
initdead 30 # deadtime used during initial startup, while the network comes up
udpport 694 # broadcast heartbeat using UDP port 694
auto_failback off # no preferred master
ping router.company.com # router
use_logd yes # how to log stuff: HB Log Daemon
# I have several heartbeat clusters running on the same network;
# that's why I use "ucast" instead of broadcast
ucast eth1 sql3.company.com
node sql1.company.com
node sql3.company.com
crm on
respawn hacluster /usr/lib64/heartbeat/ipfail
Does anyone here see what could be causing this strange behavior?
My log files seem too verbose to be useful at all. Maybe someone here
could tell me what to look for in them?
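For anyone suggesting patterns: here is a minimal sketch of how I could
filter a heartbeat log for membership transitions and takeover messages.
The sample log lines and the grep patterns below are my assumptions about
typical heartbeat output, not copied from my real logs:

```shell
# Sketch: filter heartbeat-style log lines for node-status transitions
# and resource takeover messages. Both the sample lines and the patterns
# are illustrative assumptions, not real output from my cluster.
log='heartbeat: info: Link sql3.company.com:eth1 up.
heartbeat: WARN: node sql3.company.com: is dead
heartbeat: info: Resources being acquired from sql3.company.com'
matches=$(printf '%s\n' "$log" | grep -E 'is dead|Resources being acquired')
printf '%s\n' "$matches"
```

With the real log daemon, I would point the same grep at the actual log
file instead of the sample text.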
Many thanks in advance.
Kindest regards.
--
Luis Motta Campos (a.k.a. Monsieur Champs) is a software engineer,
Perl fanatic evangelist, and amateur {cook, photographer}
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems