Hi,

On Mon, Apr 14, 2008 at 09:22:08AM -0400, Rob Morin wrote:
> Hello all my first post here so be gentle.... :)
>
> I have setup already DRBD and Heartbeat-2 on 2 Debian Etch servers. Primary
> named Joe secondary named Stewie
> DRBD version 8 via apt-get and heartbeat-2 via apt-get version 2.0.7-2

2.0.7-2 is rather old. You would want to upgrade, in particular if
you run v2/crm style configurations.

> I am using 2 NICS, eth0 which is private for DRBD replication and heartbeat
> and eth1 used for my real public IP address where outsiders connect to for
> the services.

See below.

> I am not using heartbeat yet, but i am using drbd, as i am having a trouble
> getting heartbeat to take over on the secondary server(Stewie). The problem
> is Apache is dying for some reason... however i would like the other
> resources to start, such as pop and mail and a couple others.. i figure its
> better to have only one server dead such as web , rather than all services
> dead...
>
> My question is, is it possible to have heartbeat ignore a problem when a
> problem or error occurs starting up a service?

In v1 probably not, but other services/groups shouldn't be affected.
The point of having a service in a cluster is to make it more
available, right? So, if your service is unstable, it should first
be fixed.

> As its is hard to troubleshoot a problem when it occurs as heartbeat gives
> up if it encounters one error....

Why should it be hard to troubleshoot? There are logs I guess.

> Also i noticed in the in the ha.cf file ther is a comment that says "# Node
> name must be same as uname -r."
>
> SO i have "Joe" and "Stewie" as my hostnames but if i do a uname -r on
> either host i get this in return
>
> 2.6.18-6-amd64

That must be a typo. It should read 'uname -n'.

> Could this be an issue... here are my conf files....
>
>
> ha.cf file
> -------------------------------------
> logfacility daemon # This is deprecated
> keepalive 2 # Interval between heartbeat (HB) packets.
> deadtime 60 # How quickly HB determines a dead node.
> warntime 5 # Time HB will issue a late HB.
> initdead 120 # Time delay needed by HB to report a dead
> node.
> udpport 694 # UDP port HB uses to communicate between
> nodes.
> #ping 192.168.5.1 # Ping VMware Server host to simulate
> network resource.
> bcast eth0

You need at least two comm links for production servers. Another
link could be your public network interface.

> #baud 115200
> #serial /dev/ttyS0 # Which interface to use for HB packets.
> coredumps true
> auto_failback off # Auto promotion of primary node upon return
> to cluster.
> node joe # Node name must be same as uname -r.
> node stewie # Node name must be same as uname -r.
> ###
> respawn hacluster /usr/lib/heartbeat/ipfail
> # Specifies which programs to run at startup
>
> ------------------------------------------------------------
>
>
> haresources file
> ------------------------------------------------------
> joe IPaddr::xxx.xxx.xxx.150 \
> drbddisk::mail Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults
> apache2 mysql ispcp_daemon \
> drbddisk::web Filesystem::/dev/drbd1::/var/www::ext3::defaults postfix
> courier-authdaemon courier-pop courier-imap

Looks like you put everything in a single group. You should try
to split them into several, if possible. For example, I'd assume
that drbddisk::mail and drbddisk::web don't depend on each other
and that various services depend on either the former or the
latter. Then create at least two groups. If all depend on the IP
address, then all have to be in a single group if you're running
a v1/haresources based configuration. In that case, you would
want to consider a v2/crm configuration. At any rate, you may
consider introducing an extra IP address for the second group of
services.

See http://linux-ha.org/LearningAboutHeartbeat,
http://linux-ha.org/HeartbeatTutorials, and
http://linux-ha.org/GettingStartedV2 for more information.

HTH,

Dejan

> ----------------------------------------------------------------------------------------------------------------------
>
> Thanks to all for your help and have a great day!
>
> --
>
> Rob Morin
> Dido Internet Inc.
> Montreal,Canada
> http://www.dido.ca
> 514-990-4444
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
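
To illustrate the split into groups suggested above, a v1 haresources
file with two groups might look roughly like the sketch below. This is
only an illustration under assumptions, not a tested configuration: the
second address xxx.xxx.xxx.151 is hypothetical, and the assignment of
services to groups (mail daemons with the mail volume; apache2, mysql
and ispcp_daemon with the web volume) is a guess that has to be checked
against where each service actually keeps its data.

```
# Group 1 (assumed): mail volume plus the mail daemons
joe IPaddr::xxx.xxx.xxx.150 \
    drbddisk::mail Filesystem::/dev/drbd0::/var/mail/virtual::ext3::defaults \
    postfix courier-authdaemon courier-pop courier-imap

# Group 2 (assumed): web volume plus the web stack.
# xxx.xxx.xxx.151 is a hypothetical extra address for this group.
joe IPaddr::xxx.xxx.xxx.151 \
    drbddisk::web Filesystem::/dev/drbd1::/var/www::ext3::defaults \
    apache2 mysql ispcp_daemon
```

Within a group, resources are started from left to right and stopped in
the reverse order, so each DRBD device and its filesystem come before
the services that use them.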
