2007/6/13, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]>:
Hi All.

One of our testers has a system with four servers and a standby (heartbeat 2.0.7).

servers:
lab14b-ts-lim1
lab14b-ts-lim2
lab14b-ts-lim3
lab14b-ts-lim4

standby:
lab14b-ts-limx

For each server there is a group of three services (two IP addresses and a telephony service).

group_lim1
group_lim2
group_lim3
group_lim4

If one server goes down, the standby should take over the failing group.

At one point the tester powered down lim1 and lim2 and after a short time powered them up again.

Lim1 09:55:06
Lim2 09:54:20

Group_lim1 starts on lim1 and group_lim2 starts on lim2.

At 12:25:05 the standby suddenly begins to start group_lim3 and, at the same time, group_lim2. The services for these two groups go up and down for a while until the IPs for group_lim3 and the telephony service for group_lim2 are running.
I see this in messages-lim2.txt:

===
Jun 11 12:24:59 lab14b-ts-lim2 heartbeat: [3362]: info: Link lab14b-ts-lim3:eth0 dead.
Jun 11 12:25:02 lab14b-ts-lim2 kernel: tg3: eth1: Link is down.
Jun 11 12:25:03 lab14b-ts-lim2 kernel: tg3: eth0: Link is down.
===

and soon:

===
Jun 11 12:25:09 lab14b-ts-lim2 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Jun 11 12:25:09 lab14b-ts-lim2 kernel: tg3: eth1: Flow control is off for TX and off for RX.
Jun 11 12:25:10 lab14b-ts-lim2 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Jun 11 12:25:10 lab14b-ts-lim2 kernel: tg3: eth0: Flow control is off for TX and off for RX.
===

lab14b-ts-lim2 went off-line for a moment for a reason we don't know yet. Heartbeat handled this right as far as I understand.

At this time I can see the message 'Another DC detected' on lim2 and on limx. On all nodes, crm_mon shows lim3 as OFFLINE, even on lim3 itself.

I'm confused. Can someone shed some light on this matter?

I attach logs that I think may be of interest.

<<nodgp.zip>>

BR.
Thomas

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
