On Wed, May 12, 2010 at 3:16 AM, Lars Ellenberg
<[email protected]> wrote:

> On Tue, May 11, 2010 at 01:35:17PM -0700, Mike Sweetser wrote:
> > Hello,
> >
> > I've set up a DRBD and Heartbeat configuration communicating over an
> > Internet connection, rather than internal.  The servers are running
> CentOS
> > 5.4, with DRBD 8.3.2 and Heartbeat 3.0.3, out of the CentOS repository.
> >
> > I start seeing these in the ha-log.
> >
> > ERROR: Message hist queue is filling up (500 messages in queue)
> >
> > Then I see a bunch of these:
> >
> > WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request
> took
> > too long to execute: 20 ms (> 10 ms) (GSource: 0x1c3025c0)
> >
> > And finally:
>
> What is before this?
> Below is "MCP dead" (Master Control Process)...
> it should log why it died.
> Or there should be some core file below
>        find /var/lib/heartbeat/cores/
> Or both.
>
>
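(As a sketch of that core-file check, assuming the CentOS default paths; the function name and the binary path are illustrative, not from the thread:)

```shell
#!/bin/sh
# Sketch only: walk the heartbeat core directory and print a backtrace
# for each core file found. The directory and binary path are the CentOS
# defaults and may differ; the function name is made up for illustration.
backtrace_cores() {
    dir=${1:-/var/lib/heartbeat/cores}
    find "$dir" -type f -name 'core*' 2>/dev/null | while read -r core; do
        # Point gdb at whichever daemon actually dumped the core.
        gdb -batch -ex bt /usr/lib64/heartbeat/heartbeat "$core"
    done
}
```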
May 11 17:38:33 mysql1 crmd: [904]: notice: run_graph: Transition 1
(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pengine/pe-input-2440.bz2): Complete
May 11 17:38:33 mysql1 crmd: [904]: info: te_graph_trigger: Transition 1 is
now complete
May 11 17:38:33 mysql1 crmd: [904]: info: notify_crmd: Transition 1 status:
done - <null>
May 11 17:38:33 mysql1 crmd: [904]: info: do_state_transition: State
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
cause=C_FSA_INTERNAL origin=notify_crmd ]
May 11 17:38:33 mysql1 crmd: [904]: info: do_state_transition: Starting
PEngine Recheck Timer
May 11 17:38:33 mysql1 pengine: [23821]: info: process_pe_message:
Transition 1: PEngine Input stored in: /var/lib/pengine/pe-input-2440.bz2
May 11 17:45:28 mysql1 cib: [900]: info: cib_stats: Processed 1 operations
(0.00us average, 0% utilization) in the last 10min

That's what comes before those messages. Right before that, it logged that it
had lost the connection to the other server, but the connection came back
right away.


> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5533 with
> > SIGTERM
> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5537 with
> > SIGTERM
> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5538 with
> > SIGTERM
> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5539 with
> > SIGTERM
> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Killing pid 5540 with
> > SIGTERM
> > May 08 05:33:19 mysql1 heartbeat: [5536]: CRIT: Emergency Shutdown(MCP
> > dead): Killing ourselves.
>
> > logfacility     local0
> > debug 1
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
>
> maybe you should use logd?
>
>
How would that affect Heartbeat crashing?
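(For background, not from the thread: heartbeat's in-process logging is synchronous, so a stalled write to the log files can hold up the daemons themselves; logd moves the writes into a separate process. A minimal ha.cf change might look like:)

```
# Sketch: replace the debugfile/logfile directives in ha.cf with
use_logd yes
# and put the file locations in /etc/logd.cf instead, e.g.:
#   debugfile /var/log/ha-debug
#   logfile   /var/log/ha-log
#   logfacility local0
```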


> > node mysql1
> > node mysql2
> > keepalive 2
> > deadtime 60
> > initdead 120
> > warntime 15
> > udpport 694
> > ucast eth1 66.165.231.34
> > ucast eth1 67.218.128.19
>
> You should add an additional link.
> Really.
>

What kind of link should be added?  This is the first time I've done an
external connection setup like this - previous setups were all on internal
networks with directly connected machines.
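(For reference, a sketch rather than anything from the thread: any independent communication path will do; the interface names and addresses below are placeholders:)

```
# Sketch: an additional, independent heartbeat path in ha.cf, so a single
# flaky route cannot take out all cluster traffic at once.
ucast eth0 192.0.2.10      # second network route, if one exists
# For machines that are physically adjacent, a serial crossover also works:
# serial /dev/ttyS0
# baud 19200
```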


>
> > auto_failback on
> > crm yes
>
> Are you short on memory, or under memory pressure?
>

The servers each have 16 GB of RAM.


> Are UDP packets dropped?
>

I'm not seeing any dropped packets - the DRBD connections seem fine over the
same links.


> Packet loss somewhere?
> Message corruption?
> Firewalled in one direction?
>

As far as I can tell, no packet loss or corruption, and there's no firewall
between the two.
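(As a side note, loss on the heartbeat path can be spot-checked from either node; the parsing helper below is hypothetical, and the peer address would be the other node's ucast address from ha.cf:)

```shell
#!/bin/sh
# Hypothetical helper: pull the percent-loss figure out of ping's summary
# line so the heartbeat link can be sampled periodically.
parse_loss() {
    grep -o '[0-9]*% packet loss' | grep -o '^[0-9]*'
}

# Usage against the peer's ucast address from ha.cf:
#   ping -c 20 -i 0.2 67.218.128.19 | parse_loss
# Kernel-level UDP receive drops show up in "netstat -su".
```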

Mike Sweetser


>
> --
> : Lars Ellenberg
> : LINBIT | Your Way to High Availability
> : DRBD/HA support and consulting http://www.linbit.com
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
