Hi Willy

I checked the cfg files for both HAProxy and heartbeat, and they're the same
where they are supposed to be. You were right about the SYSLOGD="-r"
setting; I didn't know I had to do that. I've attached a new copy of the
haproxy.log file I started when I restarted both the LB1 and LB2 servers. It
shows from the time it works at 08:54 until it stops at 09:14. The log is
from LB1; the same log on LB2 shows only "May 16 09:14:58 lb2 -- MARK --"
from when I couldn't get back into the stats. I can still get to my local
test web site even after the stats go away.

Tom

-----Original Message-----
From: Willy Tarreau [mailto:[email protected]] 
Sent: Saturday, May 16, 2009 12:18 AM
To: Tom Potwin
Cc: [email protected]
Subject: Re: New HAProxy user keeps losing connection

On Fri, May 15, 2009 at 05:03:07PM -0400, Tom Potwin wrote:
> I don't think I'm going to be much help there. I'm connected to the
> Internet through wireless broadband right now. I connect to my
> network through a VPN connection, so the server just sees me as a
> local address.
> 
> I had it working for a short time, and I saved what showed up in
> syslog from the time it started to work to the moment it stopped. The
> last entry is at 16:17:01, when it was still working. It stopped
> working at about 16:30, but nothing showed in any of the logs except,
> again, for the entry in the Apache error log. I'm attaching the
> syslog copy here, showing just the time frame I talked about. I don't
> know if that helps, but please know that I really appreciate all the
> help.

Well, unfortunately there is no haproxy log here. Are you sure that your
syslogd listens on the UDP socket? (Check for the "-r" option.) However, in
the logs heartbeat says that it is switching to node lb2 and that lb1 goes
standby. So most likely your second click is handled by lb2, and maybe that
one does not have exactly the same config options. You should compare the
configs.
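
For reference, the syslogd change discussed here amounts to something like
the following sketch. The file paths assume a RHEL/CentOS-style sysklogd
layout, and the local0 facility is an assumption for illustration, not
something taken from Tom's actual haproxy.cfg:

```
# /etc/sysconfig/syslog -- "-r" makes syslogd accept remote/UDP
# messages on port 514, which haproxy needs since it logs over UDP
SYSLOGD="-r -m 0"

# /etc/syslog.conf -- route haproxy's facility to its own file
# (local0 is an assumed choice; match whatever haproxy.cfg uses)
local0.*    /var/log/haproxy.log

# haproxy.cfg, global section -- the matching log directive
global
    log 127.0.0.1 local0
```

After editing, syslogd has to be restarted for "-r" to take effect; if the
haproxy.log file stays empty, the facility in syslog.conf and haproxy.cfg
probably doesn't match.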

Regards,
Willy
May 16 08:54:03 localhost.localdomain haproxy[24190]: Proxy webfarm started.
May 16 08:55:29 lb1 heartbeat: [23507]: info: Heartbeat shutdown in progress. 
(23507)
May 16 08:55:29 lb1 heartbeat: [24202]: info: Giving up all HA resources.
May 16 08:55:29 lb1 ResourceManager[24219]: info: Releasing resource group: 
lb1.tlthost.net 192.168.31.100
May 16 08:55:29 lb1 ResourceManager[24219]: info: Running 
/etc/ha.d/resource.d/IPaddr 192.168.31.100 stop
May 16 08:55:29 lb1 ResourceManager[24219]: debug: Starting 
/etc/ha.d/resource.d/IPaddr 192.168.31.100 stop
May 16 08:55:29 lb1 IPaddr[24273]: INFO: ifconfig eth0:0 down
May 16 08:55:29 lb1 IPaddr[24258]: INFO:  Success
May 16 08:55:29 lb1 ResourceManager[24219]: debug: /etc/ha.d/resource.d/IPaddr 
192.168.31.100 stop done. RC=0
May 16 08:55:29 lb1 heartbeat: [24202]: info: All HA resources relinquished.
May 16 08:55:30 lb1 heartbeat: [23507]: WARN: 1 lost packet(s) for 
[lb2.tlthost.net] [488:490]
May 16 08:55:30 lb1 ipfail: [23536]: debug: Other side is now stable.
May 16 08:55:30 lb1 heartbeat: [23507]: info: No pkts missing from 
lb2.tlthost.net!
May 16 08:55:30 lb1 heartbeat: [23507]: info: killing /usr/lib/heartbeat/ipfail 
process group 23536 with signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23518 
with signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23519 with 
signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBFIFO process 23511 with 
signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23512 
with signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23513 with 
signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23514 
with signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23515 with 
signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23516 
with signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23517 with 
signal 15
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23511 exited. 9 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23512 exited. 8 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23513 exited. 7 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23514 exited. 6 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23515 exited. 5 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23516 exited. 4 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23517 exited. 3 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23518 exited. 2 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23519 exited. 1 
remaining
May 16 08:55:31 lb1 heartbeat: [23507]: info: lb1.tlthost.net Heartbeat 
shutdown complete.
May 16 08:55:35 lb1 heartbeat: [24386]: WARN: WARNING: directive 'udp' replaced 
by 'bcast'
May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Core dumps could be lost if 
multiple dumps occur.
May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Consider setting non-default 
value in /proc/sys/kernel/core_pattern (or equivalent) for maximum 
supportability
May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Consider setting 
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability
May 16 08:55:35 lb1 heartbeat: [24386]: info: Version 2 support: false
May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Logging daemon is disabled 
--enabling logging daemon is recommended
May 16 08:55:35 lb1 heartbeat: [24386]: info: **************************
May 16 08:55:35 lb1 heartbeat: [24386]: info: Configuration validated. Starting 
heartbeat 2.1.3
May 16 08:55:35 lb1 heartbeat: [24387]: info: heartbeat: version 2.1.3
May 16 08:55:35 lb1 heartbeat: [24387]: info: Heartbeat generation: 1240088880
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP multicast heartbeat 
started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: write socket 
priority set to IPTOS_LOWDELAY on eth0
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: bound send socket to 
device: eth0
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: bound receive socket 
to device: eth0
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: started on port 694 
interface eth0 to 192.168.31.211
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat 
started on port 694 (694) interface eth0
May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat 
closed on port 694 interface eth0 - Status: 1
May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_TriggerHandler: Added 
signal manual handler
May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_TriggerHandler: Added 
signal manual handler
May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
May 16 08:55:35 lb1 heartbeat: [24387]: info: Local status now set to: 'up'
May 16 08:55:36 lb1 heartbeat: [24387]: info: Link lb1.tlthost.net:eth0 up.
May 16 08:55:36 lb1 heartbeat: [24387]: info: Link lb2.tlthost.net:eth0 up.
May 16 08:55:36 lb1 heartbeat: [24387]: info: Status update for node 
lb2.tlthost.net: status active
May 16 08:55:36 lb1 heartbeat: [24401]: debug: notify_world: setting SIGCHLD 
Handler to SIG_DFL
May 16 08:55:36 lb1 harc[24401]: info: Running /etc/ha.d/rc.d/status status
May 16 08:55:37 lb1 heartbeat: [24387]: info: Comm_now_up(): updating status to 
active
May 16 08:55:37 lb1 heartbeat: [24387]: info: Local status now set to: 'active'
May 16 08:55:37 lb1 heartbeat: [24387]: info: Starting child client 
"/usr/lib/heartbeat/ipfail" (108,112)
May 16 08:55:37 lb1 heartbeat: [24416]: info: Starting 
"/usr/lib/heartbeat/ipfail" as uid 108  gid 112 (pid 24416)
May 16 08:55:37 lb1 ipfail: [24416]: debug: PID=24416
May 16 08:55:37 lb1 ipfail: [24416]: debug: Signing in with heartbeat
May 16 08:55:37 lb1 ipfail: [24416]: debug: [We are lb1.tlthost.net]
May 16 08:55:37 lb1 heartbeat: [24387]: info: remote resource transition 
completed.
May 16 08:55:37 lb1 heartbeat: [24387]: info: remote resource transition 
completed.
May 16 08:55:37 lb1 heartbeat: [24387]: info: Local Resource acquisition 
completed. (none)
May 16 08:55:37 lb1 ipfail: [24416]: debug: auto_failback -> 1 (on)
May 16 08:55:38 lb1 heartbeat: [24387]: info: lb2.tlthost.net wants to go 
standby [foreign]
May 16 08:55:38 lb1 ipfail: [24416]: debug: Setting message filter mode
May 16 08:55:38 lb1 ipfail: [24416]: debug: Starting node walk
May 16 08:55:39 lb1 heartbeat: [24387]: info: standby: acquire [foreign] 
resources from lb2.tlthost.net
May 16 08:55:39 lb1 heartbeat: [24421]: info: acquire local HA resources 
(standby).
May 16 08:55:39 lb1 ipfail: [24416]: debug: Cluster node: lb2.tlthost.net: 
status: active
May 16 08:55:39 lb1 ResourceManager[24435]: info: Acquiring resource group: 
lb1.tlthost.net 192.168.31.100
May 16 08:55:39 lb1 IPaddr[24461]: INFO:  Resource is stopped
May 16 08:55:39 lb1 ResourceManager[24435]: info: Running 
/etc/ha.d/resource.d/IPaddr 192.168.31.100 start
May 16 08:55:39 lb1 ResourceManager[24435]: debug: Starting 
/etc/ha.d/resource.d/IPaddr 192.168.31.100 start
May 16 08:55:39 lb1 IPaddr[24532]: INFO: Using calculated nic for 
192.168.31.100: eth0
May 16 08:55:39 lb1 IPaddr[24532]: INFO: Using calculated netmask for 
192.168.31.100: 255.255.255.0
May 16 08:55:39 lb1 IPaddr[24532]: DEBUG: Using calculated broadcast for 
192.168.31.100: 192.168.31.255
May 16 08:55:39 lb1 IPaddr[24532]: INFO: eval ifconfig eth0:0 192.168.31.100 
netmask 255.255.255.0 broadcast 192.168.31.255
May 16 08:55:39 lb1 IPaddr[24532]: DEBUG: Sending Gratuitous Arp for 
192.168.31.100 on eth0:0 [eth0]
May 16 08:55:39 lb1 IPaddr[24517]: INFO:  Success
May 16 08:55:39 lb1 ResourceManager[24435]: debug: /etc/ha.d/resource.d/IPaddr 
192.168.31.100 start done. RC=0
May 16 08:55:39 lb1 heartbeat: [24421]: info: local HA resource acquisition 
completed (standby).
May 16 08:55:39 lb1 heartbeat: [24387]: info: Standby resource acquisition done 
[foreign].
May 16 08:55:39 lb1 heartbeat: [24387]: info: Initial resource acquisition 
complete (auto_failback)
May 16 08:55:39 lb1 ipfail: [24416]: debug: [They are lb2.tlthost.net]
May 16 08:55:39 lb1 ipfail: [24416]: debug: Cluster node: lb1.tlthost.net: 
status: active
May 16 08:55:40 lb1 heartbeat: [24387]: info: remote resource transition 
completed.
May 16 08:55:40 lb1 ipfail: [24416]: debug: Setting message signal
May 16 08:55:40 lb1 ipfail: [24416]: debug: Waiting for messages...
May 16 08:55:41 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 08:55:41 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 08:55:43 lb1 ipfail: [24416]: debug: Got asked for num_ping.
May 16 08:55:44 lb1 ipfail: [24416]: info: Ping node count is balanced.
May 16 08:55:44 lb1 ipfail: [24416]: debug: Abort message sent.
May 16 08:55:44 lb1 ipfail: [24416]: info: Giving up foreign resources 
(auto_failback).
May 16 08:55:44 lb1 ipfail: [24416]: info: Delayed giveup in 4 seconds.
May 16 08:55:44 lb1 ipfail: [24416]: debug: Other side is unstable.
May 16 08:55:45 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 08:55:45 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 08:55:48 lb1 ipfail: [24416]: info: giveup() called (timeout worked)
May 16 08:55:49 lb1 ipfail: [24416]: debug: Message [ask_resources] sent.
May 16 08:55:49 lb1 ipfail: [24416]: debug: giveup timeout has been destroyed.
May 16 08:55:49 lb1 heartbeat: [24387]: info: lb1.tlthost.net wants to go 
standby [foreign]
May 16 08:55:49 lb1 heartbeat: [24387]: info: standby: lb2.tlthost.net can take 
our foreign resources
May 16 08:55:49 lb1 heartbeat: [24632]: info: give up foreign HA resources 
(standby).
May 16 08:55:49 lb1 heartbeat: [24632]: info: foreign HA resource release 
completed (standby).
May 16 08:55:49 lb1 heartbeat: [24387]: info: Local standby process completed 
[foreign].
May 16 08:55:50 lb1 heartbeat: [24387]: WARN: 1 lost packet(s) for 
[lb2.tlthost.net] [512:514]
May 16 08:55:50 lb1 heartbeat: [24387]: info: remote resource transition 
completed.
May 16 08:55:50 lb1 heartbeat: [24387]: info: No pkts missing from 
lb2.tlthost.net!
May 16 08:55:50 lb1 heartbeat: [24387]: info: Other node completed standby 
takeover of foreign resources.
May 16 08:55:50 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 08:55:51 lb1 ipfail: [24416]: debug: Other side is now stable.
May 16 09:00:59 localhost.localdomain haproxy[24191]: 192.168.30.64:1241 
[16/May/2009:09:00:59.451] webfarm webfarm/<NOSRV> -1/-1/-1/-1/8 401 407 - - 
PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"
May 16 09:01:01 localhost.localdomain haproxy[24191]: 192.168.30.64:1242 
[16/May/2009:09:01:01.044] webfarm webfarm/<STATS> 42/-1/-1/-1/43 200 510 - - 
PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"
May 16 09:01:01 localhost.localdomain haproxy[24191]: 192.168.30.64:1243 
[16/May/2009:09:01:01.765] webfarm webfarm/web1 48/0/0/26/74 200 554 - - --NN 
0/0/0/0 0/0 "GET /favicon.ico HTTP/1.1"
May 16 09:10:12 localhost.localdomain haproxy[24191]: 192.168.30.64:1252 
[16/May/2009:09:10:12.292] webfarm webfarm/<STATS> 15/-1/-1/-1/15 200 553 - - 
PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"
May 16 09:10:13 localhost.localdomain haproxy[24191]: 192.168.30.64:1253 
[16/May/2009:09:10:13.038] webfarm webfarm/web2 50/0/0/1/51 200 647 - - --NN 
0/0/0/0 0/0 "GET /favicon.ico HTTP/1.1"
May 16 09:11:18 localhost.localdomain haproxy[24191]: 192.168.30.64:1258 
[16/May/2009:09:11:18.045] webfarm webfarm/<NOSRV> -1/-1/-1/-1/15 401 423 - - 
PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"
May 16 09:11:19 localhost.localdomain haproxy[24191]: 192.168.30.64:1259 
[16/May/2009:09:11:19.589] webfarm webfarm/<STATS> 43/-1/-1/-1/43 200 474 - - 
PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"
