Hi Willy I checked the cfg files for both HAProxy and heartbeat, and they're the same where they are supposed to be. You we're right about the SYSLOGD="-r" setting. I didn't know I had to do that. I've attached a new copy of the haproxy.log file I started when I restarted both LB1 and LB2 servers. It shows from the time it works at 08:54, till it stops at 09:14. The log is from LB1, and the same log on LB2 shows "May 16 09:14:58 lb2 -- MARK --" when I couldn't get back into the stats. I still can get to my test local web site even after the stats go away.
Tom -----Original Message----- From: Willy Tarreau [mailto:[email protected]] Sent: Saturday, May 16, 2009 12:18 AM To: Tom Potwin Cc: [email protected] Subject: Re: New HAProxy user keeps loosing connection On Fri, May 15, 2009 at 05:03:07PM -0400, Tom Potwin wrote: > I don't think I'm going to be much help there. I'm connected to the > Internet through a wireless broadband right now. I connect to my > network through a VPN connection. The server just sees me as a local address. > > I had it working for a short time, and I saved what showed up on > syslog for the time it stared to work, to the moment it stopped. The > last entry is at > 16:17:01 when it was still working. It stopped working at about 16:30, > but nothing showed in any of the logs. Except again for the entry in > the apache error. I'm attaching the syslog copy here showing just the > time I talked about. I don't know if that helps, but please know that > I really appreciate all the help. Well, unfortunately there is no haproxy log here. Are you sure that your syslogd listens to the UDP socket ? (check for the "-r" option). However, in the logs heartbeat is saying that it switches to node lb2 and that lb1 goes standby. So most likely your second click is handled by lb2, and maybe that one does not have exactly the same config options. You should compare the configs. Regards, Willy
May 16 08:54:03 localhost.localdomain haproxy[24190]: Proxy webfarm started. May 16 08:55:29 lb1 heartbeat: [23507]: info: Heartbeat shutdown in progress. (23507) May 16 08:55:29 lb1 heartbeat: [24202]: info: Giving up all HA resources. May 16 08:55:29 lb1 ResourceManager[24219]: info: Releasing resource group: lb1.tlthost.net 192.168.31.100 May 16 08:55:29 lb1 ResourceManager[24219]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.31.100 stop May 16 08:55:29 lb1 ResourceManager[24219]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.31.100 stop May 16 08:55:29 lb1 IPaddr[24273]: INFO: ifconfig eth0:0 down May 16 08:55:29 lb1 IPaddr[24258]: INFO: Success May 16 08:55:29 lb1 ResourceManager[24219]: debug: /etc/ha.d/resource.d/IPaddr 192.168.31.100 stop done. RC=0 May 16 08:55:29 lb1 heartbeat: [24202]: info: All HA resources relinquished. May 16 08:55:30 lb1 heartbeat: [23507]: WARN: 1 lost packet(s) for [lb2.tlthost.net] [488:490] May 16 08:55:30 lb1 ipfail: [23536]: debug: Other side is now stable. May 16 08:55:30 lb1 heartbeat: [23507]: info: No pkts missing from lb2.tlthost.net! May 16 08:55:30 lb1 heartbeat: [23507]: info: killing /usr/lib/heartbeat/ipfail process group 23536 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23518 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23519 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBFIFO process 23511 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23512 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23513 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23514 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23515 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBWRITE process 23516 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: killing HBREAD process 23517 with signal 15 May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23511 exited. 9 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23512 exited. 8 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23513 exited. 7 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23514 exited. 6 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23515 exited. 5 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23516 exited. 4 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23517 exited. 3 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23518 exited. 2 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: Core process 23519 exited. 1 remaining May 16 08:55:31 lb1 heartbeat: [23507]: info: lb1.tlthost.net Heartbeat shutdown complete. May 16 08:55:35 lb1 heartbeat: [24386]: WARN: WARNING: directive 'udp' replaced by 'bcast' May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Core dumps could be lost if multiple dumps occur. May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Consider setting non-default value in /proc/sys/kernel/core_pattern (or equivalent) for maximum supportability May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Consider setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum supportability May 16 08:55:35 lb1 heartbeat: [24386]: info: Version 2 support: false May 16 08:55:35 lb1 heartbeat: [24386]: WARN: Logging daemon is disabled --enabling logging daemon is recommended May 16 08:55:35 lb1 heartbeat: [24386]: info: ************************** May 16 08:55:35 lb1 heartbeat: [24386]: info: Configuration validated. Starting heartbeat 2.1.3 May 16 08:55:35 lb1 heartbeat: [24387]: info: heartbeat: version 2.1.3 May 16 08:55:35 lb1 heartbeat: [24387]: info: Heartbeat generation: 1240088880 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0) May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: bound send socket to device: eth0 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: bound receive socket to device: eth0 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: ucast: started on port 694 interface eth0 to 192.168.31.211 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth0 May 16 08:55:35 lb1 heartbeat: [24387]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth0 - Status: 1 May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_TriggerHandler: Added signal manual handler May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_TriggerHandler: Added signal manual handler May 16 08:55:35 lb1 heartbeat: [24387]: info: G_main_add_SignalHandler: Added signal handler for signal 17 May 16 08:55:35 lb1 heartbeat: [24387]: info: Local status now set to: 'up' May 16 08:55:36 lb1 heartbeat: [24387]: info: Link lb1.tlthost.net:eth0 up. May 16 08:55:36 lb1 heartbeat: [24387]: info: Link lb2.tlthost.net:eth0 up. May 16 08:55:36 lb1 heartbeat: [24387]: info: Status update for node lb2.tlthost.net: status active May 16 08:55:36 lb1 heartbeat: [24401]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL May 16 08:55:36 lb1 harc[24401]: info: Running /etc/ha.d/rc.d/status status May 16 08:55:37 lb1 heartbeat: [24387]: info: Comm_now_up(): updating status to active May 16 08:55:37 lb1 heartbeat: [24387]: info: Local status now set to: 'active' May 16 08:55:37 lb1 heartbeat: [24387]: info: Starting child client "/usr/lib/heartbeat/ipfail" (108,112) May 16 08:55:37 lb1 heartbeat: [24416]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 108 gid 112 (pid 24416) May 16 08:55:37 lb1 ipfail: [24416]: debug: PID=24416 May 16 08:55:37 lb1 ipfail: [24416]: debug: Signing in with heartbeat May 16 08:55:37 lb1 ipfail: [24416]: debug: [We are lb1.tlthost.net] May 16 08:55:37 lb1 heartbeat: [24387]: info: remote resource transition completed. May 16 08:55:37 lb1 heartbeat: [24387]: info: remote resource transition completed. May 16 08:55:37 lb1 heartbeat: [24387]: info: Local Resource acquisition completed. (none) May 16 08:55:37 lb1 ipfail: [24416]: debug: auto_failback -> 1 (on) May 16 08:55:38 lb1 heartbeat: [24387]: info: lb2.tlthost.net wants to go standby [foreign] May 16 08:55:38 lb1 ipfail: [24416]: debug: Setting message filter mode May 16 08:55:38 lb1 ipfail: [24416]: debug: Starting node walk May 16 08:55:39 lb1 heartbeat: [24387]: info: standby: acquire [foreign] resources from lb2.tlthost.net May 16 08:55:39 lb1 heartbeat: [24421]: info: acquire local HA resources (standby). May 16 08:55:39 lb1 ipfail: [24416]: debug: Cluster node: lb2.tlthost.net: status: active May 16 08:55:39 lb1 ResourceManager[24435]: info: Acquiring resource group: lb1.tlthost.net 192.168.31.100 May 16 08:55:39 lb1 IPaddr[24461]: INFO: Resource is stopped May 16 08:55:39 lb1 ResourceManager[24435]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.31.100 start May 16 08:55:39 lb1 ResourceManager[24435]: debug: Starting /etc/ha.d/resource.d/IPaddr 192.168.31.100 start May 16 08:55:39 lb1 IPaddr[24532]: INFO: Using calculated nic for 192.168.31.100: eth0 May 16 08:55:39 lb1 IPaddr[24532]: INFO: Using calculated netmask for 192.168.31.100: 255.255.255.0 May 16 08:55:39 lb1 IPaddr[24532]: DEBUG: Using calculated broadcast for 192.168.31.100: 192.168.31.255 May 16 08:55:39 lb1 IPaddr[24532]: INFO: eval ifconfig eth0:0 192.168.31.100 netmask 255.255.255.0 broadcast 192.168.31.255 May 16 08:55:39 lb1 IPaddr[24532]: DEBUG: Sending Gratuitous Arp for 192.168.31.100 on eth0:0 [eth0] May 16 08:55:39 lb1 IPaddr[24517]: INFO: Success May 16 08:55:39 lb1 ResourceManager[24435]: debug: /etc/ha.d/resource.d/IPaddr 192.168.31.100 start done. RC=0 May 16 08:55:39 lb1 heartbeat: [24421]: info: local HA resource acquisition completed (standby). May 16 08:55:39 lb1 heartbeat: [24387]: info: Standby resource acquisition done [foreign]. May 16 08:55:39 lb1 heartbeat: [24387]: info: Initial resource acquisition complete (auto_failback) May 16 08:55:39 lb1 ipfail: [24416]: debug: [They are lb2.tlthost.net] May 16 08:55:39 lb1 ipfail: [24416]: debug: Cluster node: lb1.tlthost.net: status: active May 16 08:55:40 lb1 heartbeat: [24387]: info: remote resource transition completed. May 16 08:55:40 lb1 ipfail: [24416]: debug: Setting message signal May 16 08:55:40 lb1 ipfail: [24416]: debug: Waiting for messages... May 16 08:55:41 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 08:55:41 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 08:55:43 lb1 ipfail: [24416]: debug: Got asked for num_ping. May 16 08:55:44 lb1 ipfail: [24416]: info: Ping node count is balanced. May 16 08:55:44 lb1 ipfail: [24416]: debug: Abort message sent. May 16 08:55:44 lb1 ipfail: [24416]: info: Giving up foreign resources (auto_failback). May 16 08:55:44 lb1 ipfail: [24416]: info: Delayed giveup in 4 seconds. May 16 08:55:44 lb1 ipfail: [24416]: debug: Other side is unstable. May 16 08:55:45 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 08:55:45 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 08:55:48 lb1 ipfail: [24416]: info: giveup() called (timeout worked) May 16 08:55:49 lb1 ipfail: [24416]: debug: Message [ask_resources] sent. May 16 08:55:49 lb1 ipfail: [24416]: debug: giveup timeout has been destroyed. May 16 08:55:49 lb1 heartbeat: [24387]: info: lb1.tlthost.net wants to go standby [foreign] May 16 08:55:49 lb1 heartbeat: [24387]: info: standby: lb2.tlthost.net can take our foreign resources May 16 08:55:49 lb1 heartbeat: [24632]: info: give up foreign HA resources (standby). May 16 08:55:49 lb1 heartbeat: [24632]: info: foreign HA resource release completed (standby). May 16 08:55:49 lb1 heartbeat: [24387]: info: Local standby process completed [foreign]. May 16 08:55:50 lb1 heartbeat: [24387]: WARN: 1 lost packet(s) for [lb2.tlthost.net] [512:514] May 16 08:55:50 lb1 heartbeat: [24387]: info: remote resource transition completed. May 16 08:55:50 lb1 heartbeat: [24387]: info: No pkts missing from lb2.tlthost.net! May 16 08:55:50 lb1 heartbeat: [24387]: info: Other node completed standby takeover of foreign resources. May 16 08:55:50 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 08:55:51 lb1 ipfail: [24416]: debug: Other side is now stable. May 16 09:00:59 localhost.localdomain haproxy[24191]: 192.168.30.64:1241 [16/May/2009:09:00:59.451] webfarm webfarm/<NOSRV> -1/-1/-1/-1/8 401 407 - - PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1" May 16 09:01:01 localhost.localdomain haproxy[24191]: 192.168.30.64:1242 [16/May/2009:09:01:01.044] webfarm webfarm/<STATS> 42/-1/-1/-1/43 200 510 - - PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1" May 16 09:01:01 localhost.localdomain haproxy[24191]: 192.168.30.64:1243 [16/May/2009:09:01:01.765] webfarm webfarm/web1 48/0/0/26/74 200 554 - - --NN 0/0/0/0 0/0 "GET /favicon.ico HTTP/1.1" May 16 09:10:12 localhost.localdomain haproxy[24191]: 192.168.30.64:1252 [16/May/2009:09:10:12.292] webfarm webfarm/<STATS> 15/-1/-1/-1/15 200 553 - - PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1" May 16 09:10:13 localhost.localdomain haproxy[24191]: 192.168.30.64:1253 [16/May/2009:09:10:13.038] webfarm webfarm/web2 50/0/0/1/51 200 647 - - --NN 0/0/0/0 0/0 "GET /favicon.ico HTTP/1.1" May 16 09:11:18 localhost.localdomain haproxy[24191]: 192.168.30.64:1258 [16/May/2009:09:11:18.045] webfarm webfarm/<NOSRV> -1/-1/-1/-1/15 401 423 - - PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1" May 16 09:11:19 localhost.localdomain haproxy[24191]: 192.168.30.64:1259 [16/May/2009:09:11:19.589] webfarm webfarm/<STATS> 43/-1/-1/-1/43 200 474 - - PRNN 0/0/0/0 0/0 "GET /haproxy?stats HTTP/1.1"

