Igor, 

Thanks for the response; I didn't see this email until just now as it didn't 
go through the mailing list and so wasn't filtered as expected. 

I spent my morning trying everything I could think of to get haproxy's 
agent-check to work consistently. The main symptom was that haproxy would mark 
hosts with the status "DRAIN" and provide no clue as to why, even with 
log-health-checks on. After a *lot* of trial and error, I've found the 
following behaviors that look like bugs in the latest 1.5.11 release, running 
on CentOS 6. 

1) agent-check output is sometimes handled inconsistently, ignored, or 
misread if " " is used instead of "," as the separator between the status 
word and the weight. 

This is understood: 
echo -e "ready,78%\r\n"

This line often causes a DRAIN state, and a restart of haproxy was 
insufficient to clear it (see #3): 
echo -e "ready 78%\r\n"

2) Inconsistent logging of DRAIN status changes even with health-check 
logging on: the server would turn blue in the stats page without any log line 
saying why. The log would sometimes even say "Server $service/$name is UP 
(leaving forced drain)" while the stats page continued to report the DRAIN 
state! 

3) Even after the agent output was amended as above, hosts that had been put 
into the DRAIN state by issue #1 were not brought back to the ready/up state 
until "enable health $service/$host" and/or "enable agent $service/$host" was 
sent to the stats port. 
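
For anyone else bitten by this, the workaround commands can be sent with 
socat (the socket path here is an assumption based on a typical "stats 
socket" line; the backend/server names are placeholders for your own): 

echo "enable health www_backend/server10" | socat stdio unix-connect:/var/run/haproxy.stat
echo "enable agent www_backend/server10" | socat stdio unix-connect:/var/run/haproxy.stat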

4) Setting the configured server weight to 10 seems to help a significant 
amount, presumably because the agent's percentage is applied to the configured 
weight, and 35% of a weight of 1 truncates to an effective weight of 0 (i.e. 
DRAIN). If, in fact, haproxy can't handle 35% of 1, it should throw an error 
on startup IMHO. 
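
For illustration, the server line now looks like this (same names and ports 
as the config quoted below, with only "weight 10" added): 

server server10 10.1.1.10:20333 weight 10 maxconn 256 check agent-check agent-port 9333 agent-inter 4000

With weight 10, an agent report of 78% yields an effective weight of 7 
instead of truncating to 0. 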

See also my comments interspersed below: 

Thanks, 

Benjamin Smith 


On Tuesday, April 14, 2015 10:50:31 AM you wrote:
> On Tue, Apr 14, 2015 at 10:11 AM, Igor Cicimov
> <ig...@encompasscorporation.com> wrote:
> > On Tue, Apr 14, 2015 at 5:00 AM, Benjamin Smith <li...@benjamindsmith.com>
> > wrote:
> >> We have 5 Apache servers behind haproxy and we're trying to use the
> >> "httpchk" option along with some performance monitoring. For some reason,
> >> haproxy keeps thinking that 3/5 apache servers are "down" even though
> >> it's obvious that haproxy is both asking the questions and the servers
> >> are answering.
> >> 
> >> Is there a way to log httpchk failures? How can I ask haproxy why it
> >> seems to think that several apache servers are down?
> >> 
> >> Our config:
> >> CentOS 6.x recently updated, 64 bit.
> >> 
> >> Performing an agent-check manually seems to give good results. The below
> >> result is immediate:
> >> [root@xr1 ~]# telnet 10.1.1.12 9333
> >> Trying 10.1.1.12...
> >> Connected to 10.1.1.12.
> >> Escape character is '^]'.
> >> up 78%
> >> Connection closed by foreign host.
> >> 
> >> 
> >> I can see that xinetd on the logic server got the response:
> >> Apr 13 18:45:02 curie xinetd[21890]: EXIT: calcload333 status=0 pid=25693
> >> duration=0(sec)
> >> Apr 13 18:45:06 curie xinetd[21890]: START: calcload333 pid=26590
> >> from=::ffff:10.1.1.1
> >> 
> >> 
> >> I can see that apache is serving happy replies to the load balancer:
> >> [root@curie ~]# tail -f /var/log/httpd/access_log | grep -i "10.1.1.1 "
> >> 10.1.1.1 - - [13/Apr/2015:18:47:15 +0000] "OPTIONS / HTTP/1.0" 302 - "-"
> >> "-"
> >> 10.1.1.1 - - [13/Apr/2015:18:47:17 +0000] "OPTIONS / HTTP/1.0" 302 - "-"
> >> "-"
> >> 10.1.1.1 - - [13/Apr/2015:18:47:19 +0000] "OPTIONS / HTTP/1.0" 302 - "-"
> >> "-"
> >> ^C
> > 
> > I have a feeling you might have been a little bit confused here. Per my
> > understanding, and your configuration:
> > 
> > server server10 10.1.1.10:20333 maxconn 256 *check agent-check agent-port
> > 9333 agent-inter 4000*
> > 
> > HAProxy is doing its health check against the agent you are using, not
> > against Apache, so the Apache response looks irrelevant to me in this
> > case. I don't know how you set up the agent since you haven't posted that
> > part, but this is an excellent article by Malcolm Turnbull, the inventor
> > of agent-check, that might help:
> > 
> > 
> > http://blog.loadbalancer.org/open-source-windows-service-for-reporting-server-load-back-to-haproxy-load-balancer-feedback-agent/

We used this exact blog entry as our starting point. In our case, the xinetd 
script compares load average, Apache process count, CPU info, and a little 
salt to come up with a number ranging from 0% to 500%. 
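
A stripped-down sketch of what the xinetd service runs (the math here is a 
placeholder; the real script weighs the inputs differently): 

#!/bin/bash
# Agent-check responder; xinetd ties stdout to the TCP connection
# from haproxy, so whatever we print is what haproxy reads.
cores=$(grep -c ^processor /proc/cpuinfo)
load1=$(cut -d' ' -f1 /proc/loadavg)
httpd_procs=$(pgrep httpd | wc -l)   # folded into the real formula
# Placeholder formula: remaining headroom as a percentage of core
# count, clamped above 0 so we never report a 0% weight (DRAIN).
pct=$(awk -v l="$load1" -v c="$cores" \
  'BEGIN { p = int((1 - l / c) * 100); if (p < 1) p = 1; print p }')
printf 'ready,%d%%\r\n' "$pct"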


> and press enter twice and check the output. Another option is to use curl:
> 
> $ curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333/


[root@xr1 ~]# curl -s -S -i --http1.0 -X OPTIONS http://10.1.1.12:20333
HTTP/1.1 302 Found
Date: Tue, 14 Apr 2015 23:39:40 GMT
Server: Apache/2.2.15 (CentOS)
X-Powered-By: PHP/5.3.3
Set-Cookie: PHPSESSID=3ph0dvg4quebl1b2e711d8i5p1; path=/; secure
Cache-Control: public, must-revalidate, max-age=0
X-Served-By: curie.-SNIP-
Location: /mod.php/index.php
Vary: Accept-Encoding
Content-Length: 0
Connection: close
Content-Type: text/html; charset=UTF-8


> and some variations of the above that I often use to check the headers only:
> 
> $ curl -s -S -I --http1.0 -X OPTIONS http://10.1.1.12:20333/
> $ curl -s -S -D - --http1.0 -X OPTIONS http://10.1.1.12:20333/
> 
> You can also try the health check with the HTTP/1.1 version, which provides
> keepalive, but you need to specify the Host header in that case.
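
For the archives, the HTTP/1.1 variant would look something like this (curl 
speaks 1.1 by default; the Host value is a placeholder for whatever your 
vhost expects): 

curl -s -S -I -H 'Host: www.example.com' -X OPTIONS http://10.1.1.12:20333/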
> 
> By the way, any errors in the haproxy logs? Maybe set the log mode to debug?

Originally there was very little useful data in the log files at all. Adding 
log-health-checks helped, but the output is still frustratingly incomplete. 
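
For reference, the logging-related pieces of our config now look roughly 
like this (the syslog target and facility are placeholders; adjust to your 
rsyslog setup): 

global
    log 127.0.0.1 local2

defaults
    log global
    option log-health-checks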


