I'm having some trouble interpreting an HAProxy log file line - it doesn't seem 
to make much sense to me, so I'm fairly certain that I'm reading it 
incorrectly.  My full configuration file is at the end of this email.  
Symptomatically, I've been seeing situations in which people are briefly 
redirected unexpectedly onto a different 'gui' backend server.  Its rare, but 
when it happens it often appears at the same point in the GUI use flow.  This 
may be a red herring however.

This morning I was able to reproduce it even after restarting our appserver 
proceses and our haproxy process (which had been up for an extended period of 
time - thanks!).  When it reproduced, I noticed that the backend appserver was 
marked as DOWN in the statistics report even though the logfiles gave no 
indication that there were any problems. From the user's experience, the system 
"hung" waiting for a response for some 40 seconds and then showed a login 
screen, which was served when the "incorrect" backend received a page request.  
Simply re-requesting the previous page was successful, as the previously 
authenticated backend was shown as UP by that point.

In the logfile we see the following entry at the appropriate time:

Jul 26 06:21:09 localhost haproxy[3853]: 172.25.200.177:3124 
[26/Jul/2012:06:21:01.826] secure gui/gui2 4993/0/0/-1/7992 -1 0 - - CH-- 
40/40/1/1/0 0/0 "HEAD /panel/healthcheck.html HTTP/1.1"

This file - while similar to the HAProxy healthcheck URL (and in fact produced 
the same way on the backend server) - is actually the one being requested by 
our hardware LBs (we use existing hardware LBs to distribute between machines, 
and a single haproxy daemon on each box to distribute to appservers within each 
machine).  It appears that it saw some sort of timeout, but I'm not sure why 
that would have caused haproxy to mark the backend as DOWN.  We add backend 
server markers to our generated pages, so I'm confident that the hardware LB 
kept us on the same physical system the entire time.

Needless to say I'm a little confused as to what I'm seeing.  If anyone can 
shed a little more light on the subject it would be very much appreciated!  
Other than this recurrent issue, everything's been working incredibly well.

Configuration file below in full in case there's something odd:

global
    maxconn 20000
    tune.bufsize 128000
    tune.maxrewrite 8192
    ulimit-n 60000
    stats socket /tmp/haproxy mode 0777 level admin
    log 127.0.0.1 local0

defaults
    mode http
    timeout connect 5s
    timeout client 1m
    timeout server 10m
    option srvtcpka
    option http-server-close
    option http-pretend-keepalive
    option redispatch
    option abortonclose
    option httpchk HEAD /panel/healthcheck.txt HTTP/1.0

backend gui
    balance source
    appsession JSESSIONID len 64 timeout 3h
    server gui1 172.25.200.53:8080 check maxconn 2000
    server gui2 172.25.200.54:8080 check maxconn 2000

backend platform
    balance roundrobin
    server platform1 172.25.200.50:8080 check maxconn 2000 
    server platform2 172.25.200.51:8080 check maxconn 2000
    server platform3 172.25.200.52:8080 check maxconn 2000

frontend secure
    bind 172.25.200.70:8080
    maxconn 20000
    option forwardfor
    option clitcpka
    acl javascript url_beg   /js
    acl widgets    url_beg   /widgets
    acl reports    url_beg   /datafeed
    acl deny       url_beg   /server-status /balancer-manager /jmx-console 
/web-console /invoker
    block if deny
    default_backend gui
    use_backend platform if javascript or widgets or reports
    log global
    option httplog
    option dontlognull
    option dontlog-normal

listen stats :7583
    mode http
    stats uri /




Reply via email to