Following up on this problem: thanks to the help of Baptiste and Willy,
I've found a workaround.

It turns out that my web server (xlightweb), when compressing some responses,
was truncating the data, returning less than the size it was advertising.

The slightly weird thing is that this was only causing a problem across a
network and not on the loopback device, so the conclusion is that it must be
related to timing somehow.
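
A minimal sketch of how this kind of truncation can be detected, assuming the
raw compressed body has been captured (for example from a tcpdump) and the
chunked framing has already been removed. The function name
gzip_body_is_complete is hypothetical and not part of haproxy or xlightweb; it
only relies on Python's standard zlib module.

```python
import gzip
import zlib


def gzip_body_is_complete(body: bytes) -> bool:
    """Return True if body holds a complete gzip stream, trailer included."""
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    try:
        decompressor.decompress(body)
    except zlib.error:
        # The data is corrupt rather than merely short.
        return False
    # eof is only set once the end of the compressed stream has been reached.
    return decompressor.eof


if __name__ == "__main__":
    full = gzip.compress(bytes(range(256)) * 100)
    print(gzip_body_is_complete(full))        # complete stream
    print(gzip_body_is_complete(full[:-20]))  # same stream cut short
```

A body that was cut off part-way decompresses without an error but never
reaches the end-of-stream marker, which is why the check looks at eof rather
than only catching exceptions.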

Unfortunately I'm stuck with xlightweb, as much as I'd love to get rid of it,
but I still need to make use of compression.
Haproxy comes to the rescue here with its relatively new ability to offload
the compression.

This was the config I added.

    compression algo gzip
    compression type text/cmd text/css text/csv text/html text/javascript text/plain text/vcard text/xml application/json application/x-www-form-urlencoded application/javascript application/x-javascript
    compression offload

This will remove the Accept-Encoding header from the request to your backend
and perform the compression in haproxy itself where appropriate.

The only final gotcha is to make sure you compile haproxy with USE_ZLIB=yes
to enable gzip support.

Once again thanks to Baptiste and Willy for their help.


Will Lewis


>> Hi,
>> 
>> There is no content-length because you're in chunk mode, so this is fine.
>> 
>> Which version of HAProxy are you running?
>> 
>> Could you take a tcpdump between HAProxy and your server?
>> and send it to me (/Cc Willy).
>> You might be hitting a bug :/
>> 
>> Baptiste
>> 
>> On Tue, Jan 8, 2013 at 5:45 PM, William Lewis <[email protected]> wrote:
>>> Hi,
>>> 
>>> Thanks for the reply.
>>> 
>>> I'm using 2 different ports because the ports are dynamically assigned on
>>> startup and the haproxy config rewritten and reloaded.
>>> 
>>> The web server isn't actually sending a Content-Length header, which it
>>> probably should be but still shouldn't cause it to break in this fashion.
>>> 
>>> The request and response headers look like this:
>>> 
>>> GET /app/js/libs/jq.mobi.js HTTP/1.1
>>> Host: photorating.mshot.example.com
>>> Connection: keep-alive
>>> User-Agent: Mozilla/5.0(iPad; U; CPU OS 6_0 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8F191 Safari/6533.18.5
>>> Accept: */*
>>> Referer: http://netproteus.com/test.html
>>> Accept-Encoding: gzip,deflate,sdch
>>> Accept-Language: en-US,en;q=0.8
>>> Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
>>> Cookie: ha-server-photorating=wap10-mlan
>>> 
>>> HTTP/1.1 200 OK
>>> Server: xLightweb/2.13.2-B7
>>> Content-Type: application/javascript
>>> Transfer-Encoding: chunked
>>> Last-modified: Mon, 7 Jan 2013 12:27:14 +0000
>>> Cache-control: max-age=101461101461
>>> Expires: Wed, 9 Jan 2013 20:49:17 +0000
>>> Cache-Control: no-transform
>>> Content-Encoding: gzip
>>> Set-Cookie: ha-server-photorating=host_b; path=/; domain=.example.com
>>> 
>>> 
>>> I can believe that xlightweb is doing something odd that is contributing to
>>> this but the fact that
>>> a) this only ever manifests for 1 specific payload
>>> b) this never causes a problem when haproxy isn't in the chain
>>> 
>>> Leads me to believe I'm hitting a race condition in haproxy itself.
>>> 
>>> Thanks
>>> 
>>> Will
>>> 
>>> On Jan 8, 2013, at 4:10 PM, Baptiste <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> What is weird is that your Application server does not deliver the
>>> same object on both servers to a local connection or a remote
>>> connection...
>>> Why are you using 2 different ports??
>>> Could you log the response Content-Length header?
>>> 
>>> From your log line, it sounds related to a network or server issue...
>>> 
>>> cheers
>>> 
>>> 
>>> On Tue, Jan 8, 2013 at 1:38 PM, William Lewis <[email protected]> wrote:
>>> 
>>> Further to this I've found if I add a large comment at the bottom of the
>>> file to increase the file size then the problem goes away.
>>> 
>>> It might be that xlightweb (it's a rather rubbish container but I'm stuck
>>> with it) is doing something weird for this file, but if I remove the haproxy
>>> layer and have the ip balancer talk directly to the web servers the problem
>>> doesn't manifest in a browser.
>>> 
>>> 
>>> On Jan 8, 2013, at 11:20 AM, William Lewis <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm going slightly crazy trying to work out this problem and I hope someone
>>> can help.
>>> 
>>> I have 2 hosts, each host is running an instance of haproxy and an instance
>>> of a java web server xlightweb. Between these hosts and the outside world
>>> there is a dumb round robin ip balancer that holds no state.
>>> 
>>> Each haproxy instance is configured to balance between both the localhost
>>> instance of the web server and the instance on the other host.
>>> 
>>> (This is currently a proof of concept system before I deploy haproxy on
>>> its own machines in front of many more web servers)
>>> 
>>> 
>>> The problem that I'm having is that serving a particular javascript file is
>>> failing when the haproxy on host A is fetching it from host B and when the
>>> haproxy on host B is fetching it from host A.
>>> It always affects the same javascript file only, and there are many more
>>> javascript files in use which are being served fine.
>>> 
>>> Logs from haproxy host A
>>> 
>>> be/host_a 1/0/1/13/18 200 16389   --NI 1/1/0/0/0 0/0 {Mozilla/5.0(iPad; U; CPU OS 6_0 like Mac OS X|} {} "GET /app/js/libs/jq.mobi.js HTTP/1.1"
>>> be/host_b 4640/0/0/7/4648 200 14955   SDNI 0/0/0/0/0 0/0 {Mozilla/5.0(iPad; U; CPU OS 6_0 like Mac OS X|} {} "GET /app/js/libs/jq.mobi.js HTTP/1.1"
>>> 
>>> Logs from haproxy host B
>>> 
>>> be/host_a 0/0/1/13/17 200 14956   SDNI 0/0/0/0/0 0/0 {Mozilla/5.0(iPad; U; CPU OS 6_0 like Mac OS X|} {} "GET /app/js/libs/jq.mobi.js HTTP/1.1"
>>> be/host_b 0/0/0/4/5 200 16388   --NI 1/1/0/1/0 0/0 {Mozilla/5.0(iPad; U; CPU OS 6_0 like Mac OS X|} {} "GET /app/js/libs/jq.mobi.js HTTP/1.1"
>>> 
>>> 
>>> My haproxy configuration is this
>>> 
>>> global
>>>  daemon
>>>  quiet
>>>  maxconn 200000
>>>  pidfile /local/migwproxy/haproxy.pid
>>>  uid     60003
>>>  gid     1001
>>>  chroot  /local/migwproxy/run
>>>  log     127.0.0.1       local0
>>>  log     127.0.0.1       local1 notice
>>>  log-tag migwproxy
>>> 
>>> defaults
>>>  log global
>>> 
>>>  balance roundrobin
>>>  mode http
>>>  monitor-uri /migwproxy
>>>  http-check send-state
>>> 
>>>  retries 3
>>> 
>>>  timeout connect 6000
>>>  timeout client 1020000
>>>  timeout server 1020000
>>>  timeout http-request 6000
>>> 
>>>  option forwardfor except 127.0.0.1
>>>  option http-server-close
>>>  option httplog
>>>  option log-health-checks
>>>  option log-separate-errors
>>>  option redispatch
>>>  option tcpka
>>> 
>>> frontend external
>>>  bind *:9000
>>> 
>>>  maxconn 200000
>>> 
>>>  # Capture User-Agent and X-Forwarded-For headers to the log
>>>  capture request header User-agent len 45
>>>  capture request header X-Forwarded-For len 15
>>>  # Capture any 302 redirects to the log
>>>  capture response header Location len 20
>>> 
>>> 
>>>  # We keep track of connection rates and connection numbers
>>>  stick-table type ip size 200k expire 2m store conn_rate(3s),conn_cur
>>>  # And we do this per source address
>>>  tcp-request connection track-sc1 src
>>> 
>>>  acl source_rate_abuser sc1_conn_rate gt 200
>>>  acl source_connections_abuser sc1_conn_cur gt 3000
>>> 
>>>  acl acl_photorating hdr(host) -i photorating.mshot.example.com -i photorating.mshoteu.example.com -i photorating.mshotus.example.com -i api.photorating.mshot.example.com -i api.photorating.mshoteu.example.com -i api.photorating.mshotus.example.com -i push.photorating.mshot.example.com -i push.photorating.mshoteu.example.com -i push.photorating.mshotus.example.com
>>> 
>>>  use_backend be if acl_photorating !source_rate_abuser || acl_photorating !source_connections_abuser
>>>  use_backend be-slow if acl_photorating source_rate_abuser || acl_photorating source_connections_abuser
>>> 
>>> 
>>> backend be
>>> 
>>>  cookie ha-server-photorating insert domain .example.com
>>> 
>>>  server host_a 10.10.184.103:34025 cookie host_a check inter 5000 maxconn 500
>>>  server host_b 10.10.184.11:25117 cookie host_b check inter 5000 maxconn 500
>>> 
>>> backend be-slow
>>> 
>>>  cookie ha-server-photorating insert domain .example.com
>>> 
>>>  server host_a 10.10.184.103:34025 cookie host_a check inter 5000 maxconn 500
>>>  server host_b 10.10.184.11:25117 cookie host_b check inter 5000 maxconn 500
>>> 
>>> 
>>> You can also see a live example here, http://netproteus.com/test.html
>>> 
>>> 
>>> Any insight, greatly appreciated.
>>> 
>>> 
>>> Will Lewis
>>> 
