On Thu, Oct 30, 2014 at 11:45:41PM +0100, Evert van Es wrote:
> Willy,
>
> thank for the reply.
>
> I did remove some of the information from the log but it was only the public
> ip numbers, there was no port information.

The default log format shows the port after the IP address, but indeed in
your log format you stripped it.

> So I presume the port is the same on all requests.

We cannot know, just guess. And since we're diagnosing something others
don't reproduce, every bit of information counts.

> I Also made the urls anonymous in the log by specifying www.customera.nl.
>
> You did notice a pause. That is correct. The page first shows a list to the
> user. The first postback is to select one item, that returns the "big"
> page with different information where more input is gathered. The user then
> does another post on the same page to continue. This last post is the
> "big" one that gives trouble. I know I need to get the page to be smaller
> but it would be nice to find the transmission error too.

OK so that would confirm the single-port supposition.

> I checked the server timeouts but they are all above 120 seconds as far as I
> can find.

OK.

> Using my cell phone and a local wifi connection I also tried to simulate,
> this took very long but no error. Tomorrow I will try to do this again and
> also record the haproxy log.

OK.

> My haproxy -vv output is:
>
>
> HA-Proxy version 1.5.4 2014/09/02
> Copyright 2000-2014 Willy Tarreau <[email protected]>

OK so while there were a few important fixes since 1.5.4, none of them seems
to affect your use case.

> Build options :
>   TARGET  = linux2628
>   CPU     = generic
>   CC      = gcc
>   CFLAGS  = -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat
> -Wformat-security -Werror=format-security -D_FORTIFY_SOURCE=2
>   OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
(...)

So far so good.

> frontend http
>     bind 0.0.0.0:80
>     reqadd X-Forwarded-Proto:\ http
>     option httplog
>     capture request header Host len 32
>     capture request header Referrer len 64
>
>     log-format %ci\ [%Tl]\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\
> %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %r
>
>     errorfile 408 /dev/null
>
>     acl acl_ws06 hdr_end(Host) -i customera.nl
>
>     use_backend bk_ws06 if acl_ws06
>
>     default_backend bk_ws01
>
> backend bk_ws06
>     option httplog
>     option forwardfor
>     server iis05 192.168.30.16:80
>
> Hope this helps.

Sure. But there's nothing wrong here, so we're probably on a bug :-(

> If I need to create a trace please help me with some directions on how to do
> that.

So you'll have to use tcpdump on the machine running haproxy. You'll have to
run two instances of it, one with the ip:port haproxy listens to, and one with
the server's ip:port it connects to. It's important to capture complete
packets. Once the problem is reproduced, you can simply Ctrl-C, gzip the
traces and send them.

Be careful, these traces *will* contain sensitive information such as
exchanged contents, IP addresses etc. Don't post them to the list if you think
it can be a problem or if they're large. You can replace IP addresses using
tcprewrite. If you do this, please use different addresses for each different
node!

The simplest way to capture for a given ip and port is the following :

    tcpdump -vps0 -w trace-client.cap host 1.2.3.4 and port 1234
    tcpdump -vps0 -w trace-server.cap host 5.6.7.7 and port 5678

It will count captured packets. You can stop the capture with Ctrl-C. You can
look at the details using "tcpdump -nvXr trace-server.cap" or using tools like
wireshark to inspect the contents.

Here I hope the uploads are not too large, I'm hoping to find an issue between
the content-length and the number of bytes sent, or an unexpected close or
timeout during the latest transfer.

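To illustrate the first of these checks, here is a minimal Python sketch. It
is not part of haproxy or tcpdump. It assumes you have already exported the
client-to-server bytes of one request from the capture into a single byte
string (for example with wireshark's stream export) and that the stream holds
exactly one HTTP/1.x request. The names check_content_length and LengthCheck
are hypothetical, chosen only for this example.

```python
from typing import NamedTuple, Optional


class LengthCheck(NamedTuple):
    declared: Optional[int]  # value of the Content-Length header, if present
    received: int            # body bytes actually present in the stream
    missing: Optional[int]   # declared - received (None when no header)


def check_content_length(raw: bytes) -> LengthCheck:
    """Compare the declared Content-Length with the body bytes present.

    Assumption: `raw` contains exactly one HTTP/1.x request, headers
    included, as reassembled from the capture.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("end of headers not found in stream")

    declared = None
    # Skip the request line, then look for Content-Length (case-insensitive).
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
            break

    missing = None if declared is None else declared - len(body)
    return LengthCheck(declared, len(body), missing)


if __name__ == "__main__":
    # Made-up request: 10 bytes announced, only 6 present in the stream.
    sample = (b"POST /page.aspx HTTP/1.1\r\n"
              b"Host: www.customera.nl\r\n"
              b"Content-Length: 10\r\n\r\n"
              b"abcdef")
    print(check_content_length(sample))
```

A non-zero "missing" value on the client-side trace would suggest the upload
stopped before the announced length was reached. Comparing the result for both
traces should show on which side of haproxy the bytes went missing.
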
If your haproxy machine is used at a very low load, you can also take a trace
of the haproxy process itself in parallel so that we can see what causes any
strange pattern we might discover. This can be achieved this way :

    strace -Ttto haproxy-trace.log -s 200 -p $(pgrep haproxy)

Same, ctrl-c to terminate.

I'd advise you to experiment with each of them first, as there's nothing more
frustrating than asking a customer for a second test because the first one
wasn't recorded. This is also the reason for launching both strace and tcpdump.

Regards,
Willy

