Re: Question about source IP persistence (balance source) when a server goes down:
Hi Malcolm, On Fri, Jan 16, 2009 at 02:48:18PM +0000, Malcolm Turnbull wrote: The manual states that when using balance source: The source IP address is hashed and divided by the total weight of the running servers to designate which server will receive the request. This ensures that the same client IP address will always reach the same server as long as no server goes down or up. If the hash result changes due to the number of running servers changing, many clients will be directed to a different server. This algorithm is generally used in TCP mode where no cookie may be inserted. It may also be used on the Internet to provide a best-effort stickiness to clients which refuse session cookies. This algorithm is static, which means that changing a server's weight on the fly will have no effect. Does this mean that if, say, 1 server out of a cluster of 5 servers fails, then it is likely that the hash result changes and many of the clients could potentially lose their session? (i.e. hit the wrong server because the hash has changed) Absolutely. That's why hashing should only be used for cache optimization, but not when strong persistence is required in case of server failure (cookies are far better in that case). Regards, Willy
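For illustration, a cookie-based persistence setup of the kind recommended above might look like this (a minimal sketch; names and addresses are placeholders, not from the thread):

    listen webfarm 0.0.0.0:80
        mode http
        balance roundrobin
        cookie SERVERID insert indirect
        server web1 10.0.0.1:80 cookie w1 check
        server web2 10.0.0.2:80 cookie w2 check

With this, a client that accepts cookies keeps reaching the same server even when another server goes down or comes back up, which is exactly what balance source cannot guarantee.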
Re: Balancing OpenLDAP
On Mon, Jan 19, 2009 at 10:16:46PM +0100, Jordi Espasa wrote: Jordi's question got me thinking. Does haproxy support externally scripted healthchecks? If not, this would be useful for implementing a variety of healthchecks that aren't built into haproxy. Yes. It would be a very cool feature. No, it does not. Yes, it would be cool, but it's somewhat incompatible with chroot. The possible long-term solutions include: - shared library support, in order to load external plugins, including complex health-check plugins; - performing the checks in an independent process. That would be very nice since it would allow better support for multi-process usage. Another solution would be to state that chroot is incompatible with external scripts, and let the user make a choice. Maybe we can try to think about the required parameters for an external script, and see how that could be implemented. We might even reuse some parts of what I had developed for Keepalived (VRRP tracking scripts). It was quite advanced (cache of last result, etc.), and keepalived's and haproxy's architectures are quite similar. Now, speaking about the LDAP checks, I was about to implement one in the past due to a customer's need, and finally let it go because the customer was not interested due to some aspects which were not covered (detection of end of replication). So right now there's no LDAP check. Regards, Willy
Re: HAProxy: listening port set up and performance
Hi, On Mon, Jan 19, 2009 at 06:11:13PM -0800, Hsin, Chih-fan wrote: Hi, I am new to HAProxy and have questions about the configuration and performance. I downloaded HAProxy 1.3.15.7 from http://haproxy.1wt.eu/ to /home/user/Tool, unpacked it with tar xopf name.tar.gz.tar, and ran make TARGET=linux26. 1) Apache HTTP testing. Servers webA (192.168.5.4) and webB (192.168.5.5) have Apache started. The client (192.168.5.1) sends HTTP requests to webA (192.168.5.4:80) and webB (192.168.5.5:80) and gets valid HTTP responses (default Apache page): [client] -> [webA],[webB]. The HAProxy machine does not run Apache and has IP address 192.168.5.3. On it, I run ./haproxy -f ./haproxy1.cfg (the haproxy1.cfg is below), giving [client] -> [HAProxy] -> [webA],[webB]. The client sends an HTTP request to HAProxy by browsing to http://192.168.5.3 in IE, but cannot get a web response. By this, do you mean that no response ever comes back, or that you get an error? When the client sends HTTP requests via multiple IE browser windows to http://192.168.5.3:80, it works. Are you sure that your apache machines don't block access from haproxy? Also, do you have any form of virtual hosting on those machines, which would refuse requests with a Host: field containing a wrong IP address? The haproxy1.cfg file (how do I decide the correct port number to make http://192.168.5.3 work? This is the correct form):

    listen webfarm 192.168.5.3:80
        mode http
        balance roundrobin
        cookie SERVERID insert indirect
        stats enable
        server webA 192.168.5.4:80 cookie A check
        server webB 192.168.5.5:80 cookie B check

I see that you have not configured timeouts. This is bad (though it should not cause the problem you're seeing). Please add the following lines to the section above:

    timeout client 30s
    timeout server 30s
    timeout connect 5s

Also, you should log; you would see in the logs what is wrong. For this, please add the following line:

    log 127.0.0.1 local0

Then ensure that your syslogd listens to the UDP socket (syslogd -r), and check the log files (you will see one line added when you start haproxy, then one line per request). 2) TCP testing (using Iperf). Servers webA (192.168.5.4) and webB (192.168.5.5) run an Iperf TCP sink listening on port 5001. The client runs 2 Iperf TCP connections and sends traffic to 192.168.5.3:80. On the HAProxy machine, run ./haproxy -f ./haproxy2.cfg:

    listen webfarm 192.168.5.3:80
        mode tcp
        balance roundrobin
        stats enable
        server webA 192.168.5.4:5001
        server webB 192.168.5.5:5001

I can achieve 36Mbps to each under the following scenario: [client] -> 36Mbps -> [webA], [client] -> 36Mbps -> [webB]. 36 Mbps??? You definitely have something wrong in your setup! This is the bandwidth you could expect from a saturated 100 Mbps HUB at maximal collision rate! Please check that all your connections are made to a switch and not a hub, and that all your hosts have negotiated full duplex (check with ethtool). However, when I use HAProxy, I can only achieve 18Mbps each, and the utilization of the HAProxy machine is low: [client] -> [HAProxy] -> 18Mbps -> [webA],[webB]. Is this normal? Is there any way to improve it? No, it's not normal, but it is expected from the bad numbers above. If you are using a hub which is saturated, then making the traffic pass twice over it will halve the per-host bandwidth. These days it does not make sense to use a hub for network testing, nor even a 100 Mbps switch.
Gigabit switches are very cheap; you can find a 5-port gig switch for less than $50, and it could save you a lot of time spent troubleshooting connectivity issues. Regards, Willy
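For reference, the poster's section with all of the suggestions above folded in would look roughly like this (a sketch; only the timeout and log lines are additions):

    listen webfarm 192.168.5.3:80
        mode http
        balance roundrobin
        cookie SERVERID insert indirect
        stats enable
        log 127.0.0.1 local0
        timeout client  30s
        timeout server  30s
        timeout connect 5s
        server webA 192.168.5.4:80 cookie A check
        server webB 192.168.5.5:80 cookie B check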
Re: Balancing OpenLDAP
On Tue, Jan 20, 2009 at 07:43:25PM +0800, Unai Rodriguez wrote: How about writing a bash script that checks LDAP status somehow and having this script managed by xinetd? The script should return HTTP/1.1 200 OK\r\n if the LDAP server is fine, or something else if not (e.g. HTTP/1.1 503 Service Unavailable\r\n). Xinetd could be configured in such a way that the script is invoked upon connecting to a defined port, let's say 9200. Then we could have something like this in the HAProxy configuration:

    listen LDAP IP:389
        mode tcp
        option httpchk
        server ldap_srv1 ip:389 check port 9200 inter 5000 rise 3 fall 3

What would you think of that approach? It is the usual way of performing complex checks. An alternative method consists in referencing the LDAP script itself in inetd. Willy
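A sketch of such a check script, assuming ldapsearch is available; the script name, path and search base are illustrative, not from the thread:

    #!/bin/sh
    # /usr/local/bin/ldapchk: report LDAP health as a one-shot HTTP response
    if ldapsearch -x -H ldap://127.0.0.1 -b "" -s base > /dev/null 2>&1; then
        printf 'HTTP/1.1 200 OK\r\n\r\n'
    else
        printf 'HTTP/1.1 503 Service Unavailable\r\n\r\n'
    fi

And a matching xinetd service on port 9200 (UNLISTED because the port has no /etc/services entry):

    service ldapchk
    {
        type        = UNLISTED
        port        = 9200
        socket_type = stream
        protocol    = tcp
        wait        = no
        user        = nobody
        server      = /usr/local/bin/ldapchk
    }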
Re: reqrep help
Hi Dave, On Wed, Jan 21, 2009 at 12:44:53PM -0500, Dave Pascoe wrote: Long-time haproxy user... first-time poster. Finally ran into a rewrite issue I just haven't been able to solve. Seems like it ought to be simple. Problem: I need to rewrite requests like /foo/favicon.ico to just /favicon.ico. Using this line: reqrep ^([^\ ]*)\ /(.*)/favicon.ico \1\ /favicon.ico results in an HTTP 502 being returned. Just having a mental block today... why would I be getting a 502? I think I found the reason. reqrep replaces the *whole line* with the new string. So basically, you're replacing lines such as GET /foo/favicon.ico HTTP/1.0 with GET /favicon.ico (without the HTTP version). This becomes HTTP/0.9, and I guess your server simply resets the connection because it does not support it, leading to a 502. IMHO you should be using: reqrep ^([^\ ]*)\ /(.*)/favicon.ico\ (.*) \1\ /favicon.ico\ \3 And yes, I know this is not very convenient ;-) Cheers, Willy
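To make the difference concrete, here is what each rule does to a typical request line (a sketch):

    # original request line:            GET /foo/favicon.ico HTTP/1.0
    # broken rule (drops the version):  GET /favicon.ico
    # corrected rule (keeps group \3):  GET /favicon.ico HTTP/1.0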
Re: stats socket problem
Hi Martin, On Wed, Jan 21, 2009 at 12:13:35PM +0100, Martin Karbon wrote: Hi, I am relatively new to this great software and I am having problems with the stats socket feature: it won't create the haproxy.stat socket no matter what, so I cannot run socat. root@lb1:~# echo show stat | socat unix-connect:/var/run/haproxy.stat stdio 2009/01/21 12:12:54 socat[4887] E connect(3, AF=1 /var/run/haproxy.stat, 23): No such file or directory I wanted to write a script that checks the connection distribution every n seconds (i.e. for a monitoring tool)... any advice for this? I see nothing wrong in your config. It's so simple! Could you check a few things, such as whether something already exists in /var/run, or whether you see a symbolic link or anything like this which might explain such strange behaviour. Also, it would be nice if you could start haproxy under strace: # strace -o /tmp/haproxy-start.log haproxy -db -f /etc/haproxy.cfg Then press Ctrl-C after a few seconds, and look for haproxy.stat in the output file (or post the result here). I think we'll find the reason there. There must be an error somewhere, or there is something disabling the stats, but I don't see what could do that. Regards, Willy
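For comparison, a global section known to produce the socket is as simple as this (a sketch; the path matches the socat command above, and the mode parameter is optional):

    global
        log 127.0.0.1 local0
        stats socket /var/run/haproxy.stat mode 600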
Re: stats socket problem
On Wed, Jan 21, 2009 at 09:43:58PM +0100, Martin Karbon wrote: Quoting Willy Tarreau w@1wt.eu: Hi Willy, thanks for the fast reply. Hi Martin, On Wed, Jan 21, 2009 at 12:13:35PM +0100, Martin Karbon wrote: Hi, I am relatively new to this great software and I am having problems with the stats socket feature: it won't create the haproxy.stat socket no matter what, so I cannot run socat. root@lb1:~# echo show stat | socat unix-connect:/var/run/haproxy.stat stdio 2009/01/21 12:12:54 socat[4887] E connect(3, AF=1 /var/run/haproxy.stat, 23): No such file or directory I wanted to write a script that checks the connection distribution every n seconds (i.e. for a monitoring tool)... any advice for this? I see nothing wrong in your config. It's so simple! Could you check a few things, such as whether something already exists in /var/run, or whether you see a symbolic link or anything like this which might explain such strange behaviour. As far as I can see, no symbolic link in here... just pid files and some dirs. Also, it would be nice if you could start haproxy under strace: # strace -o /tmp/haproxy-start.log haproxy -db -f /etc/haproxy.cfg Then press Ctrl-C after a few seconds, and look for haproxy.stat in the output file (or post the result here). I think we'll find the reason there. There must be an error somewhere, or there is something disabling the stats, but I don't see what could do that. Regards, Willy Here is the output of the strace (note: this is a virtual machine I have at home, same installation as the original one) (...) Now that's rather intriguing. There's no trace of the socket at all, just as if the stats line were ignored. What version of haproxy do you have? (haproxy -vv) Could you add an error on the stats line (insert foobar before socket) so that you can verify that haproxy complains? If it does not complain, could you retype the line, or at least add one line with stats foobar in order to get the parsing error? I'm realizing that I generally add the stats socket line approximately as the last line of the global section. While it would appear stupid to me, could you please move this line to the end of the section? Maybe there's a long-standing bug caused by another parameter clearing this one (but once again, I'd find that a bit strange). I prefer easily reproducible errors like this one to non-deterministic ones! Regards, Willy
Re: Problems with HAProxy, down servers and 503 errors
Hi John, On Sun, Jan 25, 2009 at 11:23:24AM -0500, John Marrett wrote: I'm embarrassed to report that this is not an HAProxy issue. Don't feel embarrassed. I'm glad that you found the issue, and it's kind of you to send us an update. In addition to the changes being made at the load-balancing level, we have also upgraded the backend real servers. It seems there has been a change in their shutdown procedure: where before they would stop responding immediately when a shutdown was initiated, they now return 503 errors during the (protracted) shutdown process. This explains both unusual issues perfectly clearly. I'm very sorry for any time wasted looking into this issue. No problem, no time wasted yet! Have you at least found a solution to your issue? Regards, Willy
Re: Stunnel + HAProxy + Apache + Tomcat
Hi Jill, On Thu, Jan 22, 2009 at 02:30:55PM -0500, Jill Rochelle wrote: I'm just getting started with all this; I thought I had this working last year, but I'm having issues now. When using stunnel and forwardfor with haproxy, is the URL supposed to stay https or will it change to http? If it changes to http, is it secure, given that no lock shows in the browser? The URL used by the browser is still https, as it only defines the protocol to use. Also, has anybody got this working alongside Apache and Tomcat, where Apache routes everything to Tomcat because the main application runs in Tomcat only? Routing is port 80 (apache) -> 85 (haproxy) -> (server for proxy, which goes back to apache) -> mod_jk to tomcat. I see nothing abnormal in your description, though I've never used Tomcat. Parts of the application are http and other parts are https. I need the URL to remain https (when entering that part of the app) so that it is secure and the certificate lock appears in the browser. I feel like I'm missing something, but I can't put my finger on it. If you're using Apache's mod_proxy, you might have difficulties setting up ProxyPass and ProxyPassReverse to make https appear as such. I don't remember the exact details, but I know people who are constantly annoyed by the fact that Apache rewrites the URL when passing the request, instead of leaving it untouched. This could be what you had in mind. Regards, Willy
Re: Problems with HAProxy, down servers and 503 errors
On Sun, Jan 25, 2009 at 07:06:23PM -0500, John Marrett wrote: Willy, No problem, no time wasted yet! Well, none of your time :) It took me far longer than it should have to realise my error. Regrettably, packet captures are usually my first diagnostic tool; a mistake I won't make again any time soon. Have you at least found a solution to your issue? I've found a partial solution to my issue, and in fact, now I have a question that's relevant to the list. The backend server is IIS; if you're getting 503s during shutdowns, you can use this solution to turn them into RSTs [1]. The RST is sent by IIS after it receives the full client request from HAProxy (I suspect that it may want to see the Host header before it decides how it wants to treat the request). When HAProxy receives the RST it returns a 503 to the client (respecting the errorfile!). Despite the presence of option redispatch, HAProxy does not send the request to another backend server. If there were a way to get HAProxy to send the request to another functional real server at this point, it would be great, though I fear that HAProxy no longer has the request information after having sent it to the server. You're perfectly right: redispatch only happens while the request is still in haproxy. Once it has been sent, it cannot be performed. It must not be performed either for non-idempotent requests, because there is no way to know whether some processing had begun on the server before it died and returned an RST. Any further advice would be much appreciated; I can provide packet captures off list if required. Shouldn't you include the Host header in the health checks, in order to solicit the final server and get a chance to see it fail? Regards, Willy
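Such a check could look like this (a sketch; the host name, URI and server address are placeholders, and spaces in the check request must be escaped):

    option httpchk GET /alive.htm HTTP/1.0\r\nHost:\ www.example.com
    server iis1 192.168.0.10:80 check inter 2000 fall 3 rise 2

That way the probe traverses the same host-based routing as real requests, so a backend that RSTs after seeing the Host header should fail its checks too.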
Re: Reducing I/O load of logging
Hi guys, On Fri, Feb 13, 2009 at 08:04:50AM -0500, John Lauro wrote: It wouldn't hurt to put RHEL 5 or Centos 5 on the box instead of FC. FC is generally meant for desktops rather than servers. A customer has encountered a similar issue a few times on RHEL3. We noticed there was swap in use on the affected machines. It would happen after about 6 months of production. Haproxy would not receive any request for long periods (several seconds), and we noticed this happened most frequently during network backups. We also had a few occurrences of the issue in the middle of the day while the admins were grepping errors in the logs. There was a lot of CPU usage, so at first we suspected scheduling issues. But when we noticed the swap usage, we figured that some of the process's structures might have been swapped out, causing long delays when accessing data. Interestingly, restarting the process was enough to make the issue go away, since memory usage was quite a bit lower after a restart. The reason for the swap was not a lack of RAM but a high usage of the disk cache pushing rarely used data into the swap. And I agree with you, John: a swapoff -a must absolutely be done. There's not even one valid reason to enable swap on a network server; all it can do is delay all operations and kill performance. Your default ulimit -n is only 1024. Just make sure you raise that to match or exceed your Haproxy configuration prior to starting Haproxy. Even if that is a problem, it wouldn't explain why you have a problem when looking at the logs. It is not a problem if haproxy is started as root, as it adjusts the ulimit-n itself. And you're right, it would not cause side effects while looking at the logs. The grep on /var/messages completed too quickly to really catch much. That said, your SYS time is a little high, especially after it finished. For an 8-core box, only 12.5% would mean one core dedicated to the task, and it rose from 4 to 16. Given that it was counted as sys and not user, and generated little I/O, this indicates it might be slow memory processing on the cache. Other I/O-intensive workloads, such as wc -l /var/log/*, might help to see whether the swap usage suddenly grows. Another test which might be done when the problem becomes reproducible is to flush the caches and swapoff everything:

    # echo 1 > /proc/sys/vm/drop_caches
    # swapoff -a

Then redo the operation. If the problem does not happen anymore, it clearly indicates a poor tradeoff between swap and cache. Regards, Willy
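If removing swap entirely is not an option, a softer knob on 2.6 kernels is to tell the VM to prefer reclaiming page cache over swapping out process memory (a sketch, not from the original thread):

    # strongly prefer dropping cache over swapping anonymous memory
    sysctl -w vm.swappiness=0
    # persist across reboots
    echo 'vm.swappiness = 0' >> /etc/sysctl.conf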
Re: Response with leading space?
On Sat, Feb 14, 2009 at 10:19:33AM -0500, Luke Melia wrote: On Fri, Feb 13, 2009 at 09:30:18PM +0100, Willy Tarreau wrote: Wow. That's pretty strange. I don't see any possibility for haproxy to do something like this, especially at the beginning of the data. But I can't imagine how nor why nginx would do that either. Quick update: I heard from one of our admins that another customer of theirs experienced the same problem. The other customer is NOT using HAProxy, but rather a custom load balancer for nginx, so I think we can safely rule out HAProxy. OK, thanks for the update. Willy
Re: Problem with haproxy under testload
Hi Valentino, On Thu, Feb 19, 2009 at 11:04:21AM -0800, Valentino Volonghi wrote: Hi, I've been trying to use haproxy 1.3.15.7 in front of a couple of erlang mochiweb servers in EC2. The server alone can deal with about 3000 req/sec, and I can hit it directly with ab or siege or tsung and see a similar result. I then tried using nginx in front of the system and it was able to reach about the same numbers, although apparently it couldn't really improve performance as much as I expected and instead increased latency quite a lot. I then went on to try haproxy, but when I use ab to benchmark with 100k connections at 1000 concurrency, after 30k requests I see haproxy jumping to 100% CPU usage. I tried looking into an strace of what's going on and there are many EADDRNOTAVAIL errors, which I suppose means that source ports are exhausted, You're correct, this is typically what happens in such a case. The increase of CPU usage is generally caused by two factors: connection retries, which act as a traffic multiplier, and immediate closure, which causes the load injection tool to immediately send a new connection instead of having to wait a few ms for the application. even though I increased the available ports with sysctl. Could you try to reduce the number of ports to see if the problem still shows up after 30k reqs or if it shows up before? Use 5k ports, for instance. I'm asking this because I see that you have 25k for maxconn, and since those numbers are very close, we need to find which one triggers the issue. The haproxy configuration is the following:

    global
        maxconn 25000
        user haproxy
        group haproxy

    defaults
        log global
        mode http
        option dontlognull
        option httpclose
        option forceclose
        option forwardfor
        maxconn 25000
        timeout connect 5000
        timeout client 2000
        timeout server 1
        timeout http-request 15000
        balance roundrobin

    listen adserver
        bind :80
        server ad1 127.0.0.1:8000 check inter 1 fall 50 rise 1
        stats enable
        stats uri /lb?stats
        stats realm Haproxy\ Stats
        stats auth admin:pass
        stats refresh 5s

Since you have stats enabled, could you check in the stats how many sessions are still active on the frontend when the problem happens? Reading this list's archives, I think I have some of the symptoms explained in these mails: http://www.formilux.org/archives/haproxy/0901/1670.html Oops, I don't remember having read this mail, and the poor guy seems not to have got any response! This is caused by connect() failing with EADDRNOTAVAIL and thus considering the server down. I'm not that sure about this one. There's no mention of any CPU usage, and traditionally this is the symptom of the ip_conntrack module being loaded on the machine. http://www.formilux.org/archives/haproxy/0901/1735.html I think I'm seeing exactly the same issue here. There could be a common cause, though in John's test it was caused by a huge client timeout, and maybe the client did not correctly respond to late packets after the session was closed (typically if a firewall is running on the client machines), which causes the sessions to last for as long as the client timeout. A small strace excerpt:

    socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 18
    fcntl64(18, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    setsockopt(18, SOL_TCP, TCP_NODELAY, [1], 4) = 0
    connect(18, {sa_family=AF_INET, sin_port=htons(8000), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EADDRNOTAVAIL (Cannot assign requested address)
    close(18)

This one is a nice capture of the problem.
The fact that the FD is very low (18) also makes me think that you will see very few connections on haproxy on the stats page, meaning that some resource in the system is exhausted (likely source ports).

    recv(357, 0x9c1acb8, 16384, MSG_NOSIGNAL) = -1 EAGAIN (Resource temporarily unavailable)
    epoll_ctl(0, EPOLL_CTL_ADD, 357, {EPOLLIN, {u32=357, u64=357}}) = 0

This one is not related; it's normal behaviour: running in non-blocking mode, you try to read, there's nothing (EAGAIN), so you register this FD for polling. The last one is mostly to show that I'm using epoll, in fact speculative epoll, but even turning it off doesn't solve the issue. Agreed, it's speculative epoll which produces the two lines above. If you just use epoll, you'll see epoll_ctl(EPOLL_CTL_ADD), then epoll_wait(), then read(). But this will not change anything about the connect() issue the system is reporting. An interesting problem is that if I use mode tcp instead of mode http this doesn't happen, but since it doesn't forward the client IP address (and I can't patch an EC2 kernel) I can't do it. This is because your load tester uses HTTP keep-alive by default, which tcp mode does not affect. In http mode with option httpclose and option forceclose, keep-alive is disabled. If you comment
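For reference, the knobs usually involved when source ports run out on Linux look like this (a sketch; the values are illustrative, and tcp_tw_reuse is the one that ends up mattering later in this thread):

    # widen the ephemeral port range used for outgoing connections
    sysctl -w net.ipv4.ip_local_port_range='1024 65535'
    # allow safe reuse of TIME-WAIT sockets for new outgoing connections
    sysctl -w net.ipv4.tcp_tw_reuse=1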
Re: HAProxy mod_rails (Passenger)
On Thu, Feb 19, 2009 at 10:02:36AM +0100, Matthias Müller wrote: Hello there, I'm trying to find a suitable solution to load balance Rails applications via Passenger and HAProxy. Currently I'm doing a lot of testing using Apache Bench. The setting is as simple as follows: machine A: HAProxy; machine B: Apache with mod_rails. My test: 100 concurrent requests via Apache Bench. When running 100 concurrent requests against HAProxy, Apache Bench gets a lot of non-2XX responses and I get a lot of BADREQ entries in my HAProxy log, like: Feb 19 08:50:25 localhost.localdomain haproxy[1890]: 10.0.0.1:33917 [19/Feb/2009:08:50:10.898] http-proxy http-proxy/member1 -1/13816/1/-1/14702 503 13757 - - 99/99/99/9/0 0/32 BADREQ There's something odd in the log above. It implies that the request was sent into the queue waiting for a server to be free, but also that the request was invalid or perhaps empty. I suspect that ab has timed out slightly before haproxy while waiting for a server response, but I don't understand why we have BADREQ here; we should not have got a 503 with BADREQ. Could you please indicate what exact version this is? This would help explain why we can have BADREQ and 503 at the same time. Regards, Willy
Re: Read stat or info from the socket via perl
Hi, On Sat, Feb 14, 2009 at 10:53:11PM +0100, vmware vmware wrote: Hi all, I am trying to read the information (show info, show stat) from the socket of haproxy with a perl script, in order to get a result similar to the socat command. The problem is that I am not able to read anything. I also see a similar example in the sources in the contrib folder, but somehow I don't get the point of how to read and print this information. This is my perl program. Thanks for any help.

    use strict;
    use IO::Socket;
    use lib "/usr/local/nagios/libexec";

    my $sock = new IO::Socket::UNIX (
        LocalAddr => "/var/run/haproxy.socket",
        Type      => SOCK_STREAM,
        Timeout   => 1
    ) or die 'error on connecting.';

Well, about 10 years have elapsed since I last wrote a perl line, but I find the name LocalAddr above suspect for a connection. How do you know from the code above that your program is trying to connect and is not setting up a listening socket instead? That would explain your problem! Perl still looks awfully obscure to me :-( Willy
Re: Problem with haproxy under testload
On Thu, Feb 19, 2009 at 03:59:54PM -0800, Valentino Volonghi wrote: Could you check net.ipv4.tcp_tw_reuse, and set it to 1 if it's zero? It probably was set to 0... This fix, and changing tcp_mem back to the standard values (which are computed dynamically depending on the available memory), basically fixed the problem. Fine! Now haproxy is giving me much, much better latency numbers than nginx, That's surprising, since the two products rely on a similar architecture. and that's what I was looking for. Now 2 machines combined run at a maximum of about 4700 req/sec, and the average latency for the full request is 0.311s while the fastest is 77ms. You should try to play with the maxconn parameter on the server line in order to reduce the number of concurrent connections on apache. It will surely reduce the response time and increase the number of requests per second. In my experience, apache performs best between 30 and 200 concurrent requests. I would expect something like 20-30% better performance under load with maxconn 100 than without it when you load it with 1000 concurrent requests. Regards, Willy
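On the server line quoted earlier in the thread, that would be a one-word change (a sketch; check options omitted):

    server ad1 127.0.0.1:8000 maxconn 100 check

Requests beyond 100 concurrent then wait in haproxy's queue instead of piling up in the backend, which usually lowers response times under load.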
Re: Read stat or info from the socket via perl
Hello Maria, On Fri, Feb 20, 2009 at 11:56:53AM +0100, Maria wrote: Dear Willy, I don't have a lot of experience in perl either. As Nagios also allows doing this via bash or C, I can use one of those instead. My main goal is only to read this information with a language (supported by nagios) and send it to the nagios server. It does not matter whether it is Perl or not. OK. BTW, I modified the code a little, but I am still not able to read out this information. Maybe I will try it in C. I read in the manual that it is possible to write out statistics as CSV? Do I specify this in the configuration file? It's not a CSV file, it's just the CSV format as an alternative to the HTML format used for the HTTP stats. You can simply request the CSV format by appending ;csv to the stats request. For instance: GET /haproxy_stats;csv HTTP/1.0 Regards, Willy
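From a shell, that request can be issued with curl; the quoting matters because of the semicolon (a sketch; the host and stats URI are whatever your stats configuration uses):

    curl -s 'http://192.168.0.1/haproxy_stats;csv'

The first line of the output names the columns, and each following line describes one frontend, backend or server.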
Re: protection against DDoS attacks
On Tue, Feb 24, 2009 at 07:43:53PM +0300, Ahmad Al-Ibrahim wrote: Hi, I'm using HAProxy in the frontend as a reverse proxy to backend servers, and I'm thinking of possible ways to protect the backend servers from being attacked. How effective is doing a URL redirect to protect against these attacks? It will stop all stupid bots which don't even care about parsing the response. You can even improve the setup by setting a cookie and checking it in the response. The idea is that if the cookie is there and valid, you forward the traffic to the proper backend; otherwise you perform a redirect with a set-cookie. Or balancing based on URI? This will most often overload the one server which matches the URI being attacked. In most of the DDoS traces I got, only one URI was being requested by thousands of clients. How about using cookies? For example, logged-in users with cookie A go to backend group A, and clients with no cookie set go to backend group X. See above ;-) Keep in mind that some people don't like this solution because they fear that some users will not get the cookie. I once set up a 2-step redirect for that. The principle is easy: 1) if uri = /XXX and no cookie, return an error page; 2) redirect to /XXX with set-cookie if no cookie; 3) if uri = /XXX, redirect to /. /XXX will catch clients which don't support cookies and gently return them an error. One could also decide to slow them down while still granting them access to the service, or to limit their number of connections (dedicated backend). There is conn tarpit; how effective is it, and how can it be used to protect against DDoS attacks? It is very effective. I developed it in a hurry to help one guy whose site was taken down by a medium-sized attack. Once the tarpit was installed with a proper criterion, we observed the number of concurrent connections go up and stabilize at about 7K, and the load on the servers and the frontend firewall dropped. The tarpit was developed precisely to protect the frontend firewall and the internet link, because most attack tools simply run a dirty request in a loop and can't parallelize them. So if you're slowing down an attacker to one request per minute, you're saved. The difficulty is to find the matching criteria. You have to check your servers' logs to see what causes them trouble, and if you can't blacklist the URI itself, you often have to fire up tcpdump. It's very common to find an uncommon header, a wrong syntax or something like that in the request. You then use that to decide to tarpit the request. What is the most effective way to protect against such DDoS attacks? There's no single most effective way, only a combination of tools whose efficiency depends on the attacker's skills and his knowledge of your counter-measures. So there are a number of important rules to keep in mind: 1) You have the logs, the attacker does not. Exploit them to the maximum to elaborate the smartest possible matching. 2) You know what he does, and he does not know what you do. You must never let the attacker know what you're doing nor how you plan to stop him. Reporting wrong information is nice too; for instance, the tarpit will return a 500 after the timeout with a fake server error. But you can also decide to sacrifice a server and send all identified crap to it. This is very important, because every filtering method has limits which can easily be bypassed with a few minutes or hours of coding once understood. You must ensure that your attacker does not even know what products you are using.
3) He knows who you are (IP) and you don't know behind what IP he hides. This is the problematic part, because you don't want to block your own customers. 4) You have to constantly monitor your systems and adapt the response to the attack in real time. This prevents the attacker from getting a precise idea of your architecture and components, which would help him build a more effective attack. Also, you'll have to adjust the system tuning (eg: number of SYN/ACK retries, timeouts, etc.), balancing between protection efficiency and site accessibility for normal users. 5) You must not publicly tell your customers that you're being attacked, because if the attacker sees that, he will think "hey, they can't stand it anymore, they're about to give up", and he will continue. However, stating that the site is slow due to a transient network issue is fine. 6) Never over-estimate the capability of any of your components, and do not hesitate to replace one which does not fit anymore. For instance, if you use haproxy and you see the attack is smart enough to kick it off, put something in front of it, replace it, or find any trick to quickly solve the issue. Source-based layer 4 balancing to many L7 proxies is very
Re: Tw timeout server, but no retries happened? sQ 503 NOSRV error in logs
On Mon, Feb 23, 2009 at 12:12:43PM -0800, Michael Fortson wrote: Feb 23 18:50:22 www haproxy[15344]: 11.1.11.1:45025 [23/Feb/2009:18:50:21.939] webservers fast_mongrels/NOSRV 0/101/-1/-1/101 503 212 - - sQ-- 322/309/9/0/0 0/1 GET /blahblah/update/57f6c2408f HTTP/1.1 sQ: the session spent too much time in queue and has been expired. See the timeout queue and timeout connect settings to find out how to fix this if it happens too often. If it often happens massively in short periods, it may indicate general problems on the affected servers due to I/O or database congestion, or saturation caused by external attacks. Here's the relevant config:

    mode http
    retries 20
    option abortonclose
    option redispatch
    timeout connect 100   # time waiting for backend server to accept
    timeout client  80s   # data from client (browser)
    timeout server  120s  # data from server (mongrel)

It looks like tW timed out (101 > 100), but shouldn't it have retried? Retries shows 0. No, it should not retry, because the connection attempt timed out. The retries are performed when the connection is refused by the server (in most cases, during a quick restart). Normally, with a correct connect timeout, the system performs the retries by itself, which is why we don't need haproxy to add retries on top of that. I usually set the connect timeout to 4 or 5 seconds; here you have 100ms, which is very small and does not allow even one retransmit (3s). Also, I see something odd in your log: the connect time being -1, it's the queue time which is reported as 100ms, which is wrong. I should fix that so the queue time still remains zero and the total time 101. Regards, Willy
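Put together, the suggested shape of that section would be (a sketch; only the connect timeout changes, and the comments are summaries of the advice above):

    mode http
    retries 20
    option abortonclose
    option redispatch
    timeout connect 5s    # was 100ms; 5s allows at least one SYN retransmit (3s)
    timeout client  80s   # data from client (browser)
    timeout server  120s  # data from server (mongrel)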
Re: Just a small inconsistency in the docs for listening on multiple ports?
Hi Malcolm, On Thu, Feb 26, 2009 at 11:45:31AM +0000, Malcolm Turnbull wrote: I'm using haproxy-1.3.15.7.tar.gz for some testing and looking at the options to bind multiple ports. The docs imply that you can use a line such as: listen VIP_Name :80,:81,:8080-8089 But this gives me: [ALERT] 056/114217 (18173) : Invalid server address: ':80,' [ALERT] 056/114217 (18173) : Error reading configuration file : /etc/haproxy/haproxy.cfg However, if I break up the ports using 3 per line, it works fine:

    listen VIP_Name 192.168.2.83:80
        bind 192.168.2.83:103,192.168.2.83:102
        bind 192.168.2.83:104,192.168.2.83:105

Is this a deliberate feature change? (Not a major issue, but I just wanted to check.) No, it's a bug reported by Laurent Dolosor and fixed by Krzysztof Oledzki last month; it's just that I have not issued a new release with that alone yet. I think I'll have to find some time to release 1.3.15.8 this week-end. If there's enough time, I'll try to also backport my work on the doc. If you want, you can download patch 1d62e33b0108a1da634c1936605235635df3081d from the git web interface ([BUG] Fix listen more of 2 couples ip:port). Regards, Willy
Re: option httpchk is reporting servers as down when they're not
Hi Thomas, On Thu, Mar 05, 2009 at 08:45:20AM -0500, Allen, Thomas wrote: Hi Jeff, The thing is that if I don't include the health check, the load balancer works fine and each server receives equal distribution. I have no idea why the servers would be reported as down but still work when unchecked. It is possible that your servers expect the Host: header to be set during the checks. There's a trick to do it right now (don't forget to escape spaces) : option httpchk GET /index.php HTTP/1.0\r\nHost:\ www.mydomain.com Also, you should check the server's logs to see why it is reporting the service as down. And as a last resort, a tcpdump of the traffic between haproxy and a failed server will show you both the request and the complete error from the server. Regards, Willy
Re: load balancer and HA
On Wed, Mar 04, 2009 at 12:12:21AM +0100, Alexander Staubo wrote: On Tue, Mar 3, 2009 at 11:44 PM, Martin Karbon martin.kar...@asbz.it wrote: just wanted to know if anyone knows an open-source solution for a so-called transparent failover. What I mean by that is: I installed two machines with haproxy on them which communicate with each other via heartbeat; if one fails, the other goes from passive to active, but all sessions are lost and users have to reconnect. We use Heartbeat (http://www.keepalived.org/) for this. Heartbeat lets us set up virtual service IPs which are reassigned to another box if the box goes down. Works like a charm. Current connections are lost, but new ones go to the new IP. Note that there are two current versions of Heartbeat. There's the old 1.x series, which is simple and stable, but which has certain limitations, such as only supporting two nodes, if I remember correctly. Then there's 2.x, which is much more complex and less stable. We run 2.0.7 today, and we have had some situations where the Heartbeat processes have run wild. It's been running quietly for over a year now, so recent patches may have fixed the issues. I would still recommend sticking with 1.x if at all possible. I still don't understand why people stick to Heartbeat for things as simple as moving an IP address. Heartbeat is more of a clustering solution, with abilities to perform complex tasks. When it comes to just moving an IP address between two machines and doing nothing else, the VRRP protocol is really better. It's what is implemented in keepalived. Simple, efficient and very reliable. I've been told that ucarp is good at that too, though I've never tried it yet. While there are solutions out there that preserve connections on failover, my gut feeling is that they introduce a level of complexity and computational overhead that necessarily puts a restraint on performance. In fact, it's useless to synchronise TCP sessions between load balancers for fast-moving connections (eg: HTTP traffic). Some people require it for long sessions (terminal server, ...), but this cannot be achieved in a standard OS: you need to synchronise every minor progress of the TCP stack with the peer. And that also prevents true randomness from being used at the TCP and IP levels. It also causes trouble when some packets are lost between the peers, because they can quickly get out of sync. In practice, in order to synchronise TCP between two hosts, you need more bandwidth than that of the traffic you want to forward. There are intermediate solutions which synchronise at layer 4 only, without taking into account the data nor the sequence numbers. Those present the advantage of being able to take over a connection without too much overhead, but no layer 7 processing can be done there, and those cannot be system sockets. That's typically what you find in some firewalls or layer 4 load balancers which just forward packets between two sides and maintain a vague context. Regards, Willy
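For the simple move-an-IP case, a keepalived VRRP pair needs very little configuration. A minimal sketch (interface, router id, priorities and address are illustrative):

    vrrp_instance VI_1 {
        state MASTER            # the peer uses state BACKUP
        interface eth0
        virtual_router_id 51
        priority 101            # the peer uses a lower value, e.g. 100
        advert_int 1
        virtual_ipaddress {
            192.168.2.100       # the service IP that haproxy binds to
        }
    }

The backup node runs the same block with the lower priority; when the master stops advertising, the backup claims the address within a few seconds.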
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 11:23:02AM -0800, Michael Fortson wrote: On Fri, Mar 6, 2009 at 8:43 AM, Willy Tarreau w@1wt.eu wrote: Hi Michael, On Thu, Mar 05, 2009 at 01:04:06PM -0800, Michael Fortson wrote: I'm trying to understand why our proxied requests have a much greater chance of significant delay than non-proxied requests. The server is an 8-core (dual quad) Intel machine. Making requests directly to the nginx backend is just far more reliable. Here's the output of a shell script that continuously requests a blank 0k image file from nginx directly on its own port, and prints a timestamp whenever the delay isn't 0 or 1 seconds:

    Thu Mar 5 12:36:17 PST 2009 beginning continuous test of nginx port 8080
    Thu Mar 5 12:38:06 PST 2009 Nginx Time is 2 seconds

Here's the same test running through haproxy, simultaneously:

    Thu Mar 5 12:36:27 PST 2009 beginning continuous test of haproxy port 80
    Thu Mar 5 12:39:39 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:39:48 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:39:55 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:40:03 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:40:45 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:40:48 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:40:55 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:40:58 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:41:55 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:42:01 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:42:08 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:42:29 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:42:38 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:43:05 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:43:15 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:08 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:25 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:30 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:33 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:39 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:46 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:44:54 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:45:07 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:45:16 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:45:45 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:45:54 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:45:58 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:05 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:08 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:32 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:48 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:53 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:46:58 PST 2009 Nginx Time is 3 seconds
    Thu Mar 5 12:47:40 PST 2009 Nginx Time is 3 seconds

3 seconds is typically a TCP retransmit. You have network losses somewhere from/to your haproxy. Would you happen to be running on a gigabit port connected to a 100 Mbps switch? What type of NIC is this? I've seen many problems with broadcom netxtreme 2 (bnx2) caused by buggy firmwares, but it seems to work fine for other people after a firmware upgrade. My sanitized haproxy config is here (mongrel backend omitted for brevity): http://pastie.org/408729 Are the ACLs just too expensive? Not at all, especially in your case. To reach 3 seconds of latency, you would need hundreds of thousands of ACLs, so this is clearly unrelated to your config. Nginx is running with 4 processes, and the box shows mostly idle. ...
which indicates that you aren't burning CPU cycles processing ACLs ;-) It is also possible that some TCP settings are too low for your load, but I don't know what your load is. Above a few hundreds to thousands of sessions per second, you will need to do some tuning, otherwise you can end up in similar situations. Regards, Willy Hmm. I think it is gigabit connected to 100 Mb (all Dell rack-mount servers and switches). OK, so then please check with ethtool whether your port is running in half or full duplex: # ethtool eth0 Most often, 100 Mbps switches are forced to 100-full without autoneg, and gig ports in front of them see them as half duplex, thinking they are hubs. The nginx backend runs on the same machine as haproxy and is referenced via 127.0.0.1 -- does that still involve a real network port? Should I try the test all on localhost to isolate it from any networking retransmits? Yes, if you can do that, that would be nice. If the issue persists, we'll have to check the network stack tuning, but that gets harder as it depends on the workload. Also, please provide the output of netstat -s. Here's a peek at the stats page after about a day of running (this should help demonstrate the current loading): http://pastie.org/409632 I'm seeing something odd here. A lot of mongrel servers experience connection retries. Are they located
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 11:49:39AM -0800, Michael Fortson wrote: Oops, looks like it's actually Gb -> Gb: http://pastie.org/409653 Ah, nice! Here's a netstat -s: http://pastie.org/409652 Oh, there are interesting things there:
- 513607 failed connection attempts => let's assume it was for dead servers
- 34784881 segments retransmitted => this is huge; maybe your outgoing bandwidth is limited by the provider, causing lots of drops?
- 8325393 SYN cookies sent => either you've been experiencing a SYN flood attack, or one of your listening sockets' backlog is extremely small
- 1235433 times the listen queue of a socket overflowed / 1235433 SYNs to LISTEN sockets ignored => up to 1.2 million times some client socket experienced a drop, causing at least a 3-second delay to establish. The errors your scripts detect certainly account for a small part of those.
- 2962458 times recovered from packet loss due to SACK data => many losses, related to the second point above.
Could you post the output of sysctl -a | grep ^net ? I think that your TCP SYN backlog is very low. Your stats page indicates an average of about 300 sessions/s over the last 24 hours. If your external bandwidth is capped and causes drops, you can nearly saturate the default backlog of 1024 with 300 sessions/s each taking 3s to complete. If you're interested, the latest snapshot reports the number of sess/s in the stats. Haproxy and nginx are currently on the same box. Mongrels are all on a private network accessed through eth1 (public access is via eth0). OK. Stats page attached (the "everything" backend is not currently in use; it'll be a use-when-full option for fast_mongrels once we upgrade to the next haproxy). According to the stats, your average output bandwidth is around 10 Mbps. Would this match your external link? Regards, Willy
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote: Thanks Willy -- here's the sysctl -a | grep ^net output: http://pastie.org/409735 After a quick check, I see two major things:
- net.ipv4.tcp_max_syn_backlog = 1024 => far too low; increase it to 10240 and check whether it helps
- net.netfilter.nf_conntrack_max = 265535 and net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120 => this proves that netfilter is indeed running on this machine and might be responsible for session drops. 265k sessions is very low given the large time_wait; it limits you to about 2k sessions/s, including local connections on loopback, etc. You should then increase nf_conntrack_max, set nf_conntrack_buckets to about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait to about 30 seconds.
Our outbound cap is 400 Mb. OK, so I think you're still far away from that. Regards, Willy
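As commands, that tuning would look like this (a sketch; the bucket count is nf_conntrack_max/16 as suggested, and the hash size is written through the module parameter because it is read-only via sysctl on many kernels):

    sysctl -w net.ipv4.tcp_max_syn_backlog=10240
    sysctl -w net.netfilter.nf_conntrack_max=1048576
    echo 65536 > /sys/module/nf_conntrack/parameters/hashsize
    sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30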
Re: question about queue and max_conn = 1
Hi Greg, On Fri, Mar 06, 2009 at 03:54:13PM -0500, Greg Gard wrote: hi willy and all, wondering if i can expect haproxy to queue requests when maxconn per backend server is set to 1. running nginx -> haproxy -> mongrel/rails 2.2.2. Yes, it works fine and is even the recommended way of setting it up for use with mongrel. There has been trouble in the past with some old versions, where a request could starve in the queue for too long. What version are you using? all seems ok, but i am getting a few users complaining of connection problems and never see anything other than zeros in the queue columns. Have you correctly set the maxconn on the server lines? I suspect you have changed it in the frontend instead, which would be a disaster. Could you please post your configuration? Regards, Willy
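The recommended shape looks like this (a sketch; ports and names are illustrative): one server line per mongrel, each with maxconn 1, so haproxy holds extra requests in its own queue instead of letting them stack up inside a busy mongrel:

    listen rails 0.0.0.0:8080
        mode http
        balance roundrobin
        option httpclose
        server mongrel1 127.0.0.1:8001 maxconn 1 check
        server mongrel2 127.0.0.1:8002 maxconn 1 check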
Re: Dropped HTTP Requests
On Fri, Mar 06, 2009 at 04:55:21PM -0500, Timothy Olson wrote: I'm using HAProxy 1.3.15.7 to load-balance three Tomcat instances, and to fork requests for static content to a single Apache instance. I've found that after the initial HTML page is loaded from Tomcat, the browser's subsequent first request for a static image from Apache gets dropped (neither HAProxy nor Apache logs the request, but I can sniff it). The rest of the images after the first load fine. If I create a small, static, test HTML page on Tomcat (making the images come from a different backend), it shows the first image on the page as broken. If I put the exact same HTML page on Apache (no backend switch required), it works fine. I wonder if we have a configuration problem, or perhaps this is a bug in the way HAProxy deals with an HTTP keep-alive request that spreads to a second backend? Haproxy does not support HTTP keep-alive yet. However, it can work around it using option httpclose, which you should set in your defaults section. What you describe is typically what happens without that option. Regards, Willy
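In the defaults section, that is a single line (a sketch):

    defaults
        mode http
        option httpclose   # add Connection: close so each request is balanced independently

Without it, the browser's keep-alive connection stays glued to whichever backend served the first request, which matches the broken-first-image behaviour described above.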
Re: measuring haproxy performance impact
On Fri, Mar 06, 2009 at 02:36:59PM -0800, Michael Fortson wrote: On Fri, Mar 6, 2009 at 1:46 PM, Willy Tarreau w@1wt.eu wrote: On Fri, Mar 06, 2009 at 01:00:38PM -0800, Michael Fortson wrote: Thanks Willy -- here's the sysctl -a | grep ^net output: http://pastie.org/409735 After a quick check, I see two major things: net.ipv4.tcp_max_syn_backlog = 1024 => far too low, increase it to 10240 and check whether it helps; net.netfilter.nf_conntrack_max = 265535 and net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120 => this proves that netfilter is indeed running on this machine and might be responsible for session drops. 265k sessions is very low given the large time_wait; it limits you to about 2k sessions/s, including local connections on loopback, etc. You should then increase nf_conntrack_max, set nf_conntrack_buckets to about nf_conntrack_max/16, and reduce nf_conntrack_tcp_timeout_time_wait to about 30 seconds. Our outbound cap is 400 Mb. OK, so I think you're still far away from that. Regards, Willy Hmm; I did these (John is right, netfilter is down at the moment because I dropped iptables to help troubleshoot this), What did you unload precisely? You don't need any iptables rules for the conntrack to take effect. so I guess the syn backlog is the only net change. No difference so far -- still seeing regular 3s responses. It's weird, but I actually see better results testing mongrel than nginx; the haproxy -> mongrel heartbeat is more reliable than the haproxy -> nginx request. Is mongrel on another machine? You might be running out of some resource on the local one, making it difficult to reach accept(). Unfortunately I don't see what :-( Have you checked with dmesg that you don't have network stack errors or any type of warning? Willy
Re: load balancer and HA
On Fri, Mar 06, 2009 at 11:47:14PM +0100, Alexander Staubo wrote: On Fri, Mar 6, 2009 at 7:48 PM, Willy Tarreau w@1wt.eu wrote: When it comes to just moving an IP address between two machines and doing nothing else, the VRRP protocol is really better. It's what is implemented in keepalived. Simple, efficient and very reliable. Actually, it seems that my information is out of date, and we (that is, the IT management company that we outsource our system administration to) are in fact using Keepalived these days. I was confused by the presence of ha_logd on our boxes, which is part of the Heartbeat package; I don't know what that one is doing there. So, yeah, you're right. Stick with Keepalived. :-) Ah, nice! The author will be pleased to read this; he's subscribed to the list :-) In fact, it's useless to synchronise TCP sessions between load balancers for fast-moving connections (eg: HTTP traffic). Some people require that for long sessions (terminal server, ...), but this cannot be achieved in a standard OS: you need to synchronise every minor progress of the TCP stack with the peer. A less ambitious scheme would have the new proxy take over the client connection and retry the request with the next available backend. It will not work, because the connection from the client to the proxy will have been broken during the take-over. The second proxy cannot inherit the primary one's sockets. This depends on a couple of factors: for one, it only works if nothing has yet been sent back to the client. Secondly, it assumes the request itself is repeatable without side effects. The latter, of course, is application-dependent; but following the REST principle, in a well-designed app GET requests are supposed to have no side effects, so they can be retried, whereas POST, PUT etc. cannot. Still expensive and error-prone, of course, but much more pragmatic and limited in scope. What you're talking about are idempotent HTTP requests, which are quite well documented in RFC 2616. Those are important to consider because idempotent requests are the only ones a proxy may retry upon a connection error when sending a request on a keep-alive session. IIRC, HEAD, PUT, GET and DELETE are supposed to be idempotent methods. But we all know that GET is not that much when used with CGIs. Willy
Re: load balancer and HA
On Sat, Mar 07, 2009 at 12:14:44AM +0100, Alexander Staubo wrote: On Sat, Mar 7, 2009 at 12:07 AM, Willy Tarreau w@1wt.eu wrote: A less ambitious scheme would have the new proxy take over the client connection and retry the request with the next available backend. It will not work, because the connection from the client to the proxy will have been broken during the take-over. The second proxy cannot inherit the primary one's sockets. Unless you have some kind of shared-memory L4 magic, like the original poster talked about, that allows taking over an existing TCP connection. In that case, of course, I agree. But that means kernel-level changes. What you're talking about are idempotent HTTP requests, which are quite well documented in RFC 2616. That was the exact word I was looking for. I didn't know that PUT was idempotent, but the others make sense. In fact, it also makes sense for PUT, because you're supposed to use this method to send a file. Normally, you can send it as many times as you want; the result will not change. Willy
Re: question about queue and max_conn = 1
On Fri, Mar 06, 2009 at 10:02:03PM -0500, Greg Gard wrote: thanks for taking a look willy. let me know if there's anything else i should change. (...)

    defaults
        (...)
        # option httpclose

This one above should not be commented out. Otherwise, clients doing keep-alive will artificially maintain a connection to a mongrel even when they don't use it, thus preventing another client from using it.

    # stable sites run win2k/iis5/asp
    listen stable 192.168.1.5:10301
        option forwardfor
        server stable1 192.168.1.10:10300 weight 4 check
        server stable2 192.168.1.11:10300 weight 6 check

You can also set a maxconn on your iis servers if you think you sometimes hit their connection limit, maybe maxconn 200 or something like that. The stats will tell you how high you go and whether there are errors. The rest looks fine. Regards, Willy
[ANNOUNCE] haproxy-1.3.15.8 and 1.3.14.12
Hi All, as there were a bunch of pending fixes, I have released 1.3.15.8 and 1.3.14.12. The big bug was found and fixed by Krzysztof; it involved server state tracking, which could become extremely inefficient with large numbers of servers because of a typo. Some user-visible fixes include the annoying limit of 2 ip:ports on a bind or listen line, and better error reporting during startup about protocol binding errors and missing privileges. The source keyword did not reset the usesrc parameter in transparent proxy configurations; this is now fixed. This bug was reported by John Lauro. Another big change (in 1.3.15.8 only) is the doc update. As I could spend a few days on the doc, I took this opportunity to backport the changes to 1.3.15. This concerns everything about logging, and represents about 20 pages of documentation and examples. Normally the old doc (haproxy-en) should not be needed anymore. Please tell me if you think there are missing parts. Here's the changelog for 1.3.15.8. 1.3.14.12 has the same fixes except the doc update and the server tracking fix, because the feature did not exist there:

- [BUG] Fix listen more of 2 couples ip:port
- [DOC] remove buggy comment for use_backend
- [CRITICAL] fix server state tracking: it was O(n!) instead of O(n)
- [BUG] option transparent is for backend, not frontend !
- [BUG] we must not exit if protocol binding only returns a warning
- [BUG] inform the user when root is expected but not set
- [DOC] large doc update backported from mainline
- [BUG] the source keyword must first clear optional settings
- [BUG] global.tune.maxaccept must be limited even in mono-process mode
- [BUG] typo in timeout error reporting : report *res and not *err

Both versions are available at the usual place:
http://haproxy.1wt.eu/download/1.3/src/
http://haproxy.1wt.eu/download/1.3/bin/

Note that most bugs fixed here mainly affect the start-up of the program, so if your configuration has been working fine for the last months, there is no need to hurry for an upgrade. Willy
[ANNOUNCE] haproxy-1.3.16-rc1
Hi all, Yes, this is it! 1.3.16-rc1. After almost 11 months of development! There are new features I often forget about after being used to them in the dev tree, but fortunately there are people who remind me that those were not in 1.3.15 when I suggest using them ;-) I may forget a lot of them, but from memory and the changelog :
- absolute/relative redirection on ACL matching, with the ability to set/clear a cookie and drop the query string
- support for a domain name on the persistence cookie
- support for URI hash depth and length limits
- monotonic internal clock which becomes insensitive to system time variations (caused by ntpdate and early buggy dual-core opterons)
- permit renaming of the x-forwarded-for header
- tcp request content inspection, with the ability to detect SSL version and data length at the moment, but it's easy to add new matches.
- better ACL type checking (permanent, layer4, layer7, req, rep, ...) with better error detection and reporting at load time.
- 'show sess' on the unix socket reports the whole sessions list
- 'show errors' on the unix socket reports a full capture of the last invalid request and response for each proxy
- support for Linux 2.6.27 TCP splicing, avoiding memory copies for much improved bandwidth when used with the proper NICs.
- support for interface binding on bind and source statements, allowing several interfaces to be used at once on the same LAN
- support for process <-> proxy affinity, allowing users of nbproc > 1 to specify which frontend/backend runs on which process.
- measurement and reporting of session rate per frontend, backend and server
- support for session rate limiting per frontend
- quite a few more ACL matching criteria
- fully up-to-date configuration documentation
Other more technical changes which have an impact on contributions and future evolutions :
- better layering between sockets, buffers, sessions, and protocol analysers (eg: HTTP)
- much improved scheduler with a separate wait queue and run queue supporting task renicing (only used for stats and checks at the moment)
- support for configuration keyword registration within modules
- maintain a global session list in order to ease debugging
- make stats look more like an internal application and less like a hack
- buffers now have send limits and can automatically forward any arbitrary amount of data without waking the task up
- support for pipe pools (initially for splicing)
- several documented regression tests
- documentation of some internal parts
A lot of small optimisations have been performed on the I/O subsystem and on the scheduler. The overall performance gain from 1.3.15 is around 10-15% depending on the workload, with the same configuration. A lot of cleanups have been done, but I'm conscious that there is still a lot of work to be done in this area. Last but not least, I have added a small log parser that I wrote for handling large volumes of logs (gigabytes to tens of gigabytes). This one can search, count, sort and graph errors, response times and percentiles at speeds between 2 and 4 million lines a second on a 3 GHz machine, which translates into about 1 gigabyte per second. Of course, logs are better kept in RAM at these speeds. My primary usage is as a helpful tool for digging into anomalies by hand. It is located in contrib/halog. It awfully lacks any documentation; the curious will have to read the source :-) Obviously, after all these changes, there are bugs, possibly many. That's the reason why I released it as -rc1.
I'll wait a bit (several days to a few weeks) to get some feedback and fix reported bugs. Anyway, it has been running in production on a very big site with splicing enabled for about 1 month, and is running on my various servers too. I've got reports of other people running this branch in production. I'm not particularly afraid of stability now, but it is fairly possible that some new features are buggy. I've successfully built it on Linux/x86, Solaris/sparc and OpenBSD/vax (took quite some time BTW). Source and pre-built binaries for Linux and Solaris can be found here : http://haproxy.1wt.eu/download/1.3/src/Beta/ The binaries were built in debug mode and not stripped, in case someone manages to get a core, it could help. Have fun ! Willy
Re: [ANNOUNCE] haproxy-1.3.15.8 and 1.3.14.12
On Sun, Mar 08, 2009 at 10:13:04PM -0400, Jeffrey Buchbinder wrote: I have attached a copy of the NSLU2 armv5b build (.ipk package) for the 1.3.15.8 release. If it doesn't attach properly, it's also available at: http://www.mediafire.com/file/bmhtdnzndu2/haproxy_1.3.15.8-1_armeb.ipk Thanks Jeff! Is there a URL where you regularly upload these packages, that you'd like to see linked to from the haproxy page ? Willy
Re: option httpchk is reporting servers as down when they're not
Hi Thomas, just replying quickly, as I'm in a hurry. On Mon, Mar 09, 2009 at 04:01:29PM -0400, Allen, Thomas wrote: That, along with specifying HTTP/1.1, did it, so thanks! What should I load into Host: ? It seems to work fine with www, but I'd prefer to use something I understand. Please keep in mind that none of this is yet associated with a domain, so www.mydomain.com would be inaccurate. Of course, www.mydomain.com was an example. Often web servers are fine with just www, but normally you should use the same host name that your server will respond to. Sometimes you can also put the server's IP address. Some servers also accept an empty header (so just Host: and nothing else). Beginning very recently, I get a 504 Gateway Timeout for about 30% of all requests. What could be causing this? Responses taking too much time. Are you sure that your timeout server is properly set ? Maybe you have put times in milliseconds there, thinking they were in seconds ? More importantly, I'm not convinced that HAProxy is successfully forwarding requests to both servers, although I could be wrong. As you can see on the two app instances, each reports a separate internal IP to help diagnose. It appears that only SAMP1 receives requests, although both pass health checks now. I see both servers receiving 20 sessions, so that seems fine. Among the possible reasons for what you observe :
- ensure you're using balance roundrobin and not any sort of hash or source-based algorithm
- ensure that you have not enabled cookie stickiness, or that you close your browser before retrying
- ensure that you have option httpclose and that your browser is not simply pushing all requests through the same session tunnelled to the first server haproxy connected to
Regards, Willy
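For illustration, a sketch of a check carrying a Host header as discussed; the host name is a placeholder, and note that the space after "Host:" must be escaped with a backslash in the configuration:

    option httpchk HEAD / HTTP/1.1\r\nHost:\ www.mydomain.com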
Re: option httpchk is reporting servers as down when they're not
On Mon, Mar 09, 2009 at 04:15:34PM -0400, Allen, Thomas wrote: I used the unit 'S' for my timeouts, as in clitimeout 60S contimeout 60S srvtimeout 60S Is that to be avoided? I assumed it meant seconds. OK, it's just a minor problem. You have to use a lower-case s : 60s. It's stupid that the parser did not catch this mistake; I should improve it. By default, it ignores unknown chars, so you clearly had 60 ms here. BTW, there's no use in setting large contimeouts. You should usually stay with lower values such as 5-10s. Oh BTW, what version are you running ? Your stats page looks old. The time units were introduced in 1.3.14, so I hope you're at least at this level. I'm using roundrobin and adding the httpclose option. I've been using cookie stickiness (which will be important for this website), but after disabling this stickiness, I get the same results. I tried clearing out the server cookie before and opening the page in multiple browsers, and still got these results. Then it is possible that haproxy could not manage to connect to your server in 60 ms, then immediately retried on the other one, and sticked to that one. Regards, Willy
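To make the pitfall concrete, a sketch of the forms discussed (values illustrative): a lower-case unit is honoured, a bare number means milliseconds, and '60S' silently degrades to 60 ms on versions which ignore unknown characters:

    timeout connect 5s      # 5 seconds
    timeout client  60s     # 60 seconds
    timeout server  60000   # no unit: 60000 ms = 60 s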
Re: option httpchk is reporting servers as down when they're not
Hi Thomas, On Mon, Mar 09, 2009 at 05:20:49PM -0400, Allen, Thomas wrote: Hi Willy, Hm, changing to 60s for each gave me 100% 504 errors, so I removed all three. Bad idea, I know, but at least it works then. Then use 60000, that's the old way of doing it :-) I'm running 1.2.18 because the HAProxy homepage calls it the latest version. Ah OK, version 1.2 did not have the time units. Well, in fact it's not exactly marked as the only latest version; it's the latest version of branch 1.2, and I admit 1.2 is the only branch not tainted by development. I've removed all cookies from this IP, cleared my cache, and still it seems that only one server is being hit. But the stats page reports an equal distribution, so it's anybody's guess. What would be a simple way to log the distribution? I find it difficult to determine this even in debug mode (I'm running the proxy in daemon mode, of course). It is in the logs: you have the server's name (assuming you're logging with option httplog). Something is possible if you're playing with only one client. If the number of objects on a page is a multiple of the number of servers and you're in round-robin mode, then each time you fetch a page, you'll alternately fetch objects from both servers and come back to the first one for the next click. Of course that does not happen as soon as you have at least one other client. And since I saw 20 sessions on your stats after my access, I'm tempted to think that it could be related. Regards, Willy
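A minimal logging sketch for checking the distribution, per the advice above (assuming a local syslogd listening on its UDP socket):

    global
        log 127.0.0.1 local0

    listen webfarm
        mode http
        log global
        option httplog

Each request then appears in the log with the name of the server which handled it, which answers the distribution question directly.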
Re: HaProxy ACL (fwd) - access control
Hi Krzysztof, On Mon, Mar 09, 2009 at 01:13:31PM +0100, Krzysztof Oledzki wrote: Hi Willy, First, please excuse that it took me nearly one month to reply to your letter, shame on me. :( No problem, I know we're all facing the same issues trying to find time :-) In fact, I think that having use_backend evaluated last makes sense, since it's really how it's supposed to work. But I agree that for the poor guy writing the rules, it would be easier to be able to put it before other rules. In fact, there's a solution consisting in using allow to escape the rules, but it requires that the rule is duplicated for the use_backend one, which is not always very convenient. So, after such a long time of thinking (1 month, right? ;) ) I believe we should keep it as-is and prevent duplication by simply writing a best-practices chapter in the docs and suggesting to use a dedicated backend to handle redirects:
backend pax_redirects
    redirect prefix https://pay.xxx.bg if pay_xxx
    (...)
Indeed, it may be one way of solving difficult ordering for complex setups, I like the idea. Maybe we should see the backend more as the place where most of the processing should be performed once it has been selected by the frontend. This is already more or less the case after all.
backend XX
    use_backend www_php4 if payment
    use_backend pax_redirects if pay_xxx
    default_backend www_php4
We may also add a warning, or even disallow using use_backend and redirect in the same proxy. I believe it is the best solution - we already have everything that is needed, let's use it. I want to keep the ability to redirect from the frontend, as I already have several configs making use of it. It's useful to catch unexpected conditions, such as :
frontend XXX
    acl local_site_down nbsrv(bk) lt 3
    redirect prefix http://backup-site.domain.com if local_site_down
    use_backend bk if ...
See ? However, I think that the root cause of the problem is having redirect *after* use_backend. This is indeed misleading and should cause a warning to be emitted. I'll try to enumerate all cases of mis-placed conditions which can lead to unexpected behaviours, and report warnings in such cases. The example above should be fine, but the one below should send a warning :
frontend XXX
    acl local_site_down nbsrv(bk) lt 3
    use_backend bk if ...
    redirect prefix http://backup-site.domain.com if local_site_down
Don't you think we should create a new set-backend keyword to merge it with the whole list, and let use_backend slowly die (for instance, we mark it deprecated in version 1.4 with a big warning) ? This means we would have : block allow|deny|redirect|tarpit|set-backend use_backend I'm not sure. I like the idea of two-step request processing :
- first decide if we need to allow|deny|tarpit a request
- then decide which backend to use (frontend) or which server/redirect (backend)
Yes, after one month of thinking too :-) I think it is important to separate the authorisation (allow/deny/...) from the switching rule. And in fact, that's what was on the diagram I sent a month ago. It's in /download/1.3/doc/acl.pdf on the site, in case you don't have it in mind. We may add a set-backend directive, but I think use_backend will still be useful, so keeping both could be hard to maintain in the long term. Agreed. I even remember now that I wanted use_backend in the frontend and use_server in the backend, both processed after allow/deny/... Another idea would consist in splitting access rules from traffic management rules. I mean, allow, deny, tarpit grant or deny access.
Even the tarpit could be considered as an extended deny. Then we have use_backend, redirect, and maybe later things about QoS, logging, etc... which would make sense in a separate list. Yes. This is definitely the way I think we would like to go currently! I think so too. Since we can't ask the user to remember which keyword works in which group, the syntax should make it possible to explicitly state where the processing ought to happen. That's the principle of the http-req in etc... This is good, however I'm not sure about this http-req in. First : why do we need to explicitly state that it is HTTP processing ? Because there would be multiple levels. First, we can already match at the TCP level even in HTTP mode. Second, once there are multiple layers (eg: HTTP on top of SSL on top of TCP), we will want to tell where the rule applies. However, I absolutely want these boring keywords not to be needed for the normal case. Right now we know that most of the rules will be HTTP, so it makes sense not to indicate anything by default for HTTP. Ditto if we implement SMTP : it should not be necessary to explicitly state that we want to process SMTP. The same syntax should be used for different backends (tcp, http, smtp) if that is possible. The default syntax should apply the expected behaviour depending on the protocol in use.
Re: selinux policy for haproxy
Hi, On Tue, Mar 17, 2009 at 09:26:43PM +0100, Jan-Frode Myklebust wrote: Here's an selinux policy for haproxy. The patch is built and lightly tested with haproxy-1.3.15.7-1.fc10.i386 on Fedora9, and haproxy-1.2.18 on RHEL5. believe it or not, I've never experimented at all with selinux. However, reading your config files, it looks appealing. I'll merge your work into 1.3.16, as there's already a contrib dir with various things there. Thanks! Willy
Re: The gap between ``Total'' and ``LbTot'' in stats page
On Thu, Mar 19, 2009 at 11:14:48PM -0700, James Satterfield wrote: I just recently upgraded my LBs to 1.3.15.8 from 1.2.something and noticed those stats. I was wondering about them as well. In my setup those numbers only seem to differ where I'm using cookies for persistence. Normally the difference is caused by persistence cookies. But there is another case too. If all servers are full and the request is queued in the backend's queue, then one server will pick the requests once available, regardless of the LB algorithm, so lbtot will not be incremented. Willy
Re: Can Haproxy work as a TCP-multiplexer i.e. combine requests into one connection to a server?
Hi Malcolm, On Thu, Mar 19, 2009 at 11:42:31AM +, Malcolm Turnbull wrote: Possibly a stupid question but: Can Haproxy work as a TCP-multiplexer, i.e. combine requests into one connection to a server? Or would that be related to using keep-alive? It requires that we get keep-alive to work first. What you want is the most complex feature. It requires connection pools, keep-alive, the ability to continuously check the pool, and user-connection affinity to avoid moving users across multiple processes on the server. Right now, without keep-alive and using queueing, you can get very close to that behaviour, with the only difference that connections are closed and opened between haproxy and the servers (which is not expensive since both should be very close to each other). Regards, Willy
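For illustration, a sketch of the close-but-not-multiplexed setup described above: per-server concurrency is bounded so that excess requests queue in haproxy (addresses and the maxconn value are arbitrary):

    listen app 0.0.0.0:80
        mode http
        option httpclose
        balance roundrobin
        server app1 10.0.0.1:80 maxconn 10 check
        server app2 10.0.0.2:80 maxconn 10 check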
[ANNOUNCE] haproxy-1.3.16 (Stable)
Hi all, now that's it for real. 1.3.16 is out. And with it, I'm declaring 1.3 the new stable branch. That means that only fixes and minor feature enhancements may be merged in future 1.3 versions. New development will take place in 1.4 or maybe 2.0, I'll see. Anyway, I'd like to adopt a new versioning scheme for development releases. We would only have 2 digits to remember for major releases, and either use a -rcX suffix for development, or a .X for stable versions. One of the goals is also to shorten the timeframe between two consecutive versions. This should be easier now that the layers have been split. There were a few last-minute fixes and improvements since rc1 :
- [BUG] stream_sock: write timeout must be updated when forwarding !
- [BUILD] Fixed Makefile for linking pcre
- [CONTRIB] selinux policy for haproxy
- [MINOR] show errors: encode backslash as well as non-ascii characters
- [MINOR] cfgparse: some cleanups in the consistency checks
- [MINOR] cfgparse: set backends to balance roundrobin by default
- [MINOR] tcp-inspect: permit the use of no-delay inspection
- [MEDIUM] reverse internal proxy declaration order to match configuration
- [CLEANUP] config: catch and report some possibly wrong rule ordering
- [BUG] connect timeout is in the stream interface, not the buffer
- [BUG] session: errors were not reported in termination flags in TCP mode
- [MINOR] tcp_request: let the caller take care of errors and timeouts
- [CLEANUP] http: remove some commented out obsolete code in process_response
- [MINOR] update ebtree to version 4.1
- [MEDIUM] scheduler: get rid of the 4 trees thanks to ebtree v4.1
- [BUG] sched: don't leave the 3 last tasks unprocessed when niced tasks are present
- [BUG] scheduler: fix improper handling of duplicates in __task_queue()
- [MINOR] sched: permit a task to stay up between calls
- [MINOR] task: keep a task count and clean up task creators
- [MINOR] stats: report number of tasks (active and running)
- [BUG] server check intervals must not be null
- [OPTIM] stream_sock: don't retry to read after a large read
- [OPTIM] buffer: new BF_READ_DONTWAIT flag reduces EAGAIN rates
- [MEDIUM] session: don't resync FSMs on non-interesting changes
- [BUG] check for global.maxconn before doing accept()
- [OPTIM] sepoll: do not re-check whole list upon accepts
The source and the full changelog are available here : http://haproxy.1wt.eu/download/1.3/src/ I've also uploaded binaries for linux-i586 and solaris-sparc, as usual. I'll do my best to avoid 4-digit versions in the future. That means that the next stable version will be 1.3.17. I hope we'll not have too many bugs to fix. The doc needs a little bit of refreshing though. Users with extreme bandwidth needs may want to give this new version a try. Some performance improvements of up to 10-15% have been observed compared to 1.3.15 in some circumstances. Recent linux kernels supporting splice() will help even further. Users with critical availability requirements should wait a bit for others to report bugs if any, and stick to 1.3.15.X for now. Have fun ! Willy
Re: option httpchk is reporting servers as down when they're not
Hi Thomas, On Wed, Mar 25, 2009 at 12:57:41PM -0400, Allen, Thomas wrote: Hi Willy, We now have HAProxy running over our freshly released website: http://www.infrastructurereportcard.org/ thanks for the heads up ! Thanks for this great piece of software and all the help! Only two connection errors in 3 connections thus far, one of which was due to me cancelling a long-running page load in the admin. fine ! anyway, you should expect to get some error requests due to such activities from your clients. In general, various sites report request error rates ranging from 0.1 to 0.6%, so what you observe is almost perfect :-) Cheers, Willy
Re: some specfile fixes
Hi Jan-Frode, On Thu, Mar 26, 2009 at 03:45:53PM +0100, Jan-Frode Myklebust wrote: And here's the patch that does everything I want to do to the specfile... Sorry about the noise. Thanks for your work on this. I have no way to test that the specfiles work, and I only update a few fields in them at each release. So it's really a good thing that someone like you checks them and proposes fixes. It might also be nice to rename haproxy-small.spec to haproxy.spec-small; then haproxy can be built directly from the tar.gz by: rpmbuild -ta haproxy-1.3.16.tar.gz When there are two .spec files, rpmbuild will concatenate both, and fail. OK fine, I wasn't aware of this. I'm even thinking that we can safely remove the -small variant right now. Cheers, Willy
Re: High Cpu usage : fixed
Guys, I've released 1.3.17 which fixes the high CPU usage. Bart Bobrowski helped me a lot tracking this bug that I could not reproduce here. It was caused by a timeout being re-armed just after a socket is being closed. Regards, Willy
Re: cpu 100% at strange times, epoll_wait and gettimeofday gets called too often
Hi, On Fri, Mar 27, 2009 at 01:09:30PM +0100, Remco Verhoef wrote: Hi, We're experiencing strange behaviour of haproxy-1.3.15.8 and haproxy-1.3.16; at frequent times it will use 100% cpu. It appears that the wait_time is not used. I've used both poll and epoll, same behaviour. The kernel is 2.6.26-1-686 #1 SMP. This problem is specific to 1.3.16, it has been fixed in 1.3.17. Regards, Willy
Re: balance source based on a X-Forwarded-For
On Sun, Mar 29, 2009 at 07:46:05PM +0200, benoit wrote: Jeffrey 'jf' Lim wrote: On Wed, Mar 25, 2009 at 8:02 PM, Benoit maver...@maverick.eu.org wrote:
diff -ru haproxy-1.3.15.7/doc/configuration.txt haproxy-1.3.15.7-cur/doc/configuration.txt
--- haproxy-1.3.15.7/doc/configuration.txt 2008-12-04 11:29:13.0 +0100
+++ haproxy-1.3.15.7-cur/doc/configuration.txt 2009-02-24 16:17:19.0 +0100
@@ -788,6 +788,19 @@
 balance url_param param [check_post [max_wait]]
+ header  The HTTP header specified in argument will be looked up in
+         each HTTP request.
+
+         With the Host header name, an optional use_domain_only
+         parameter is available, for reducing the hash algorithm to
+         the main domain part, eg for haproxy.1wt.eu, only 1wt
+         will be taken into consideration.
+
I'm not so sure how balancing based on a hash of the Host header would be useful. How would this be useful? I would see an application for balancing on perhaps other headers (like xff as mentioned), but for Host... I dunno... (so basically what I'm saying is, is the code for the 'use_domain_only' bit useful? can it be left out?) -jf Well, at least it's useful for our business, that's why it's here :) It's aimed at a shared hosting environment, with multiple host entries pointing to different web sites. The objective was to maximise caching efficiency and response times by 'sticking' a site to a backend server. The use_domain_only was here to reduce the hashing to the most significant part and help with sites using many tlds for language selection. BTW Benoit, be careful, you left some fprintf() in your patch. Regards, Willy
Re: balance source based on a X-Forwarded-For
On Sun, Mar 29, 2009 at 12:31:27PM -0700, John L. Singleton wrote: I'm a little mystified as to the usefulness of this as well. I mean, what does hashing the domain name solve that just balancing back to a bunch of Apache instances with virtual hosting turned on doesn't? Are you saying that you have domains like en.example.com, fr.example.com and you want them all to be sticky to the same backend server when they balance? If that's the case, I could see that being useful if the site in question were doing some sort of expensive per-user asset generation that was being cached on the server. Is this what you are talking about? There are proxies which can do prefetching, and in this case it's desirable that all requests for the same domain name pass through the same cache. Regards, Willy
Re: balance source based on a X-Forwarded-For
On Sun, Mar 29, 2009 at 10:17:39PM +0200, benoit wrote: BTW Benoit, be careful, you left some fprintf() in your patch. Regards, Willy Heck yes, i'll have to check on this thanks. You're welcome. Btw, why isn't this list set with a default reply to the list ? Because I hate it when responses to my messages only go to the list. I can't permanently check all the lists I'm subscribed to, and I find it a lot better that people involved in a discussion are kept CCed. However, IIRC the list does not change the reply-to header, so anyone is free to set it to the list and will not get direct copies when people hit reply-to-all. Willy
Re: x-client with SMTP, revisited
Hi Eric, On Sun, Mar 29, 2009 at 09:06:40PM -0700, Eric Schwab wrote: We would like to use x-client with the SMTP protocol with haproxy, as a means to pass along some basic data to the backend SMTP servers. We looked into this a month or two ago and Willy mentioned that this would be substantially easier once 1.3.16/17 was released. ... and now I confirm this ;-) I know that this could be implemented previously with HTTP - is this now viable with SMTP? If not, does anybody have the time / inclination / ability to add this capability? I would be happy to discuss this off-line as appropriate. I think it is plainly doable now that we can intercept requests and responses for analysis. In fact, I'm even wondering whether we should also support a generic protocol which would work in any mode by prepending a specific line before the payload. This could be used between multiple proxies to make the last one work in transparent mode and connect to the server using the original client's IP. I have some vague memories of some oddities in the XCLIENT protocol, but right now I don't remember which ones. I believe it was the need to let the EHLO line pass first, then check the response, then insert XCLIENT and pass the rest. If an option is provided to force XCLIENT on some servers, we could also simply prepend this line in the request before ever doing the EHLO. I'm not sure whether it works, but the README does not contradict this option. Now, as to who will develop this... I think it's a matter of several hours of work, maybe a full day. I really don't have time right now for this, but maybe there are some persons on the list who can do. Regards, Willy
Re: [RFC] development model for future haproxy versions
On Tue, Mar 31, 2009 at 10:57:26AM +0800, Jeffrey 'jf' Lim wrote: On Tue, Mar 31, 2009 at 5:06 AM, Willy Tarreau w...@1wt.eu wrote: Hi all! Now that the storm of horror stories has gone with the release of 1.3.17, I'd like to explain what I'm planning to do for future versions of haproxy. Right now there are a few issues with the development process and the version numbering in general : snip 4) encourage people to work on a next feature set with their own tree. Since haproxy has migrated to use GIT for version control, it has really changed my life, and made it a lot more convenient for some contributors to maintain their own patchsets. yo, I hadn't noticed! What's the clone url, though? All links I've tried only get me to a gitweb interface. It's in http://git.1wt.eu/git/haproxy-1.3.git/ Willy
Re: Forcing SSL encryption (a.k.a. 'redirect' keyword not recognised)
On Wed, Apr 01, 2009 at 12:57:36PM +0300, John Doe wrote: I am perplexed, as HAproxy 1.3.15.8 doesn't recognise the 'redirect' keyword. And it's right, because 1.3.15.8 does not have it. This was implemented in 1.3.16 (use 1.3.17 instead, 1.3.16 is buggy). Also, be careful, there's a small mistake in your config : redirect https:// if !LOCALHOST You should use prefix (or location) before https:// above, and you must put the host name which corresponds to the whole part before the slash (eg: https://www.domain.com). However the documentation and multiple examples found on the net show that 'redirect' is a valid keyword. I also tried 'redir'. Removing 'if !LOCALHOST' doesn't help either (the idea is to redirect the browser to https:// if the connection doesn't come from localhost, i.e. Stunnel). In fact, the documentation on the haproxy site is always the latest one. You should always refer to the doc provided with your version. And the doc from 1.3.15.8 does not mention this keyword (just checked ;-)). So what I am trying to do is to force http connections to use SSL. I have Stunnel listening on 10.0.0.220:443 and Stunnel connects to 10.0.0.220:80 (i.e. HAproxy). OK, this is the right way to do this. Just download 1.3.17 and try again (after fixing your config with the tweak above) and it should be OK. Regards, Willy
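Putting both corrections together, a sketch of the fixed rule; the host name is a placeholder for the part before the slash:

    acl LOCALHOST src 127.0.0.1
    redirect prefix https://www.domain.com if !LOCALHOST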
Re: patch: nested acl evaluation
Hi Jeffrey, On Thu, Apr 02, 2009 at 02:23:44PM +0800, Jeffrey 'jf' Lim wrote: (...) Ok perhaps combinatorial was not the word that i should have used, but... I hope you can see the point/s with the explanation that i gave. The head acl only gets checked once - thereafter it goes into the body (you could treat it like the standard if statement in any programming language) to do the evaluation. Without this, the head acl has to be evaluated every time, for every combination of head acl + sub acl, before you can go into each 'use_backend'. So eg:
use_backend b1 if host_www.domain.com path1
use_backend b2 if host_www.domain.com path2
use_backend b3 if host_www.domain.com path3
use_backend b4 if host_www.domain path4 or host_www.domain path5
...
use_backend b10 if host_www.d2.com path1
use_backend b11 if host_www.d2.com path2
...
So does it make sense to cut all of this out? Of course it does. Faster acl processing (you don't repeat having to process 'host_www.domain.com' every time!), a much neater (and maintainable) config file. == The following is a refined patch (did i mention i was serious about this? just to allay the April Fool's thing). One caveat to note: at this point in time I'm not going to cater for nested 'for_acl's (I think if u really need this, you probably have bigger problems). One level of 'for_acl' should be all you need... (why is this deja vu) (Willy, I'll work out the documentation later once/if u give the go ahead for this, thanks) I understand the usefulness of your proposal, but I really dislike the implementation, for at least two reasons :
- right now the config works on a line basis. There are sections, but no blocks. This is the first statement which introduces the notion of a block, with a beginning and a mandatory end. I don't like that, for the same reason I don't like config languages with braces. It's harder to debug and maintain.
- only the use_backend rules are covered. So the for_acl keyword is not suited, since in fact you're just processing a use_backend list.
Also, that leaves me with two new questions : what could be done for other rules ? Do we need to duplicate all the code, or can we factor the same block out and use it everywhere ? Second question : maybe you need this for use_backend only, which marks it as a special case which might be handled even more easily ? With all that in mind, I'm wondering if what you want is not just a sort of massive remapping feature. I see that you have arbitrarily factored on the Host part and you are selecting the backend based on the path part. Is this always what you do ? Are there any other cases ? You still have to declare ACLs for each path. Maybe it would be better to simply support something like this :
use_backend back1 if acl1
map_path_to_backend if Host1
    use BK1 on ^/img
    use BK2 on ^/js
    use BK3 on ^/static/.*\.{jpg|gif|png}
...
use_backend backN if aclN
See ? No more ACLs on the path, and direct mapping between path and backends for a given host. If you think you still need ACLs but just for use_backend rules, maybe we should just use a slightly different keyword : simply not repeat use_backend and use select instead, which does not appear in the normal config section :
use_backend bk1 if acl1
use_backend_block if Host1
    select bk1 if path1 or path2
    select bk2 if path3
    select bk3 if path4 src1
...
use_backend bkN if aclN
That one would present the advantage of being more intuitive and would integrate better with other rules.
Also, it would make it more intuitive how to write such blocks for other rule sets, and it is very close to what you've already done. And that does not require any end tag, since the keyword used in the block (select above) is not present in the containing block. Maybe with a little bit more thinking we could come up with something more generic like this :
...
call use_backend if acl1
    with bk1 if path1 or path2
    with bk2 if ...
...
call redirect if acl1
    with prefix http://here.local if path2
    with location / if path3
...
See ? That would basically add an iterator around any type of ACL rule, providing us with the ability to only specify the verb on the first line and all the args in the list, and this would make a lot of sense. I like neither the call nor the with keywords, it's just an illustration. I'd like to get opinions from other massive ACL users, because any minor change might have a significant impact on many of us, and we must keep in mind that what we develop has to be maintained over the years, sometimes conflicting with further features. Best regards, Willy
Re: patch: nested acl evaluation
On Sat, Apr 04, 2009 at 10:20:23AM +0800, Jeffrey 'jf' Lim wrote: OK maybe use is OK in fact, considering the alternatives. :) some proposals for the keywords: for/use condition/use cond/use (cond/use seems the best compromise - short, but understandable enough) what would you think about do/use ? If we extend the system, we'll have to associate parameters with a condition. But since the entry point is the switching rule here, maybe we'll end up with something very close to what you have implemented in your patch, in the end, and it's just that we want to make it more generic to use the two conditions layers in any ruleset. I would guess so! Even the redirect rules and 'block' rules look pretty similar... :) yes, and maybe this is what we should try to improve : converge towards a generic condition matching/execution system, to which we pass the action and the args. That way, we just have to run it over redirect_rules or switching_rules always using the same code. Willy
Re: tcp proxy
On Sat, Apr 04, 2009 at 11:43:38AM -0300, Nicolas Cohen wrote: Hi Willy, It seems right to implement it. I'll review this with the team and let you know once we have an available patch. Nice, thanks! Willy
Re: Delay incoming tcp connections
Hi, On Sat, Apr 04, 2009 at 07:46:28PM +0400, Alexey wrote: Hi, I saw a post about delaying incoming smtp connections via haproxy. Looks like I need a transparent proxy for saving source ip addresses, but it requires TPROXY in the linux kernel. Yes it does. Do I need to patch kernel + iptables to make it work? Yes. Malcolm Turnbull posted a howto on the subject. What's the difference between squid and haproxy transparent proxying (squid requires only -j REDIRECT support in the kernel) ?
- Squid does only HTTP, not TCP
- the -j REDIRECT method only works for the destination, but does not help binding to the client's source address.
Is there any simpler way to delay incoming tcp connections without changing the source address? Not that I'm aware of. This is also called delayed binding and at least requires an equipment which is able to translate TCP sequence numbers. Doing that in a proxy is the simplest and most reliable method, but it requires a very recent linux kernel (>= 2.6.28) or applying the TProxy patch. Regards, Willy
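For reference, a sketch of the transparent source binding this relies on, assuming haproxy was built with transparent proxy support and the kernel provides TPROXY (or is >= 2.6.28); addresses are hypothetical:

    backend smtp_servers
        mode tcp
        source 0.0.0.0 usesrc clientip
        server smtp1 10.0.0.5:25 check

With this, connections to the servers are made using the original client's source address instead of the proxy's.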
Re: Forcing SSL encryption (a.k.a. 'redirect' keyword not recognised)
Hi, On Tue, Apr 07, 2009 at 11:05:16AM +0300, John Doe wrote: Hi For some reason acl stunnel src 10.0.0.0/8 doesn't seem to work (with version 1.3.15.8). That's not expected at all. Are you sure you were not mixing up with another problem ? Could you please retest with 1.3.17 ? I did the re-test using 1.3.17 and I can confirm that the following configuration doesn't function as expected (i.e. the traffic is not redirected into https): acl stunnel src 10.0.0.0/8 redirect prefix https://10.0.0.220 unless stunnel but this works OK: acl stunnel src 10.0.0.220/32 redirect prefix https://10.0.0.220 unless stunnel No other modifications were made. Hope you can sort it out even though it is no biggie for me. Well, I have tried here and it works as expected for me with /8 : if the source is any address in 10.0.0.0/8, it is not redirected, otherwise it is. Maybe your clients are local and in 10.0.0.0/8 too ? Willy
Re: httpchk with apache tomcat
On Tue, Apr 07, 2009 at 12:34:37PM -0400, Jill Rochelle wrote: I have a unique, maybe not unique, situation. The flow is like this: in on apache 80, haproxy on 85 to find servers; server is tomcat server but port is which goes back to apache, then apache uses mod_jk to forward to tomcat. My problem: How do I use httpchk for the tomcat servers? Or can I? I'm sorry, I fail to see the issue here. Why couldn't you simply enable httpchk towards the tomcat servers ? From my understanding, haproxy listens on port 85 and forwards to multiple instances on port . Just enable httpchk with a test URL to which tomcat must respond, and you're done. Willy
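A sketch of such a check; the URL, addresses and ports are hypothetical, since the original message elides them:

    listen tomcat_farm 0.0.0.0:85
        mode http
        option httpchk HEAD /check.jsp HTTP/1.0
        server tomcat1 192.168.0.10:8080 check
        server tomcat2 192.168.0.11:8080 check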
Re: Using acls to check if # connections less than number of up servers
On Tue, Apr 07, 2009 at 02:58:27PM -0700, Karl Pietri wrote: Hey all, I'm trying to use ACLs to have a priority queue of servers for a special ip/port and fail over to the regular section, and I'm wondering if it's possible to have an acl that would check if dst_conn gt nbsrv(backend); the code works fine as it is, but if one server is down in the priority farm then the check to send it to the bigger farm doesn't pass properly, as it's checking for 3 not 2. Any help would be appreciated. The section of code is copied below showing what I currently have. I think that what you want can be achieved using connslots. Your frontend config would look approximately like this :
frontend internal_api_rails_farm 192.168.1.2:80
    mode http
    option forwardfor
    acl priority_full connslots(priority_rails_farm) eq 0
    acl priority_down nbsrv(priority_rails_farm) lt 1
    use_backend rails_farm if priority_full or priority_down
    default_backend priority_rails_farm
You need 1.3.17 to use connslots though. Regards, Willy
Re: [PATCH] Added 'option inject' for mode 'tcp'
Hi Maik, On Fri, Apr 17, 2009 at 04:29:11AM +0200, Maik Broemme wrote: Hi, attached is a patch which adds a new option to HAProxy called 'inject' for the mode 'tcp'. In the current version of this patch you can only add data at the beginning of the session. I think this is very useful - at least for me it is. :)) There are interesting concepts here it seems, even though some parts still seem a bit confusing to me. I'll review that in depth this weekend. Could you give us a few hints about what type of real-world usage you make of such a feature ? I certainly can understand the ability to insert a client's IP in TCP data, but I'm not sure what purpose returning data to the client will serve. Also, I don't think this should be set as an option. Options tend to be just flags, even though some of them are slightly more. Once we figure out a more general usage, we will probably find a new keyword family for such a feature ;-) Regards, Willy
Re: Simple TCP with backup config
Hi Michael, On Fri, Apr 17, 2009 at 04:47:38PM +0100, Michael Miller wrote: Hi, I am doing some initial testing with HAProxy and have come across a problem I don't seem to be able to resolve. A summary of what I am initially trying to achieve follows. I am trying to use HAProxy to provide a VIP that passes on a tcp (SMTP as it happens) stream to a backend server. If that server is down, I would like the connection forwarded to a backup server. Doing some testing and watching the status page reveals that if both servers are configured as normal, rather than backup, servers the tcp connection is rerouted when the initial attempt to connect fails. However, when one server is configured as backup, the connection never gets to the backup server. The config I am using is:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    pidfile /var/run/haproxy.pid
    ##chroot /usr/share/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet
    spread-checks 10
defaults default_settings
    log global
    mode http
    option httplog
    option dontlognull
    option abortonclose
    ## option allbackups
    option clitcpka
    option srvtcpka
    option forwardfor
    retries 10
    option redispatch
    maxconn 2000
    backlog 256
    timeout connect 5000
    timeout client 5
    timeout server 1
listen www-health
    bind 0.0.0.0:8080
    mode http
    monitor-uri /haproxy
    stats enable
    stats uri /stats
listen smtp
    log global
    bind 0.0.0.0:25
    mode tcp
    #option smtpchk HELO haproxy.local
    option tcplog
    balance roundrobin
    rate-limit sessions 10
    timeout connect 1
    timeout client 6
    timeout server 6
    server smtp01 10.1.1.5:25
    server smtp02 10.1.1.6:25 backup
Note that I am trying to avoid using active health checks and am hoping that the tcp connection failure when connecting to the primary will fall back to the backup server. This works as expected when both servers are configured as active rather than backup servers. Looking at the status page when one is down, the 10 retries against the down server are shown and then the tcp connection succeeds to the second server. Is this a bug that the tcp connection is not forwarded to the backup server, or am I missing some obvious configuration settings? Neither :-) It is designed to work like this, though I agree that it is not necessarily obvious. As documented, a backup server is only activated when all other servers are down. Here, since you are not checking the active server, it is never down. That's as simple as that. May I ask why you don't want to enable health checks ? That's a rather strange choice, as it means you don't care about the server's status but still hope that a failure will be detected fast enough for a redispatch to work. You might destroy a lot of traffic acting like this. Also, there is an smtpchk option which is able to check that your server responds on port 25. You should really use it. You don't necessarily need to check every second; for SMTP generally, checking once a minute may be enough for small setups. Regards, Willy
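A sketch of the advised fix: enable smtpchk on both servers with a relaxed interval, so that the primary's failure is actually detected and the backup promoted (the one-minute interval is the suggestion from the reply, not a measured value):

    listen smtp
        bind 0.0.0.0:25
        mode tcp
        option smtpchk HELO haproxy.local
        server smtp01 10.1.1.5:25 check inter 60s
        server smtp02 10.1.1.6:25 check inter 60s backup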
Re: haproxy 1.3.14.2 bad request outage
Hi, On Fri, Apr 24, 2009 at 09:36:34AM +0200, Jean-Baptiste Quenot wrote: Hi there, This morning I noticed interesting problems regarding haproxy (1.3.14.2 here, yes I know archeology might be involved, I must upgrade). I have to say that we had a blackout for a few hours last night on our site, and servers restarted when the power supply came back. This may be one of the reasons; network problems might also be involved. The problem only happened on one of the remote backends (WAN setup). A haproxy reload fixed the problem. FWIW, note that the related URLs were protected with Basic authentication. Here are the offending logs (with name of host and frontend changed):
Apr 24 07:38:27 myhost haproxy[4499]: 127.0.0.1:51565 [24/Apr/2009:07:38:27.139] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 206/206/0/0 0/0 {} BADREQ
Apr 24 07:38:49 myhost haproxy[4499]: 127.0.0.1:44142 [24/Apr/2009:07:38:49.177] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 206/206/0/0 0/0 {} BADREQ
Apr 24 07:42:07 myhost haproxy[4499]: 127.0.0.1:48588 [24/Apr/2009:07:42:07.351] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 204/204/0/0 0/0 {} BADREQ
Apr 24 07:42:39 myhost haproxy[4499]: 127.0.0.1:49220 [24/Apr/2009:07:42:39.231] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 205/205/0/0 0/0 {} BADREQ
Apr 24 08:06:12 myhost haproxy[4499]: 127.0.0.1:55251 [24/Apr/2009:08:06:12.106] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 202/202/0/0 0/0 {} BADREQ
Apr 24 08:14:48 myhost haproxy[4499]: 127.0.0.1:59733 [24/Apr/2009:08:14:48.979] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 201/201/0/0 0/0 {} BADREQ
Apr 24 08:30:50 myhost haproxy[4499]: 127.0.0.1:50823 [24/Apr/2009:08:30:50.775] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 205/205/0/0 0/0 {} BADREQ
Apr 24 08:30:59 myhost haproxy[4499]: 127.0.0.1:51282 [24/Apr/2009:08:30:59.542] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 205/205/0/0 0/0 {} BADREQ
Apr 24 08:31:29 myhost haproxy[4499]: 127.0.0.1:52191 [24/Apr/2009:08:31:29.621] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 199/199/0/0 0/0 {} BADREQ
Apr 24 08:31:32 myhost haproxy[4499]: 127.0.0.1:52423 [24/Apr/2009:08:31:32.993] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 200/200/0/0 0/0 {} BADREQ
Apr 24 08:31:34 myhost haproxy[4499]: 127.0.0.1:52628 [24/Apr/2009:08:31:34.124] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 199/199/0/0 0/0 {} BADREQ
Apr 24 08:32:58 myhost haproxy[4499]: 127.0.0.1:55046 [24/Apr/2009:08:32:58.916] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 201/201/0/0 0/0 {} BADREQ
Apr 24 08:33:01 myhost haproxy[4499]: 127.0.0.1:55351 [24/Apr/2009:08:33:01.277] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 203/203/0/0 0/0 {} BADREQ
Apr 24 08:33:01 myhost haproxy[4499]: 127.0.0.1:55546 [24/Apr/2009:08:33:01.558] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 201/201/0/0 0/0 {} BADREQ
Apr 24 08:33:01 myhost haproxy[4499]: 127.0.0.1:55740 [24/Apr/2009:08:33:01.641] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 200/200/0/0 0/0 {} BADREQ
Apr 24 08:33:08 myhost haproxy[4499]: 127.0.0.1:56143 [24/Apr/2009:08:33:08.816] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 202/202/0/0 0/0 {} BADREQ
Apr 24 08:33:09 myhost haproxy[4499]: 127.0.0.1:56354 [24/Apr/2009:08:33:09.096] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 204/204/0/0 0/0 {} BADREQ
Apr 24 08:33:12 myhost haproxy[4499]: 127.0.0.1:56607 [24/Apr/2009:08:33:12.120] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 205/205/0/0 0/0 {} BADREQ
Apr 24 08:33:32 myhost haproxy[4499]: 127.0.0.1:57102 [24/Apr/2009:08:33:32.116] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 205/205/0/0 0/0 {} BADREQ
Apr 24 08:33:41 myhost haproxy[4499]: 127.0.0.1:47058 [24/Apr/2009:08:33:41.125] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 207/207/0/0 0/0 {} BADREQ
Apr 24 08:33:55 myhost haproxy[4499]: 127.0.0.1:47752 [24/Apr/2009:08:33:55.918] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 202/202/0/0 0/0 {} BADREQ
Apr 24 08:34:22 myhost haproxy[4499]: 127.0.0.1:48586 [24/Apr/2009:08:34:22.053] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 203/203/0/0 0/0 {} BADREQ
Apr 24 08:35:31 myhost haproxy[4499]: 127.0.0.1:50471 [24/Apr/2009:08:35:31.359] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 204/204/0/0 0/0 {} BADREQ
Apr 24 08:36:47 myhost haproxy[4499]: 127.0.0.1:52694 [24/Apr/2009:08:36:47.769] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 201/201/0/0 0/0 {} BADREQ
Apr 24 08:36:49 myhost haproxy[4499]: 127.0.0.1:52910 [24/Apr/2009:08:36:49.838] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 198/198/0/0 0/0 {} BADREQ
Apr 24 08:37:19 myhost haproxy[4499]: 127.0.0.1:53877 [24/Apr/2009:08:37:19.243] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 199/199/0/0 0/0 {} BADREQ
Apr 24 08:37:19 myhost haproxy[4499]: 127.0.0.1:54083 [24/Apr/2009:08:37:19.318] abc abc/NOSRV -1/-1/-1/-1/0 400 187 - - PR-- 397/397/0/0 0/0 {} BADREQ
Apr 24 08:42:53 myhost haproxy[4499]: 127.0.0.1:55385 [24/Apr/2009:08:42:53.186] abc abc/NOSRV -1/-1/-1/-1/0 400
Re: 1.3.17 in TCP mode sees dead servers (but they're not)
On Mon, May 04, 2009 at 11:47:10AM +0200, Nicolas MONNET wrote: I'm experiencing a problem since updating to 1.3.17, whereby checks periodically see a backend service as down, one at a time, but for all 3 checks; and it picks right up again on the next check. Not sure what info I could get you. Generally this is caused by overloaded servers which can't manage to respond at all due to the amount of work they have in their backlog queue. Please add maxconn 50 for instance on each server line to see if it changes anything. Also, what type of server are you using ? For instance, mongrel only accepts one request at a time and will not respond to any health check while it's processing a long request, so with it you need maxconn 1. One question: couldn't it be possible to have redispatch work for TCP connections? It does. However you have one particular config: you're using balance source with your TCP config. That means that when you redispatch the connection, you apply the LB algorithm again, and you can only get back to the same server if it is still seen as up, because the size of the farm has not changed. There are two workarounds for this :
- don't use balance source when not needed :-)
- add enough retries to cover the time needed to detect the server as down, taking into account that each attempt waits at least 1 second.
For the second solution, you can combine inter and fastinter to lower the failure detection time. For instance, inter 5s fastinter 1s fall 2 will take 5 + 2*1 = 7s to see the server as down. So with at least 8 retries it should be OK. The redispatch will occur once the server has been taken out of the farm, so the source hash algorithm will bring you to another server. Regards, Willy
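Combining the advice above into one sketch (addresses and ports hypothetical; the timing values are those from the example in the reply):

    defaults
        retries 8
        option redispatch

    listen svc 0.0.0.0:10000
        mode tcp
        balance source
        server s1 10.0.0.1:10000 maxconn 50 check inter 5s fastinter 1s fall 2
        server s2 10.0.0.2:10000 maxconn 50 check inter 5s fastinter 1s fall 2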
Re: [PATCH] Fix 'tcp-request content [accept|reject] if condition' parser for missing 'if'.
Hi Maik, On Tue, May 12, 2009 at 01:36:46AM +0200, Maik Broemme wrote: Hi, attached is a patch which fixes a configuration mistake regarding the 'tcp-request' option. If you have the following in your configuration file: acl localnet dst 10.0.0.0/8 tcp-request content reject if localnet This will work fine, but if you change the 'tcp-request' line and remove the 'if', haproxy-1.3.17 will segfault. I think the following changelog entry in 1.3.18 addresses this problem: [BUG] fix parser crash on unconditional tcp content rules. Yes, precisely. But now in 1.3.18 the default behaviour is a bit weird. If you remove the 'if' statement, haproxy will reject every connection, regardless of whether it matches 'localnet' or not, and the configuration seems to be valid, which is definitely not what is expected. I can't reproduce the issue here. For me, what happens is the right thing :
- the following config rejects everything :
tcp-request content reject
- the following config rejects everything which was not accepted :
tcp-request content accept if cond
tcp-request content reject
- the following config rejects only what matches the condition :
tcp-request content reject if cond
The second case above was precisely what led me to discover the segfault bug, which was introduced in 1.3.17 with the refinement of the config warnings. But the behaviour has not changed since 1.3.16. I have changed this to the following behaviour: If nothing is specified after accept or reject, the default condition will apply (like the source and documentation say), and if there is some parameter after accept or reject, it has to be 'if' or 'unless'; anything else will result in: [ALERT] 131/012555 (27042) : parsing [/etc/haproxy/haproxy.cfg:94] : 'tcp-request content reject' expects 'if', 'unless' or nothing, but found 'localnet' [ALERT] 131/012555 (27042) : Error reading configuration file : /etc/haproxy/haproxy.cfg I think this is much more accurate. At least it took me some time to verify why the hell my configuration file is valid, but did not work as expected. :) In fact no, that's precisely what I don't want. To work around the bug I encountered, I had to write this :
tcp-request content accept if cond
tcp-request content reject if TRUE
That's pretty annoying. All conditional actions support either an if/unless condition, or unconditional execution if no condition is specified. Are you sure your config was OK ? Can you post the example which causes you trouble ? Maybe your example is right and the doc is wrong ;-) Regards, Willy
Re: TCP traffic multiplexing as balance algorithm?
Hi Maik, On Tue, May 12, 2009 at 01:57:47AM +0200, Maik Broemme wrote: Hi, I have a small question. Does someone know if it is possible to do simple traffic multiplexing with HAProxy? Maybe I am missing it somehow, but I want to ask on the list before creating a patch for it. What do you call traffic multiplexing ? From your description below, I failed to understand what it consists of. Just to answer the real-world scenario question: TCP multiplexing can be very useful for debugging backend servers or doing simple logging and passive traffic dumping. There are two major ideas for implementing it :
- 1:N (Active / Passive)
- 1:N (Active / Active)
Active means that the request goes to the destination and the response back to the client, and passive means that only the request goes to the destination. In the configuration it could look like:
listen smtp-filter 127.0.0.1:25
    mode tcp
    balance multiplex
    server smtp1 10.0.0.5:25
    server smtp2 10.0.0.6:25
Active / active would be very hard to implement, tcp stream synchronisation would be a pain and I think no one will really need this, but active / passive is a very useful feature. In my environment it is often the case that developers need access to real traffic data to debug (in the example above) the smtp software they develop. Is anyone else missing such functionality? :) Access to real data is solved with tcpdump or logs, I don't see what your load-balancing method would bring here. --Maik Regards, Willy
Re: New HAProxy user keeps loosing connection
On Wed, May 13, 2009 at 04:53:15PM -0400, Tom Potwin wrote: Thanks Alex for the info. Unfortunately, I'm already using 'option httpclose'. Here's my current cfg:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy
defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 5
    srvtimeout 5
listen webfarm 192.168.31.100:80
    mode http
    stats enable
    stats auth netadmin:5bgr+bdd1WbA
    stats refresh 5s
    balance roundrobin
    cookie JSESSIONID prefix
    option httpclose
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0
    server web1 192.168.31.202:80 cookie w01 check inter 2000 rise 2 fall 2
    server web2 192.168.31.212:80 cookie w02 check inter 2000 rise 2 fall 2
    option persist
    redispatch
    contimeout 5000
You can try to add option forceclose. If this works with that option, it means that one of the sides (browser or server) is incorrectly ignoring Connection: close (already encountered a long time ago). Do you reach haproxy through a proxy or directly ? Maybe this proxy would enforce keep-alive. Hoping this helps, Willy
Re: reloading haproxy
Hi Adrian, On Thu, May 14, 2009 at 03:33:39PM +0200, Adrian Moisey wrote: Hi I tried that, also gave the same result. What is happening is that the new haproxy process asks the old one to release the ports so that it can bind to them. So there exists a short period of time (a few hundred microseconds) during which neither the old nor the new process owns the port. This does not happen if your system supports SO_REUSEPORT (such as various BSDs, as well as Linux kernels which have my patch applied). On those systems, the new process can bind to the port even though the old one is already bound, so there is absolutely 0 downtime. Anyway, in production it will not matter at all, because in practice, when a browser faces a connection abort, it retries after a few hundred milliseconds, when the new process is already in place. And this will only concern the very few requests which can happen during the switch. Last, as John said, please ensure to always use -sf instead of -st if you want your reload to really remain unnoticed, since -st terminates connections. Regards, Willy
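For reference, a typical soft-reload invocation as described (paths hypothetical):

    haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)

With -sf, the old process listed by pid is asked to finish its existing connections before exiting, whereas -st stops it immediately.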
Re: New to HAproxy - how to define custom health-check msg?
Hi, On Fri, May 15, 2009 at 10:51:20AM -0400, John Lauro wrote: I think there might be a better way, but you could run the check against a different port. On that other port, you could have it run your custom check and return an OK response if your check passes, and fail if it doesn't. That's generally what is done. However, I'd like to point out that a patch has been proposed to implement explicit content validation (ECV) on HTTP, and it should be easily adapted to non-HTTP services. I've not merged it yet because it needs some fixing (risks of segfault if the server does not return a content length or returns an incorrect one). That said, we need a more generic health-check framework. Many people are asking for send/expect, others for lists of rotating URLs, others for an easier ability to send headers. We should put all that down and try to find how to implement something better. Regards, Willy
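A sketch of the different-port technique suggested above (addresses, URL and port hypothetical): the server line's port parameter directs the health check to a port where a custom responder can report success or failure while traffic still goes to port 80:

    backend app
        mode http
        option httpchk HEAD /status HTTP/1.0
        server app1 10.0.0.1:80 check port 8081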
Re: Multiple httpchks per Backend (some ideas...)
Hi Craig, -- I replied too early to another mail without CCing you, please consult the thread "how to define custom health-check msg" -- And yes, I'm for script-based checks ;-) Willy On Fri, May 15, 2009 at 05:38:38PM +0200, Craig wrote: Hi, I'd really like to do multiple and advanced checks on a backend, but it does not seem to be possible currently. Shoot^H^H^H^H^HCorrect me, if I missed it in the manual... It would be cool if something like this worked:
option httpchk CHECK1 GET /service1.jsp HTTP/1.1\r\nHost:\ www.host.com
option httpchk CHECK2 GET /service2.html HTTP/1.1\r\nHost:\ www.host.com expect my text
option httpchk CHECK3 HEAD /service3.jsp HTTP/1.1\r\nHost:\ www.host.com notexpect HTTP/1.1 302
You could define extra keywords like expect and notexpect, and the service would only be considered up if the returned text contains (or does not contain) a certain text pattern; (custom) Host headers could be checked, too. Checks would be defined by a name, so that this would be possible:
server one 192.168.0.1:80 weight 100 check CHECK1 inter 1000 fall 1 rise 6 check CHECK2 inter 5000 fall 1 rise 1
CHECK1 would be a service that is a bit slowish when it is being restarted (e.g. some Tomcat application), but is important for your site and you do not wish users to see any error in that application. CHECK2 could be some static web page. It is surely possible to do multiple httpchks with ACLs and some fuddling, but I think defining multiple backends is not that nice for this task. Any other opinions about this? Best regards, Craig
Re: ForwardFor Option Not Working?
Hi Michael, On Mon, May 18, 2009 at 04:21:14PM +, Michael Tinnion wrote: Hi, I have very limited experience with HAProxy and I'm trying to get the ForwardFor option to work, but with no joy. I've set the following in my 'listen' section: option httpclose option forwardfor Indeed, that's the correct way to do it. And I have set the following in my tomcat (hosted by JBoss) configuration: maxKeepAliveRequests=1 According to everything I have read, this should make HAProxy add the X-Forwarded-For header to the requests. Unfortunately I am still not seeing this, and the client IP address in the request has been set to that of the load balancer. Well, since almost everyone is using it, I don't really believe it does not work. Are you sure you have mode http in your listen section ? Also, how are you checking that the header is correctly added ? Maybe you just misspelled its name where you check it ? Regards, Willy
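For reference, a minimal sketch of a listen section where forwardfor takes effect (section name and addresses are examples); note the option only applies in HTTP mode:

    listen app 0.0.0.0:80
        mode http
        option httpclose
        option forwardfor
        server tomcat1 192.168.0.10:8080 check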
Re: difference between USE_TCPSPLICE and USE_LINUX_SPLICE?
On Mon, May 18, 2009 at 11:51:04AM -0700, Brian Kruger wrote: Hi, probably a question maybe covered elsewhere, and I apologize as I couldn't find anything, but I am curious what the difference is between USE_TCPSPLICE and USE_LINUX_SPLICE? I know TCP_SPLICE is a patch that's needed (possibly just for older kernels), but do they accomplish the same thing otherwise? (tcp splicing options to speed up connections) Indeed, TCP_SPLICE is the first splicing implementation by Alexandre Cassen, as found on linux-l7sw.org. Even though it works, we found several oddities in its design and thought about redesigning it differently (for instance, it takes care of the connection till the end, and forwards data per packet, without using socket buffers). Since we have very limited time to work on this, we did not make much progress, and in the meantime the splice() system call appeared in the Linux kernel. It works very differently, was a bit of a pain to get to work initially, but finally does what we need : pass a given amount of data between socket buffers. I would say that LINUX_SPLICE is more reliable by design and is recommended. However it's only available on very recent kernels (2.6.27), and most early versions are buggy. TCPSPLICE is available for old kernels (starting from 2.6.18 if my memory serves me right), as well as 2.4, and could possibly be ported to other systems without too many difficulties. Hoping this clarifies the choice, Willy
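A sketch of how this choice shows up at build time, using the two flags named in the question (the TARGET value and exact flag handling depend on your Makefile, so treat these as assumptions to verify):

    # recent kernels (>= 2.6.27) with the in-kernel splice() call
    make TARGET=linux26 USE_LINUX_SPLICE=1

    # older kernels carrying the tcp_splice patch and its library
    make TARGET=linux26 USE_TCPSPLICE=1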
Re: New to HAproxy - how to define custom health-check msg?
On Fri, May 22, 2009 at 09:34:39PM +0530, Sanjeev Kumar wrote: Newbie question: In response to the http health-check string HEAD /index.html HTTP/1.0, if my server responds with only one line, HTTP/1.0 200 OK, will the health check be accepted as OK? (HAProxy is not accepting this health response in my setup.) Does the health-check response need to be a complete set of headers ?? Please recheck your script's output, as haproxy only verifies HTTP/1.0 and the first character of the status code (2 here), and is OK with that reply. So surely there is something odd. Regards, Willy
Re: HAProxy - Inline Monitoring?
Hi, On Fri, May 22, 2009 at 11:37:14AM -0700, Jonah Horowitz wrote: I'm currently testing HAProxy for deployment. Right now we use NetScaler load balancers, and they provide a feature called inline monitoring. With inline monitoring the NetScaler will take a server out of rotation if it responds with a 5xx error to a client request. It does this separately from standard health checks. Is there a way to do this with HAProxy? No, and I don't want to do the same, as it seems a little bit risky to me. However, what is planned is to switch to fast health-checks when a number of 5xx errors is encountered. That way, it would significantly reduce the time to detect a server failure without the risk of taking a server out of the farm on random errors. Regards, Willy
Re: how to enable TCP/IP logging in HAproxy
Hi, On Mon, May 25, 2009 at 12:22:25PM +0530, Sanjeev Kumar wrote: I need to debug why the HTTP health-check response is received by the proxy machine, but HAProxy says no. The event log just displays a single message: server down. How do I enable detailed TCP logging in HAProxy? haproxy supports several log formats and levels, but it does not log health checks. However, your mistake is quite easy to spot below : option httpchk HEAD /check.tst HTPP/1.0 See ? HTPP instead of HTTP, so most likely your server is responding with 400 Bad request, which you should see in its logs. Hmm, I also see something wrong with your IP address : server servA 192.168.2:9123 check This one is normally equal to 192.168.0.2, which might not be what you want. You should fix it too. When you have problems with health checks, it's always a good idea to try them by hand :

    # telnet 192.168.2.1 9123
    HEAD /check.tst HTPP/1.0

(and wait for the server response) Last, you forgot to set client and server timeouts, which is a bad thing, because you will rely on the system to abort stuck connections, and depending on the configuration, it may take from several hours to several days, and sometimes forever. So please add timeout client and timeout server. Regards, Willy
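A minimal sketch of the missing timeout settings (the values are common suggestions, not from the original mail):

    defaults
        timeout client  30s
        timeout server  30s
        timeout connect 5s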
Re: Persistence based on a server id url param
Hi Ryan, On Mon, Jun 01, 2009 at 12:22:57PM -0700, Ryan Schlesinger wrote: I've got haproxy set up (with 2 frontends) to load balance a PHP app, which works great. However, we're using a Java uploader applet that doesn't appear to handle cookies. It would be simple for me to have the uploader use a URL with the server id in it (just like we're already doing with the session id), but I don't see any way to get haproxy to treat that parameter as the actual server id. Using hashing is not an option, as changing the number of running application servers is a normal occurrence for us. I also can't use the appsession directive, as the haproxy session id cache isn't shared between the two frontends (both running an instance of haproxy). Can this be done with ACLs and I'm missing it? You could very well use ACLs to match your URL parameter in the frontend and switch to either backend 1 or backend 2 depending on the value. Alternatively, you could hash the URL parameter (balance url_param), but it would not necessarily be easy for your application to generate a URL param which will hash back to the same server. So I think that the ACL method is the most appropriate for your case. Basically you'd do that :

    frontend
        acl srv1 url_sub SERVERID=1
        acl srv2 url_sub SERVERID=2
        acl srv1_up nbsrv(bck1) gt 0
        acl srv2_up nbsrv(bck2) gt 0
        use_backend bck1 if srv1_up srv1
        use_backend bck2 if srv2_up srv2
        default_backend bck_lb

    backend bck_lb
        # Perform load-balancing. Server state is tracked
        # from the other backends.
        balance roundrobin
        server srv1 1.1.1.1 track bck1/srv1
        server srv2 1.1.1.2 track bck2/srv2
        ...

    backend bck1
        balance roundrobin
        server srv1 1.1.1.1 check

    backend bck2
        balance roundrobin
        server srv2 1.1.1.2 check

That's just a guideline, but I think you should manage to get it working based on that. Regards, Willy
Re: Haproxy stop to serve http
Hi Luca, On Mon, Jun 22, 2009 at 01:38:53PM +0200, Luca Pimpolari - Multiplayer wrote: Hi to all, I'm using haproxy to serve our web infrastructure; it serves about 500/600 concurrent connections, with peaks at 1000/1200 concurrent connections. Everything works great and performance is good, but sometimes haproxy stops serving HTTP traffic (mode http), while other kinds of traffic continue to work (mode tcp). I'm using haproxy 1.3.18; I attach the configuration file. The kernel on the machine is 2.6.26-2-686 on Debian 5.0. The stops are sudden, and I'm unable to reproduce them. When it happens, the haproxy daemon is still up and continues to serve the other kinds of services (mode tcp), but the stats page also stops working. Any help ? I see that you don't have any timeout client in either your defaults section or your frontends. So most likely, after some users have failed to properly disconnect, all your connections are saturated and you cannot serve anybody anymore. And by the way, only one of your backends has timeouts, so I really suggest that you set them all in your defaults section. Also, please be careful, I see very large timeouts here. 330s for a server response in HTTP is way too long, nobody will wait that long ! And having that for a health check or a connect is inappropriate too ! A typical connect timeout is around 5s. A client/server timeout depends on the application, but we generally see between 10 and 60s. Regards, Willy
Re: stats are cut off
On Mon, Jun 22, 2009 at 04:22:44PM +0200, Krzysztof Oledzki wrote: On Mon, 22 Jun 2009, Angelo Höngens wrote: Hey guys and girls, Hello, I'm a happy user of HAProxy, and for one of my new projects I'm running into a small problem. I have a new configuration with 120 different instances (one instance for every site on a couple of servers), and it looks like everything works fine. However, I have a problem with the stats page: it will only show the first 38,5 sites :) Please see this screenshot: http://files.hongens.nl/2009/06/22/haproxystats.png Somewhere it's cut off after a /tr tag. This is kind of annoying, because I use the stats page to see the status of the web server nodes behind HAProxy. I don't see anything interesting in the logs.. Any ideas anyone? Which version? Does it always break at the same position? Could you share your config with us? ;) It has the look and feel of version 1.2. I don't remember such a bug there, though. It might be worth checking the response length to see if it's a multiple of the response buffer size, for instance. Regards, Willy
Re: Redirection with 301 for all subdomains with exception
On Mon, Jun 22, 2009 at 08:32:36PM +0200, Falco SCHMUTZ wrote: Hello everybody, Could you help to fix this configuration ? I need to redirect all subdomains except 5 (admin, pro, www, img*, domain.com without subdomain) to www.domain.com. I tested this setting, but it did not work:

    acl good_subs url_beg admin pro www img*
    redirect location www.domain.com 301 if !good_subs

The host name is not in the URL but in the Host: header. So you must do that instead :

    acl good_subs hdr_beg(host) -i admin. pro. www. img

I have no idea for http://domain.com to http://www.domain.com, and I don't know if img with a wildcard works. You can do that :

    acl good_subs hdr_beg(host) -i admin. pro. www. img domain.com

For the wildcard you don't need anything special, as hdr_beg() matches at the beginning of the field. However if you need finer combinations, check with the regexes. It will be harder to configure, but with infinite combinations. Willy
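Putting the pieces together, a sketch of the full corrected rule (note that in the redirect syntax the status requires the code keyword, which the original attempt omitted; the domain names are the example ones from the mail):

    acl good_subs hdr_beg(host) -i admin. pro. www. img domain.com
    redirect location http://www.domain.com/ code 301 if !good_subs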
Re: EPEL package upgraded from 1.3.14 - 1.3.18 config issues
On Wed, Jun 24, 2009 at 03:03:11PM +0200, Denis Braekhus wrote: Willy Tarreau wrote: The build options are different apparently. Please run haproxy -vv on both binaries, they will report their respective build options, and presumably the 1.3.18 version was built without a tproxy option. Hmm. Here is the output from -vv :

    HA-Proxy version 1.3.18 2009/05/10
    Copyright 2000-2008 Willy Tarreau w...@1wt.eu
    Build options :
      TARGET  = linux26
      CPU     = generic
      CC      = gcc
      CFLAGS  = -O2 -g
      OPTIONS = USE_REGPARM=1 USE_PCRE=1

vs

    HA-Proxy version 1.3.14.11 2008/12/04
    Copyright 2000-2007 Willy Tarreau w...@1wt.eu
    Build options :
      TARGET  = linux26
      CPU     = generic
      CC      = gcc
      CFLAGS  = -O2 -g
      OPTIONS = USE_REGPARM=1 USE_PCRE=1

No apparent difference in build options. I could investigate the source RPM, though, if the transparent option should work even in 1.3.18? I will check, because whatever is in the RPM, the build options should prevail. Regards, Willy
Re: reset stats?
On Thu, Jun 25, 2009 at 12:51:23PM -0400, Dave Pascoe wrote: Is there a way to reset haproxy stats numbers without restarting haproxy or sending a -sf? No, because the stats are only reset when a new process starts with new stats. I know it's sometimes annoying when you're doing the initial tuning of your config, but once it settles down, generally people don't touch it at all, so this is not a problem anymore. Regards, Willy
Re: Haproxy testing
On Thu, Jun 18, 2009 at 08:11:35PM +0100, Chris Sarginson wrote: That's great Malcolm, I'll check that out, and sorry for the appallingly vague subject! Warning: there are some issues with this patch. A careful code review shows that you can get nasty behaviour if the server returns no content-length (segfault) or an incorrect content-length. I know that someone is working on it in order to fix it. Regards, Willy
Re: option forwardfor except network issue
On Tue, Jun 16, 2009 at 04:06:36PM +0100, Sigurd Høgsbro wrote: Hello all, I'm trying to deploy haproxy as a replacement for the proxy module in lighttpd 1.5svn (not yet released), and have managed to mostly configure it to my desires. I'm having problems getting haproxy to recognise all the RFC1918 networks as exception subnets. What is the correct syntax to exclude all of the 10/8, 172.16/12 and 192.168/16 networks from X-Forwarded-For header rewriting for a given frontend? Below is the start of my frontend stanza. Cheers, Sigurd

    listen http
        bind :80
        mode http
        option httpclose
        option forwardfor except 10.0.0.0/8
        option forwardfor except 172.16.0.0/12
        option forwardfor except 192.168.0.0/16

Only one network can be specified, so the last entry overrides the previous ones. I think it would not be too hard to implement an ACL-based option forwardfor {if|unless} rule, which would solve your issue once and for all. Anyone interested in working on it ? In the meantime I have another solution. You can do that using two distinct backends :

    frontend http
        bind :80
        mode http
        option httpclose
        acl private src 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
        use_backend http-private if private
        default_backend http-public

    backend http-public
        mode http
        option forwardfor
        ...

    backend http-private
        mode http
        ...

Regards, Willy
Re: tcp wierdness with mysql
On Mon, Jun 15, 2009 at 05:19:34PM -0700, Dima Brodsky wrote: Hi, I wonder if anybody has seen this problem with haproxy 1.3.17. We have a mysql server behind haproxy, and about 5% of the queries hang and cause haproxy to time out, i.e. the query returns after 150 seconds. On average the query we are issuing takes 0.1 seconds. If we increase the rate at which we hit the server, a higher percentage of queries seem to time out. The config is below. Could you add option tcplog in your listener and check what is observed in the logs ? I suspect that you're reaching a limit on the number of concurrent connections on the mysql server, and since your timeout connect is as large as the other ones, it's hard to tell. BTW, you should lower it. There is no reason to wait that long for a connect to succeed. A few seconds (4-5) is generally more than enough. Regards, Willy
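A minimal sketch of the suggested changes (listener name, addresses and values are examples): tcplog makes the session timers and termination flags visible in the logs, and a short connect timeout separates connect problems from server slowness.

    listen mysql 0.0.0.0:3306
        mode tcp
        option tcplog
        log 127.0.0.1 local0
        timeout connect 5s
        server db1 192.168.0.20:3306 check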
haproxy to protect apache against Slowloris and Nkiller2 DoS attacks
Hi all, since I'm seeing worried people everywhere about the apache vulnerability, as they call it (while it's just a reuse of a well-known weakness), and other people suggesting incomplete haproxy configuration files, I have prepared a generic haproxy configuration file to be installed without too much hassle in front of any server at risk, and I'm posting it here as it should help people find it more easily : http://haproxy.1wt.eu/download/1.3/examples/antidos.cfg It requires that apache is moved to 127.0.0.1:8080 and that haproxy is installed on the public address, port 80, instead. It does no health check (since some people find it hard to make them work), and that is not a problem because there's only one server. I have tested it against the Slowloris script and the Nkiller2 tool published in Phrack (which is a very interesting method BTW). I have not set any ACL, tarpit nor cookies, so that the config remains very basic. But of course it could be extended to detect and block more precise patterns. Regards, Willy
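For illustration, a bare sketch of the topology described above (the real antidos.cfg linked in the mail contains the actual protections and tuned limits; these lines only show the listener/server arrangement, with example values):

    listen pub 0.0.0.0:80
        mode http
        option httpclose
        timeout client  30s
        timeout server  30s
        timeout connect 5s
        server apache 127.0.0.1:8080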
Re: R: Delay problem
Hello, On Mon, Jun 29, 2009 at 04:44:13PM +0200, Carlo Granisso wrote: OK, it seems that the problem was in contimeout and clitimeout. I've reduced these parameters and now it seems that everything is working fine. I've read the haproxy documentation but I can't completely understand the meaning of Set the maximum inactivity time on the client side: does this mean that after the complete download of the page, haproxy leaves the connection open until: 1) the client does some operations, or 2) the timeout is reached? Probably my problem was the second point: the page was correctly loaded and haproxy waited for other activity. Is that correct? I think it is even simpler than that. You have maxconn 300 on your servers, and you don't have option httpclose, which means that clients can keep a keep-alive connection open and unused after they retrieve an object. By reducing timeout client, you are forcing those connections to die faster, but it's still not the right way to do this. Please simply add option httpclose and I'm sure the problem will definitely vanish. Regards, Willy
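A sketch of the fix (section name and server line are examples; maxconn 300 is the value mentioned in the mail):

    listen app 0.0.0.0:80
        mode http
        option httpclose   # close after each response instead of leaving keep-alive connections idle
        timeout client 30s
        server web1 192.168.0.10:80 maxconn 300 check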
Re: Rising number of connections
On Thu, Jul 02, 2009 at 03:08:39PM -0400, John Marrett wrote: Have you perhaps incorrectly configured your SNMP tool to graph the value as a gauge instead of a counter (I assume that the SNMP module returns counters)? That would produce a continuously increasing graph. As to the old processes still running, there were some bugs that caused this issue which have now been resolved. I can't find the specific version that this bug was resolved in, but I have been affected by it in the past. Stupid question, guys : are you sure you have set your timeouts ? What you describe is the behaviour of an instance without any timeouts, on which some clients randomly disappear from the net, leaving dead connections. Willy
Re: Redirection with 301 for all subdomains with exception
On Fri, Jul 03, 2009 at 12:25:46PM +0200, Falco SCHMUTZ wrote: Hello, I have one more question about redirection: we want to redirect one old domain to the new one with some conditions. We configured some ACLs like this and they work fine :

    acl es path_beg /es
    redirect location http://www.newdomain.com/es/marruecos.html if es
    acl en path_beg /en
    redirect location http://www.newdomain.com/en/morocco.html if en
    acl www hdr(host) -i www.
    redirect location http://www.newdomain.com/maroc.html if www

But we need one action when some users use old links like these :

    http://www.olddomain.com/olddirectory/oldhtmlpage.*
    http://www.olddomain.com/es/olddirectory/oldhtmlpage.*
    http://www.olddomain.com/en/olddirectory/oldhtmlpage.*

Do you know if, with the first ACL configuration, we can redirect users to :

    http://www.newdomain.com/olddirectory/oldhtmlpage.*
    http://www.newdomain.com/es/olddirectory/oldhtmlpage.*
    http://www.newdomain.com/en/olddirectory/oldhtmlpage.*

Yes, you can do that using redirect prefix, this way :

    acl olddom hdr_end(host) -i olddomain.com
    redirect prefix http://www.newdomain.com if olddom

The redirect rule will then cause a Location: http://www.newdomain.com/olduri header to be emitted. It's typically useful to change domains or to switch between http and https. Regards, Willy
Re: Rising number of connections
On Sun, Jul 05, 2009 at 08:06:17PM +0100, Peter Miller wrote: Ah, we downloaded from the 'Download' section on the site, which is still defaulting to 1.3.17, rather than the 'Latest versions' section which has the link to the 1.3.18 source. Luckily we're on 32-bit x86, but we will upgrade asap. Oops, you're right :-( I'm fixing the page. I admit I don't always think about updating that section. Regards, Willy
Re: Selective logging
Hi, On Tue, Jul 07, 2009 at 05:41:40PM +0100, Alex Forrow wrote: Hi, We have been using HAProxy very successfully on a busy website for a while now, sending all logs via syslog to a separate server. A single frontend is used to serve all public requests, and it currently logs everything. We would like to log only requests for dynamic pages. Is it possible to have HAProxy selectively log requests, based either on an ACL or, ideally, on the backend? No, unfortunately it's not possible right now, and since the logs are configured in the frontend, you cannot even use the backend to make a difference. I think that the simplest solution would be to implement something like log disable if acl, which would work both in the frontend and in the backend. I understand your requirement, and in my opinion it really makes sense to log only dynamic pages. I'm adding that to the TODO list. Regards, Willy
Re: httpchk is marking Apache host as down
On Tue, Jul 07, 2009 at 02:50:18PM +0100, Pedro Mata-Mouros Fonseca wrote: Greetings, In the following configuration I'm doing an httpchk for an Apache host:

    backend hosts
        option httpchk
        server host1 127.0.0.1:8081 maxconn 50 check

This is what shows up in the logs:

    127.0.0.1 - - [07/Jul/2009:14:35:41 +0100] OPTIONS / HTTP/1.0 200 -
    127.0.0.1 - - [07/Jul/2009:14:35:45 +0100] OPTIONS / HTTP/1.0 200 -
    127.0.0.1 - - [07/Jul/2009:14:35:53 +0100] OPTIONS / HTTP/1.0 200 -
    127.0.0.1 - - [07/Jul/2009:14:35:57 +0100] OPTIONS / HTTP/1.0 200 -

This host should be up; however, in the HAProxy console it is marked as DOWN and with a red colored line. Consequently HAProxy is returning a 503 Service Unavailable. This happens every once in a while, after the service has been working normally... Check your apache logs when this happens, and verify the delay between two checks. It is possible that Apache is sometimes not responding or even not receiving the checks. Also ensure that apache ALWAYS returns 200 (which I assume it does). Also, I see maxconn 50 above. Are you sure that your apache is configured to accept at least 51 concurrent clients ? Regards, Willy
Re: Redirection with 301 for all subdomains with exception
Hi, On Mon, Jul 06, 2009 at 02:05:09PM +0200, Falco SCHMUTZ wrote: Hello, I'm sorry to disturb you again, but some problems persist. This is my final configuration and it works fine :

    acl es path_beg /es
    redirect location http://www.newdomain.com/es/marruecos.html code 301 if es
    acl en path_beg /en
    redirect location http://www.newdomain.com/en/morocco.html code 301 if en
    acl www hdr(host) -i www.
    redirect location http://www.newdomain.com/maroc.html code 301 if www
    acl admin hdr_beg(host) -i admin
    redirect location https://admin.newdomain.com code 301 if admin
    acl pro hdr_beg(host) -i pro
    redirect location http://pro.newdomain.com code 301 if pro
    acl olddom hdr_end(host) -i olddomain.com
    redirect prefix http://www.newdomain.com code 301 if olddom

But URLs pointing to an HTML page under /es/ or /en/ do not rewrite correctly. E.g.: http://www.olddomain.com/es/promotions.html -> http://www.newdomain.com/es/marruecos.html This is normal, as this is what you have written in the first rule. But we need http://www.olddomain.com/es/promotions.html -> http://www.newdomain.com/es/promotions.html I think that the problem comes from the fact that you're only able to enumerate examples of what you need and not an exhaustive list. Once your list is exhaustive, it will become clear how to arrange your rules so that they do what you need. For information, we really need this, and it currently works fine : http://www.olddomain.com/es/ -> http://www.newdomain.com/es/marruecos.html Maybe you only want to redirect a URL ending with /es/ to /es/marruecos.html ? I don't really know, there's nothing very clear. Regards, Willy
Re: nginx 400 status code sometimes reported as 502 in haproxy
On Thu, Jul 09, 2009 at 04:47:23PM +0200, Jean-Baptiste Quenot wrote: I could reproduce the issue in a test setup. Haproxy is running on port 80 and nginx on port 83. The client sends a very long cookie header value (4104 bytes). I used tcpdump -s 0 -i lo -w dump port 83, loaded the dump in wireshark and exported it as plain text to produce a readable output. Every time I reload the page there are two HTTP requests, one for / and one for /favicon.ico. After a random number of reloads, haproxy sends a 502 instead of displaying the 400 error. I tried with and without option httpclose, it doesn't change the behavior. How did you compile your haproxy ? Could you run haproxy -vv ? That large a cookie might sometimes hit the buffer limit (which by default is 8kB). Please find attached the TCP packets involved. Could you please send the pcap file instead, it's more readable for me :-) In the normal GET / case, at the end of the request we have:

    nginx:   RST, ACK
    haproxy: ACK
    nginx:   RST

That's interesting, because an RST means that a packet was received for a non-existing connection, typically something which was closed during the transfer. I find it strange that nginx sends you an RST during a transfer; it means it has already closed, which is not really expected (or maybe its request buffer size is close to the request size too). However in the last GET / we have:

    nginx:   RST, ACK
    haproxy: RST, ACK

Or maybe I'm misinterpreting the various pieces of information...? I think we will find everything in your trace, we're close to explaining what you observe. Advice from a TCP expert is necessary :-) :-) Regards, Willy
Re: Capture and alter a 404 from an internal server
On Mon, Jul 20, 2009 at 10:11:16AM +0100, Pedro Mata-Mouros Fonseca wrote: Thank you so much Maciej, I will give it a try, although in that referenced email it seems like a scary thing to do... A hard thing to evaluate is the cost of having such rspirep processing in every response coming from that specific frontend... Is it too overwhelming for performance? If you're running at a very high request rate you should be careful not to add too many such statements, but at lower speeds you will almost not notice the extra CPU usage, particularly if you've built with the PCRE library, which is extremely fast. I have seen large configurations where people use between 100 and 200 regexes per request, and it does not appear to affect them that much. Wouldn't this just be a perfect candidate for having its own directive, along the lines of errorfile and errorloc, but specifically for errors returned by servers instead of only by HAProxy? ;-) Something like:

    errorserverfile 404 /etc/haproxy/errorfiles/404generic.http
    errorserverloc 404 http://127.0.0.1:8080/404generic.html

It might be, but I don't really know if we need the errorserverfile or not. Because if we only need the errorserverloc above, then you will be able to do it using ACLs once they're usable on the response path. Regards, Willy
Re: Transparent proxy of SSL traffic using Pound to HAProxy backend patch and howto
On Mon, Jul 20, 2009 at 03:23:22PM +0100, Malcolm Turnbull wrote: Many thanks to Ivansceó Krisztián for working on the TPROXY patch for Pound for us; we can finally do SSL termination - HAProxy - backend with TPROXY. http://blog.loadbalancer.org/transparent-proxy-of-ssl-traffic-using-pound-to-haproxy-backend-patch-and-howto/ Patches to Pound are here: http://www.loadbalancer.org/download/PoundSSL-Tproxy/poundtp-2.4.5.tgz Willy, you mentioned that it may be more sensible to do something like source 0.0.0.0 usesrc hdr(x-forwarded-for) rather than having 2 sets of TPROXY set up... but I don't think this is possible yet? Unfortunately not yet. I've had to arbitrate between that and the ability to perform content switching on TCP frontends, and the priority went to the latter. Another issue you might run into is the reduced number of source ports for the same source IP, because now you have the client, pound, and haproxy all using the same source IP, so you need to be careful that the client never hits haproxy directly on the same port as pound, otherwise it may use the same source port as pound and conflict with an existing session. A trick might consist in using a distinct port on haproxy for direct client connections and pound connections. Regards, Willy
Re: Still dropping TS sessions.
Hi guys, On Wed, Jul 22, 2009 at 08:52:05AM -0400, Guillaume Bourque wrote: Hi Paul, I just returned from vacation so I didn't see your previous post, but one thing is for sure: haproxy CAN be used to dispatch RDP sessions. I have been doing this on a couple of sites with ~80 users and 4 TS servers without any issue at all over the last year. I have looked at your config and don't see what could be the problem. I definitely see a problem. Timeouts are too short for RDP (50 seconds). So after that time, if the client does nothing (e.g. talks on the phone), his session expires. From what I've heard, people tend to set session timeouts between 8 and 24 hours on RDP. BTW, you might be very interested: Exceliance has developed and contributed RDP persistence ! This is in the development branch. Check the latest snapshot here : http://haproxy.1wt.eu/git/?p=haproxy.git Basically, you just have to add the following statement in your backend :

    persist rdp-cookie

And when a session comes in, haproxy will analyse the RDP packet and will look for an RDP cookie. If it has a matching server, it directs the connection to that server, otherwise it does load balancing. And we also have balance rdp-cookie, which is used to balance on the msts RDP cookie specifying the user name (when it is available). Regards, Willy
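A sketch of what such a backend could look like (names, addresses and values are illustrative; persist rdp-cookie requires the development snapshot mentioned above):

    backend ts_farm
        mode tcp
        balance roundrobin
        persist rdp-cookie
        timeout connect 5s
        timeout server 8h       # RDP sessions are long-lived; see the 8-24 hour advice above
        server ts1 192.168.0.11:3389 check
        server ts2 192.168.0.12:3389 check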
Re: queing problems
Hi Fabian, On Mon, Jul 20, 2009 at 06:11:45PM +0200, Fabian wrote: Hi List, I'm trying to set up simple TCP load balancing: the backend servers can only handle one request at a time, and requests take between 2-15 seconds to process. I want haproxy to distribute the TCP requests to any free backend server currently not processing a request (no active connection). If all backends are currently active, I want to queue the pending requests globally, and as soon as a backend becomes free, the oldest request in the queue should be redirected to the free backend server. Unfortunately I can't get the queuing to work. When there are pending connections and a backend server becomes idle, it takes a long time before a pending connection is handed to the server. Sometimes the connection even gets a timeout despite the fact that a backend server has already been idle for 20 seconds. Your configuration is right. I think that your problem is simply that when you have too many incoming requests, the time to process them all one at a time is too long for the last one to be served. When unspecified, the queue timeout is equivalent to the connect timeout (50s here). So I would suggest that you lower your connect timeout to 5s and set timeout queue 2m, for instance. Also, I suggest that you enable a stats page to monitor the activity in real time. It's really useful to check queueing. You just have to add :

    listen stats 0.0.0.0:8080    # name and address:port are placeholders, pick your own
        mode http
        stats uri /

and you connect to this port with your browser. You'll see the backend queue size, and the max it reaches. Regards, Willy
Re: make on os x
Hi, On Thu, Jun 11, 2009 at 09:51:00AM +0200, Rapsey wrote: Sorry, error in -vv output, TARGET = darwin Sergej On Thu, Jun 11, 2009 at 9:46 AM, Rapsey rap...@gmail.com wrote: I'm trying to build haproxy with kqueue on OS X Leopard, but I don't think it's working. There is no mention of -DENABLE_KQUEUE anywhere when it's building. This is the make command I use:

    make Makefile.osx TARGET=darwin CPU=i686 USE_PCRE=1 all

OK, you're not using the proper syntax, you need to use :

    make -f Makefile.osx TARGET=darwin CPU=i686 USE_PCRE=1 all

Otherwise you tell make to build Makefile.osx, which already exists, so no error is reported. Also, please don't use 1.3.17 as it has a nasty bug which can be triggered on 64-bit systems. Willy
Re: make on os x
On Thu, Jul 23, 2009 at 08:40:23AM +0200, Rapsey wrote: Yes, thank you. I figured it out eventually and used the same command as you wrote to build, but kqueue was still not getting enabled. This is the make command I eventually figured out works without issues (it uses the default Makefile):

    make TARGET=osx CPU=i686 USE_KQUEUE=1 USE_POLL=1 USE_PCRE=1

You're right, I wrote osx because you did, but it's TARGET=darwin which automatically enables KQUEUE. Maybe we should simplify this makefile since it only supports one OS. Regards, Willy
Re: queing problems
On Thu, Jul 23, 2009 at 03:08:52PM +0200, Fabian wrote: Hi Willy, Willy Tarreau schrieb: Your configuration is right. I think that your problem is simply that when you have too many incoming requests, the time to process them all one at a time is too long for the last one to be served. No, it seems like it was a bug in haproxy. The problem went away after I upgraded to the latest haproxy release. I previously used the haproxy package shipped with Ubuntu 8.04 (1.3.15.1). The haproxy recent news webpage mentions some bugfixes for maxconn=1 setups; it seems like I encountered one of those fixed bugs. Ah indeed, if you've been using such a version, that can explain a lot of strange issues ;-) Thanks for the feedback! Willy
Re: make on os x
On Thu, Jul 23, 2009 at 01:31:28PM +0200, Rapsey wrote: Even with darwin kqueue was not enabled, I tried it. Why is there even a separate osx makefile if the default one works? I don't remember, it was contributed. I believe it was due to a different make install procedure, though I'm not certain. Willy