Hi Rajeev,

> We are using Haproxy on top of Mesos cluster, We are doing dynamic reloads
> for Haproxy based on marathon events (50-100 times in a day). We have nearly
> 300 applications that are running on Mesos (300 virtual hosts in Haproxy).

That should be very doable; for context, we reload HAProxy thousands of times
per day and have around the same number of services. We do leverage our
improvements to https://github.com/airbnb/synapse to minimize the number of
reloads we have to do, but Marathon is good at making us reload. Just curious,
how do you have HAProxy deployed: is it running on a centralized machine
somewhere, or on every host?

> When we do dynamic reloads, Haproxy is taking long time to reload Haproxy, we
> observed that for 50 applications takes 30-40secs to reload Haproxy.

This seems very surprising to me unless you're doing something like SSL. Can
you post a portion of your config?

> We have a single config file for Haproxy, when we do reload all the
> applications are getting reloaded (Front-ends), this causing downtime of all
> applications. Is there anyway to reduce the downtime and impact on end-users.
>
> We tried this scenario,
> "http://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html"
>
> By this if user requests while reload, the requests are queued and serving
> after reload.

Full disclaimer: I wrote that post, and I'm not sure it will be all that
useful to you if your clients are external or your reloads take > 30s. As the
post says, "the largest drawback is that this works only for outgoing links
and not for incoming traffic." It would theoretically not be hard to extend it
to incoming traffic using ifb, but I haven't worked on actually proving out
that solution. And if the reload takes > 30s the technique simply won't work:
you would be buffering connections for 30s and most likely dropping them.

If the 30s reloads are unavoidable, you will likely want to consider one of
the alternative strategies mentioned in the post. For example, you can simply
drop SYNs during the reload, since the ~1s SYN retransmit penalty isn't that
big of a deal (though with 30s reloads clients will still see 30s+ of
unavailability); you can put nginx or another haproxy in front of haproxy to
route around the reload (which can be a bit confusing and hard to work with);
or you can build something similar to
http://inside.unbounce.com/product-dev/haproxy-reloads/ (be wary that you pay
the conntrack cost with a solution like that).
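If it helps, a minimal, untested sketch of the SYN-drop variant looks roughly
like this; port 80 and the paths are just placeholders for whatever your
frontends actually use:

  # stop accepting new connections; established ones are untouched, and
  # clients retransmit the dropped SYN about a second later
  iptables -I INPUT -p tcp --dport 80 --syn -j DROP

  # soft reload: the new process takes over the listening sockets while the
  # old ones finish their in-flight connections
  /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
      -D -sf $(cat /var/run/haproxy.pid)

  # start accepting SYNs again once the new process is up
  iptables -D INPUT -p tcp --dport 80 --syn -j DROP

The obvious catch is that the window between the two iptables rules is your
reload time, so this is only really pleasant when reloads are fast.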
> But if we do multiple reloads one after another, HaProxy old processes
> persist even after reloading the HaProxy service, this is causing the serious
> issue.
>
> root 7816 0.1 0.0 20024 3028 ? Ss 03:52 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 6778
> root 7817 0.0 0.0 20024 3148 ? Ss 03:52 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 6778
>
> Is there any solution stop the previous process once after it serving the
> request.

That is expected behaviour afaik. Those processes are likely still alive
because there are still open connections held against them. How long is the
longest timeout on your backend servers? This is common with long-lived TCP
mode backends, but those apps are often resilient to losing the TCP
connection, so you may just be able to kill the old haproxy instances (it's
what we do).
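For the "just kill them" option, a rough sketch of a reload wrapper, assuming
the pidfile layout from your ps output (the 60-second grace period is an
arbitrary number, tune it to your backend timeouts):

  # remember which pids are currently serving
  old_pids=$(cat /var/run/haproxy.pid)

  # soft reload: the new process takes over the listening sockets, the old
  # processes keep serving their already-established connections
  /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
      -D -sf $old_pids

  # give long-lived connections a bounded window to drain, then force any
  # stragglers out
  sleep 60
  kill $old_pids 2>/dev/null || true

Anything still attached to an old process loses its connection at that point,
which tends to be fine for the TCP apps I mentioned since they reconnect on
their own.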
> Can we separate the configurations based on front-ends like in Nginx, so that
> only those apps will effect if there is any changes in backend.

I mean, there is nothing that stops you from running multiple haproxy
instances that bind to different ports (quick sketch below). I think the right
place to start, though, is figuring out why reloading takes so long, which can
probably be determined by looking at the config.
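The multi-instance idea is really just a separate config file, pidfile, and
set of ports per group of frontends; the file names below are made up:

  # one instance per group of frontends, each with its own config and pidfile
  /usr/sbin/haproxy -f /etc/haproxy/group-a.cfg -p /var/run/haproxy-a.pid -D
  /usr/sbin/haproxy -f /etc/haproxy/group-b.cfg -p /var/run/haproxy-b.pid -D

  # a change to a group-a backend only reloads the group-a instance, so
  # group-b frontends never see the reload
  /usr/sbin/haproxy -f /etc/haproxy/group-a.cfg -p /var/run/haproxy-a.pid \
      -D -sf $(cat /var/run/haproxy-a.pid)

The cost is that your tooling now has to know which frontend lives in which
config.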
Good luck,
-Joey