Hi Willy,
Nice to know that a fix is on its way. Looking forward to that. We are in the 
process of migrating from Windows/WebSphere and have another twenty-five 
Jetty apps that will run on this environment. With health checks from all these 
applications, the problem might become bigger than it is today. 

I have put "option nolinger" in all the backends with backend checks in our 
test environment. This change will be merged into production on Monday, but it 
might take some time before we know for sure whether this has improved the 
situation. There is only one week left to make changes before Christmas, so I am 
not sure how many reloads there will be before next year.
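
The change is essentially one extra line per backend; it looks roughly like this 
(server names, ports and the check URI are just placeholders here):

    backend jetty_app_example
        # close server-side connections (health checks included) with a
        # TCP reset instead of leaving them behind in TIME_WAIT
        option nolinger
        option httpchk GET /ping
        server app1 10.0.0.11:7001 check inter 2000 rise 2 fall 3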

Thanks for the great help so far. I will update you as soon as we get five or more 
successful reloads (or, worst case, a reload that hangs for one minute again).

Regards
Terje

-----Original Message-----
From: Willy Tarreau [mailto:w...@1wt.eu] 
Sent: 5 December 2012 22:43
To: Borgen, Terje
Cc: haproxy@formilux.org
Subject: Re: VS: Haproxy hangs in one minute on config reload

Hi Terje,

On Wed, Dec 05, 2012 at 09:33:19AM +0100, Borgen, Terje wrote:
> Hi Willy,
> Thanks for your quick response.
> I think you might be onto something here. We have a similar setup with 
> haproxy using port 80 and have never experienced this problem in that 
> environment.

OK.

> /proc/sys/net/ipv4/ip_local_port_range says 32768-61000, so nothing 
> special here. We have another, similar problem when restarting the 
> Jetty servers on the same machine. We always get an error saying that 
> the port is in use, and we have to wait one minute before they can start 
> again. The Jetty ports (as you can see in the config) are also outside 
> the ip_local_port_range. But this might be a different problem, since it 
> happens on every restart.

Yes, typically a listening port bound without SO_REUSEADDR. Very common in fact.
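
For what it's worth, the fix on the listener side is simply to set SO_REUSEADDR 
before bind(). A minimal sketch in C (this is obviously not Jetty's actual code, 
just an illustration, and the port is made up):

    #include <stdio.h>
    #include <string.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int on = 1;
        struct sockaddr_in addr;

        /* Without this call, bind() fails with EADDRINUSE as long as old
         * connections on this port are still in TIME_WAIT, which is why
         * the restart has to wait about one minute (2*MSL). */
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(8080);            /* example port only */
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            return 1;
        }
        listen(fd, 128);
        return 0;
    }

If I remember correctly, Jetty's connectors expose an equivalent reuse-address 
setting, so it should only be a configuration matter on your side.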

> Some additional info:
> - We have two identical servers running Apache HTTP Server, haproxy 
> and Jetty servers. Most of the traffic hits the main server, and the 
> reload problem has never happened on the failover server. So this 
> problem might be "traffic-related".
> - For one week we changed the inter parameter on the clusters from 
> the default of 2000 to 60000, leaving rise/fall at their defaults. In that 
> period the problem never occurred.

OK, I see. The health checks are causing too many TIME_WAIT sockets.
This issue was very recently fixed (in 1.5-dev14): haproxy now closes health 
check sockets with a TCP reset, thus avoiding the TIME_WAIT state. I'm pretty 
sure they're the ones causing the issue, as I experienced a similar one recently 
(which is the reason why I fixed it :-)).
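
I won't paste the haproxy patch here, but the mechanism is simply the abortive 
close you get with a zero linger timeout, something like this in C:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Close a TCP socket with an RST instead of the normal FIN exchange.
     * The side which closes first normally keeps the socket in TIME_WAIT
     * for about a minute; an RST close skips that state entirely. */
    static void abortive_close(int fd)
    {
        struct linger lg = { 1, 0 };   /* l_onoff = 1, l_linger = 0 */

        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
        close(fd);
    }

This is also what "option nolinger" below does for you on the server-side 
connections, health checks included.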

I have not backported this yet as I wanted to keep an observation period.

However, you can try something: put "option nolinger" in your BACKENDS, not 
your frontends, otherwise some clients will experience truncated responses!!! 
All backend connections (including checks) will then be closed by a reset, and 
you should see far fewer TIME_WAIT sockets between haproxy and the servers.

Regards,
Willy

