Greetings,
On 03/09/2016 04:28 PM, matt wrote:
Yes. Regarding the different times, I've made some
edits in order to avoid exposing some information
about our endpoints/ip addresses, but they are
normal times.
Okay, just wanted to ensure that you expected a wide variety of times,
as seeing them for the first time in the logs when looking for another
issue can confuse debugging (or could indicate another issue to be
tracked down).
Aside from that, sounds great. I'll collect
some data tonight (I'm trying not to do this
now since our traffic is really high).
I'm thinking about requests being queued due to
the maxconn parameter (I have a global maxconn
of 4000, and a default of 3000). Could this be the
case? I'll take a look at haproxy stats too to see if
any of the limits is reached when the app is being
deployed
As HAProxy won't accept a connection if the global maxconn is reached
(until a slot opens up), the timings wouldn't show anything interesting
in that case (though with the logs looking normal and things still being
slow that would be the next item to be examined).
If a backend's servers are all at maxconn then the request will be
queued, and the time spent waiting in the queue will show up in the
second timing column (Tw) of the HTTP log line.
In general I'd advise keeping the global maxconn high enough so that all
the backend connection slots can get filled (as that way the logs will
make it clear where the issue is). The global maxconn should be low
enough so that the system can't run out of resources, but otherwise I'd
advise using the backends to limit connections (that also allows
returning 5xx errors instead of timeouts, which can confuse the
diagnosis if the timeout is the first thing that is seen).
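
As a rough sketch of that layout: the global and default maxconn values
below are the ones you mentioned, but the backend name, server addresses,
per-server maxconn and all timeout values are made up for illustration
and would need to be sized for your own application.

```
global
    maxconn 4000              # process-wide ceiling, kept below what the system can handle

defaults
    mode http
    maxconn 3000              # default per-frontend limit
    timeout connect 5s        # illustrative values, not recommendations
    timeout client  30s
    timeout server  30s
    timeout queue   10s       # a request queued longer than this gets a 503

backend app                   # hypothetical backend name
    # hypothetical addresses and limits; once both servers are at maxconn,
    # further requests wait in the backend queue
    server app1 192.0.2.10:8080 maxconn 200
    server app2 192.0.2.11:8080 maxconn 200
```

With something like this, a slow deploy shows up in the logs as growing
queue time or as 503s from the queue timeout, rather than as clients
silently waiting to be accepted.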
- Chad
I'll let you know how the data collection goes.
Thanks again for the help, it's been super productive for me.