On 04/21/2012 12:02 AM, Willy Tarreau wrote:
> Hello Samuele,
> 
Hello!

> On Fri, Apr 20, 2012 at 05:19:55PM +0200, Samuele Giovanni Tonon wrote:
>> Hello,
>>
>> I've seen some strange behaviour with haproxy and varnish:
>>
>> My architecture has haproxy (1.4.15-1 on Ubuntu) in front of two varnish
>> instances (3.0.2 on CentOS) running on two different servers.
>>
>> This is a new architecture we are experimenting with; before, we had a
>> pulse service + iptables redirect going directly to varnish.
>>
>> What we are now experiencing is a big loss in responsiveness after
>> moving to haproxy.
>>
>> We started with haproxy in http mode but it was too slow, so we switched
>> to tcp mode, gaining a bit more speed. We still keep seeing high response
>> times in the lastchk column of the stats, like:
>> var01 - L7OK/302 in 765ms
>> var02 - L7OK/302 in 596ms
> 
> These checks are extremely long for a 302 ! On local networks, it's common
> to see "0ms" because the total check is below 1ms.

Yes, I agree. I tried an http-mode setup directly against an Apache server
and those checks took at most 26 ms.


>> Then we started seeing problems: with the "old" architecture (just
>> iptables forwarding to one varnish), webpagetest.org measured 7 seconds
>> for first view and 1.5 seconds for repeat view.
>>
>> With haproxy we went to 15-20 seconds for first view and 7-8 seconds
>> for repeat view.
> 
> These times are quite long in my opinion. How many objects do you have
> on your page ? This looks like TCP retransmits.

Hmm, I'm not familiar with what you mean by TCP retransmits. How could I
see them on the system, just with netstat?
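A minimal sketch of one way to check, assuming a Linux host: the kernel keeps cumulative TCP counters in /proc/net/snmp (the source of the TCP figures `netstat -s` prints), including OutSegs and RetransSegs. Taking one snapshot before and one after a webpagetest.org run and comparing the two deltas gives a rough retransmit ratio. The helper names below are hypothetical, not part of any tool.

```python
from pathlib import Path

# Linux-only: cumulative protocol counters maintained by the kernel.
SNMP_PATH = Path("/proc/net/snmp")


def parse_tcp_counters(snmp_text: str) -> dict:
    """Return the 'Tcp:' counters from /proc/net/snmp-style text as {name: value}."""
    rows = [line.split() for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no Tcp: header/value lines found")
    # The first 'Tcp:' line holds the field names, the second holds the values.
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def read_tcp_counters(path: Path = SNMP_PATH) -> dict:
    """Take one snapshot of the live counters (hypothetical helper)."""
    return parse_tcp_counters(path.read_text())


def retransmit_ratio(before: dict, after: dict) -> float:
    """RetransSegs delta divided by OutSegs delta between two snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    resent = after["RetransSegs"] - before["RetransSegs"]
    return resent / sent if sent > 0 else 0.0
```

Usage: call `read_tcp_counters()` on the haproxy VM before and after a test run and pass both snapshots to `retransmit_ratio()`. On a local network the ratio should stay close to zero, so a clearly non-zero value would support the retransmit theory.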


>> At the moment we are a bit stuck as we can't understand what is wrong:
>> DNS is fine, the network is fine, we don't see high load average or memory
>> exhaustion... everything seems OK. BTW, we are using VMware virtual machines;
>> we even switched to e1000 Ethernet cards to avoid some problems with
>> vmxnet.
> 
> Wait a minute, you're saying the most important thing at the last moment !
> Are there other VMs on the same hypervisor ? It's very common to see huge
> network latency degradation when this is the case (this should not be as
> bad as you're observing unless your VMs are saturated of course). Also,
> when you say e1000 cards, you mean that you're using physical NICs from
> the VM or that you're using the e1000 emulation ? Also, when you do your
> tests, what does the CPU load on the VM look like, and is there additional
> traffic on this VM ? Could you run "vmstat 1 20" during the test ?

Well, I guess I need to give you some more information: the whole
infrastructure runs on VMware. I don't know whether the VMs are all on the
same hypervisor, but I'm not sure CPU context switching is the MAIN cause,
because when we were just redirecting the port (which is what pulse does)
we didn't see those numbers. However, I'll try to find out which hypervisors
they are on and try to split them up to avoid CPU context switching.
As for the rest, e1000 is the emulated NIC on the VM guest. Do you want
vmstat on just the haproxy server, or on the varnish servers too?
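Whichever hosts end up being sampled, a small sketch like the following could average the `vmstat 1 20` columns so that runs on the haproxy and varnish VMs are easy to compare. It assumes the Linux procps layout (a header row starting with `r`, where `cs` is context switches per second and `st` is CPU time stolen by the hypervisor) and skips the first sample, which vmstat reports as an average since boot. The function name is hypothetical.

```python
def average_vmstat(output: str, skip_first: bool = True) -> dict:
    """Average each numeric column of `vmstat <delay> <count>` output."""
    header = None
    samples = []
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "r":
            # Column-name row; vmstat may repeat it on long runs.
            header = fields
        elif header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            samples.append([int(f) for f in fields])
    if skip_first:
        # vmstat's first report is an average since boot, not a live sample.
        samples = samples[1:]
    if header is None or not samples:
        raise ValueError("no vmstat samples found")
    # zip(*samples) turns the list of rows into a list of columns.
    return {name: sum(column) / len(samples) for name, column in zip(header, zip(*samples))}
```

Comparing `cs` and `st` averages between the haproxy VM and the varnish VMs during a test would show whether context switching or hypervisor contention rises when traffic goes through haproxy.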


>> This is our haproxy configuration; any suggestions? Is there something
>> wrong?
> 
> Overall it's fine. It would be nice if you enabled http logging and switched
> back to HTTP mode.  The logs will tell us where the time is lost. If this is
> too much of a concern, then please at least provide a pcap file of the traffic
> on the VM so that we can see what happens. I suspect a large RTT due to inter-
> VM communications if there is some congestion anywhere, which would explain
> the huge health check time which is 1000 times what it should be.

I'll try both, thanks for the suggestion.
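For reading the logs once `option httplog` is enabled in HTTP mode: in haproxy 1.4 each log line carries a timer field `Tq/Tw/Tc/Tr/Tt` in milliseconds (time to receive the client's request headers, time spent in queues, TCP connect time to the server, server response time up to the headers, and total session time), with -1 for a phase that never completed and a `+` prefix on Tt when `option logasap` is set. The sketch below, with hypothetical helper names, pulls that field out of a line and names the slowest of the first four phases. A large Tc or Tr would suggest the haproxy-to-varnish hop; a large Tq would point toward the client side.

```python
import re

TIMER_NAMES = ("Tq", "Tw", "Tc", "Tr", "Tt")
# First run of five slash-separated integers surrounded by whitespace; Tt may carry a '+'.
TIMERS_RE = re.compile(r"\s(-?\d+)/(-?\d+)/(-?\d+)/(-?\d+)/\+?(\d+)\s")


def parse_timers(log_line: str):
    """Return {'Tq': ..., 'Tt': ...} from an httplog line, or None if absent."""
    match = TIMERS_RE.search(log_line)
    if match is None:
        return None
    return dict(zip(TIMER_NAMES, (int(value) for value in match.groups())))


def slowest_phase(timers: dict):
    """Name the slowest completed phase among Tq, Tw, Tc and Tr (None if none completed)."""
    phases = {name: timers[name] for name in TIMER_NAMES[:4] if timers[name] >= 0}
    return max(phases, key=phases.get, default=None)
```

For example, a timer field of `10/0/30/69/109` would be reported as `Tr`, i.e. most of the time went into waiting for the server's response headers. The search takes the first match on purpose, because the later connection-count field (`actconn/feconn/beconn/srv_conn/retries`) has the same five-integer shape.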


> BTW, I don't know if you pinned your VMs to physical CPUs, but one of the
> worst things to do is to have two VMs sharing the same CPU and communicating
> together. This creates high latencies and context switching rates because
> only one at a time can work on the CPU. This should be observable in the
> network capture.

Unfortunately I'm working in a large environment where the infrastructure
is not managed by me, and I have no idea whether the VMs are pinned or not;
I'll try to investigate.
Meanwhile, thanks a lot for the suggestions on how to debug.

Cheers
Samuele
