Willy,

Thanks as always for the very detailed and helpful answer.

I'll reply in-line, like you ;-)

On Sun, Apr 29, 2012 at 7:18 PM, Willy Tarreau <[email protected]> wrote:

> On Sun, Apr 29, 2012 at 05:25:01PM +0300, Bar Ziony wrote:
> > Hi Willy,
> >
> > Thanks for your time.
> >
> > I really didn't know these were such low results.
> >
> > I ran 'ab' from a different machine than haproxy and nginx (which are
> > on different machines too). I also tried running 'ab' from multiple
> > machines (none of them haproxy or nginx) and each gets pretty much the
> > single 'ab' result divided by 3...
>
> OK so this clearly means that the limitation comes from the tested
> components and not the machine running ab.


> > I'm using VPS machines from Linode.com, they are quite powerful. They're
> > based on Xen. I don't see the network card saturated.
>
> OK I see now. There's no point searching anywhere else. Once again you're
> a victim of the high overhead of virtualization that vendors like to
> pretend
> is almost unnoticeable :-(


Is the overhead really that huge?

>
> > As for nf_conntrack, I have iptables enabled with rules as a firewall on
> > each machine, I stopped it on all involved machines and I still get those
> > results. nf_conntrack is compiled to the kernel (it's a kernel provided
> by
> > Linode) so I don't think I can disable it completely. Just not use it
> (and
> > not use any firewall between them).
>
> It's having the module loaded with default settings which is harmful, so
> even unloading the rules will not change anything. Anyway, now I'm pretty
> sure that the overhead caused by the default conntrack settings is nothing
> compared with the overhead of Xen.
>

Why is it harmful that it's loaded with default settings? Could it be
disabled?
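
In case it helps others reading: the tuning I've seen suggested for an
in-kernel conntrack (raising the table size and shortening timeouts rather
than disabling it) looks roughly like the sketch below. The values are
placeholders I made up, not recommendations for this VPS:

```
# sysctl settings often suggested for conntrack tuning -- example values only
net.netfilter.nf_conntrack_max = 262144
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30
```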

>
> > Even if 6-7K is very low (for nginx directly), why is haproxy doing half
> > than that?
>
> That's quite simple : it has two sides so it must process twice the number
> of packets. Since you're virtualized, you're packet-bound. Most of the time
> is spent communicating with the host and with the network, so the more the
> packets and the less performance you get. That's why you're seeing a 2x
> increase even with nginx when enabling keep-alive.
>

1. Can you explain what it means that I'm packet-bound, and why this happens
when I'm using virtualization?
2. When you say twice the number of packets, you mean: the client sends a
request (as 1 or more packets) to haproxy, which intercepts it, acts upon it
and sends a new request (1 or more packets) to the server, which then sends
the response back the same way, and that's why it's twice the number of
packets? It's not twice the bandwidth of using the web server directly, right?
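
To check my understanding of the arithmetic, here is a tiny sketch (all the
numbers below are made up, just to illustrate the 2x-packets effect):

```python
# Back-of-envelope: if the VM can push roughly N packets/sec in total, a proxy
# that handles every packet twice (client side + server side) halves the
# achievable request rate. Both numbers below are invented for illustration.
pps_budget = 100_000          # assumed packet budget of the VM (made up)
pkts_per_req_direct = 14      # assumed packets per HTTP request without keep-alive
pkts_per_req_proxied = 2 * pkts_per_req_direct

print(pps_budget // pkts_per_req_direct)    # requests/sec straight to nginx
print(pps_budget // pkts_per_req_proxied)   # requests/sec through the proxy
```

With these invented numbers the proxied rate comes out at exactly half the
direct rate, which matches the 2:1 ratio described above.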


> I'd say that your numbers are more or less in line with a recent benchmark
> we conducted at Exceliance and which is summarized below (each time the
> hardware was running a single VM) :
>
>
> http://blog.exceliance.fr/2012/04/24/hypervisors-virtual-network-performance-comparison-from-a-virtualized-load-balancer-point-of-view/
>
> (BTW you'll note that Xen was the worst performer here with 80% loss
>  compared to native performance).
>
> In your case it's very unlikely that you'd have dedicated hardware, and
> since you don't have access to the host, you don't know what its settings
> are, so I'd say that what you managed to reach is not that bad for such an
> environment.


> You should be able to slightly increase performance by adding the following
> options in your defaults section :
>
>   option tcp-smart-accept
>   option tcp-smart-connect
>

Thanks! I think it did help: I now get 3700 req/sec without -k, and almost
5000 req/sec with -k.
I do have a small issue (it was there before I added these options): when
running 'ab -n 10000 -c 60 http://lb-01/test.html', 'ab' gets stuck for a
second or two at the end, which drops the reported rate to around 2000
req/sec. If I Ctrl+C before the end, I see the numbers above. Is this caused
by 'ab' or by something in my setup? With -k it doesn't happen. I also think
it doesn't always happen with the second, passive LB (when I tested it).
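
For reference, a minimal sketch of a defaults section with those two options
(only the option lines come from this thread; the mode and timeout values are
assumptions for illustration):

```
defaults
    mode http
    option tcp-smart-accept
    option tcp-smart-connect
    timeout connect 5s
    timeout client  30s
    timeout server  30s
```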


> Each of them will save one packet during the TCP handshake, which may
> slightly compensate for the losses caused by virtualization. Note that
> I have also encountered a situation once where conntrack was loaded
> on the hypervisor and not tuned at all, resulting in extremely low
> performance. The effect is that the performance continuously drops as
> you add requests, until your source ports roll over and the performance
> remains stable. In your case, you run with only 10k reqs, which is not
> enough to measure the performance under such conditions. You should have
> one injecter running a constant load (eg: 1M requests in loops) and
> another one running the 10k reqs several times in a row to observe if
> the results are stable or not.
>

What do you mean by "until your source ports roll over"? I'm sorry, but I
didn't quite understand your proposed check.
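
Just to make sure I follow, is this roughly the check you propose? (A sketch
only; lb-01 and the counts are from my earlier test, and it obviously needs
the LB up and a second machine to run.)

```
# injecter 1: constant background load so source ports keep cycling
ab -n 1000000 -c 60 http://lb-01/test.html &

# injecter 2 (on another machine): repeat the short run and see whether
# the req/sec figure stays stable from one run to the next
for i in 1 2 3 4 5; do
    ab -n 10000 -c 60 http://lb-01/test.html | grep 'Requests per second'
done
```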


>
> > about nginx static backend maxconn - what is a high maxconn number? Just
> > the limit I can see with 'ab'?
>
> It depends on your load, but nginx will have no problem handling as
> many concurrent requests as haproxy on static files. So not having
> a maxconn there makes sense. Otherwise you can limit it to a few
> thousands if you want, but the purpose of maxconn is to protect a
> server, so here there is not really anything to protect.
>

OK, great.

>
> Last point about virtualized environments, they're really fine if
> you're seeking costs before performance. However, if you're building
> a high traffic site (>6k req/s might qualify as a high traffic site),
> you'd be better with a real hardware. You would not want to fail such
> a site just for saving a few dollars a month. To give you an idea,
> even with a 15EUR/month dedibox consisting on a single-core Via Nano
> processor and which runs nf_conntrack, I can achieve 14300 req/s.
>

It's very unlikely that we'll move to dedicated boxes. It's not the money
(we have some to spare :)), but the maintainability and scalability of the
setup.
Everything in our setup is scalable besides the LB, which just has a passive
failover machine with keepalived. We're also already deeply "invested" in
this setup, and migrating would be a big pain; we don't have the manpower for
that.

I'm now afraid that it won't fit us in the long run: we currently peak at
700 req/sec and average 350-400 req/sec. That still leaves a 4.5x margin
below our LB's max, but it's spooky. Do you think a bigger VM (more RAM,
more CPU) would help?
Can you think of other tuning options, not only for haproxy but also for the
kernel, that could help in this case?

How is it possible that such a low-performance dedicated box can handle
almost 5 times the req/sec of a powerful VM?

>
> Hoping this helps,
> Willy
>
>
Thanks again!
Bar.
