On Jul 4, 2014 9:45 AM, "Willy Tarreau" <[email protected]> wrote:
>
> On Fri, Jul 04, 2014 at 08:28:50AM +0200, Maxime Brugidou wrote:
> > I did exactly that in a second test later before going to sleep and went
> > up to 50k session/sec.
>
> Ah great!

Actually I couldn't figure out how to reproduce this consistently. I'm stuck
at 30k session/s now, but the CPU does not seem to be the bottleneck. Using
keep-alive helps a lot though.

> > I realized that too in my next tests and went back to 3. I forgot about
> > the RX buffer which is at 256 by default. I tried raising it to 4096 (the
> > maximum) with some little improvements.
>
> It generally does not help since too large buffers mean larger rings that
> are less efficient to process (i.e. they take more cache space). 128-512
> are generally the best options for TCP, with 256 often being optimal.

OK, I went back to 256 (the default).

> > I will try to do further testing now that I have a better understanding
> > of this. Not sure if there is any HTTP tool like siege that makes it
> > easier to monitor object size and latency?
>
> I'm used to using "inject", which I wrote many years ago. It provides one
> line per second (like vmstat) with some metrics of req/s, data/s, avg resp
> time and standard deviation. It doesn't support SSL nor keep-alive, but I
> find it useful enough not to switch to other tools. Legends are still in
> French but it should not be a problem for you :-)
>
> http://git.formilux.org/?p=people/willy/inject.git
> http://1wt.eu/tools/inject/ (for the doc)

Thanks! I am using it now for my non-keepalive tests; it's simple and
convenient.

> > > OK but you need first to ensure that you *can* max out the bandwidth,
> > > otherwise it definitely indicates a setup problem.
> >
> > OK, I'll try to do that with large objects first. I can also increase
> > the MTU on the backend side, use splicing and maybe LRO; it should max
> > out the bandwidth.
>
> You should never need to increase the MTU at such rates. Even at 40 Gbps
> I'm working with 1500. Splicing tends to be slower than copying with many
> gig NICs, so reserve it for 10G+ NICs unless your tests show that it's
> better.
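For reference, this is how I checked and reset the ring sizes (the interface name eth0 is a placeholder for my actual NIC):

```shell
# Show the current and maximum RX/TX ring sizes for the interface
ethtool -g eth0
# Set the RX ring back to 256 descriptors, per the advice above
ethtool -G eth0 rx 256
```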
> LRO is useless at such low speeds.

OK, I did not try any of these, but with 75kB responses I got above 950 Mbps,
which seems OK to me.

> Whenever you want low latency or high bandwidth, you need to test hardware
> before selecting the one you'll need. It took me 6 months to find hardware
> capable of 10 Gbps in 2009. 10G NICs will provide you much better
> performance even at rates below 1G. I know a few web sites very sensitive
> to response time which have switched to 10G just for this reason. Myricom
> NICs will provide you with a very low latency, but will hardly scale to
> 10G unless you're mostly dealing with huge objects. Intel NICs will reach
> higher bit rates, but come with a higher CPU usage so are not necessarily
> relevant for rates of 1G or less.
>
> > I am still a bit disappointed by the connection speed I reach.
>
> You're too impatient :-)
> After 1 day and 3 mails you doubled your performance!
>
> > I'll update you later today trying out all the solutions and getting
> > better data.
>
> OK.
>
> Willy

I can get acceptable performance now that I have read all the benchmarks and
tests online, especially for 2 GHz CPUs and 1G links. However, 30k
session/sec on one core with 150 kpps Rx + 150 kpps Tx on this Intel I350
NIC is still very low compared to other benchmarks showing this NIC going
above 1 Mpps with Linux and igb.

Three questions:

1. If I set up a more recent kernel (CentOS 7 or Debian), do you think it
   can significantly help?

2. Do you use haproxy with bonding? If we want to add a second NIC and use
   bonding, do you think we can use the second CPU socket with it? I can
   easily steer IRQs with smp_affinity, but if I add haproxy processes on
   the second socket I am not sure they will handle the traffic from the
   second NIC exclusively. Is RPS/XPS the solution here?

3. With haproxy 1.5, am I right that since we use round robin we can't
   benefit from the http-keep-alive option? Do we need to switch to another
   algorithm?
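For question 2, to make my setup concrete: smp_affinity and the rps_cpus/xps_cpus sysfs files all take hex CPU bitmasks, so steering the second NIC to the second socket would just be a matter of writing the right mask. This is the small helper I use to compute them (the assumption that socket 1 holds CPUs 8-15 is specific to my box; lscpu gives the real numbering):

```python
# Build the hex bitmask accepted by /proc/irq/<n>/smp_affinity and
# /sys/class/net/<dev>/queues/rx-<q>/rps_cpus from a list of CPU ids.
def cpu_mask(cpus):
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # one bit per CPU id
    return format(mask, "x")

# Assuming socket 1 holds CPUs 8-15 (check with lscpu on your machine):
socket1 = list(range(8, 16))
print(cpu_mask(socket1))  # ff00
# then e.g.: echo ff00 > /sys/class/net/eth1/queues/rx-0/rps_cpus
```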

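Regarding question 3, this is the kind of configuration I am asking about (backend name and server addresses are made up for the example; I am unsure whether round robin defeats server-side reuse here):

```haproxy
defaults
    mode http
    option http-keep-alive        # keep client connections alive
    timeout http-keep-alive 10s

backend web
    balance roundrobin            # does this prevent keep-alive benefits?
    server s1 10.0.0.11:80
    server s2 10.0.0.12:80
```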
