Vladislav Bolkhovitin wrote:
Cameron Harr wrote:
Cameron Harr wrote:
This is still too high. Considering that each CS takes about 1 microsecond, you can estimate how many IOPS it costs you.
Dropping scst_threads from 8 down to 2, with 2 initiators, seems to make a fairly significant difference, propelling me to a little over 100K IOPS and putting the CS:IO ratio around 2:1, sometimes lower. Two threads gave the best performance compared to 1, 4, and 8.
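To put Vlad's ~1 microsecond figure into perspective, here's the back-of-the-envelope math I'm using (the per-switch cost is his estimate, not something I've measured on these boxes, and the IOPS/ratio are just my observed numbers):

# Rough cost of context switching at a given CS:IO ratio.
CS_COST_US = 1.0      # assumed microseconds per context switch (Vlad's figure)
IOPS = 100000         # observed IOPS
CS_PER_IO = 2.0       # observed context switches per I/O

cs_per_sec = IOPS * CS_PER_IO
cpu_sec_per_sec = cs_per_sec * CS_COST_US / 1e6
print("%d CS/s -> ~%.0f%% of one core spent just switching"
      % (cs_per_sec, cpu_sec_per_sec * 100))
# 200,000 CS/s * 1 us = 0.2 s of CPU per second, i.e. roughly 20% of one core.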

Just as a status update, I've gotten my best performance with scst_threads=3 on 2 initiators, and with a separate QP for each drive an initiator writes to. I'm getting a pretty consistent 112-115K IOPS with two initiators, each writing with 2 processes to the same 2 physical targets, using 512B blocks. Adding the second initiator only bumps me up by about 20K IOPS, but as all the CPUs are pegged around 99%, I'll take that as the bottleneck. Also, as a follow-up on Vlad's advice, the CS rate is now around 70K/s at 115K IOPS, so it's not too bad. Interrupts (where this thread started) are around 200K/s - a lot higher than I thought they'd go, but I'm not complaining. :)
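For anyone who wants to watch the same numbers, the CS and interrupt rates can be had by diffing the cumulative ctxt and intr counters in /proc/stat, which is essentially what vmstat reports. A quick sketch:

# Sample /proc/stat twice, one second apart, and diff the cumulative
# "ctxt" (context switches) and "intr" (interrupts) counters to get
# per-second rates.
import time

def read_counters():
    counters = {}
    for line in open("/proc/stat"):
        fields = line.split()
        if fields[0] in ("ctxt", "intr"):
            counters[fields[0]] = int(fields[1])
    return counters

before = read_counters()
time.sleep(1)
after = read_counters()
print("CS/s:  ", after["ctxt"] - before["ctxt"])
print("intr/s:", after["intr"] - before["intr"])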

Actually, what you did is tune your workload so that it maps nicely onto all the participating threads and CPU cores: each thread stays on its own CPU core and gracefully passes commands to the others during processing, with everything busy almost all the time. I.e., you put your system into some kind of resonance. If you change your workload just a bit, or the Linux scheduler changes in the next kernel version, your tuning would be destroyed.

This "resonance" thought actually crossed my mind. I later ran the test locally on the target and found that I got better performance via SRP than I did locally (good marketing for you :) ). The local run, with no networking involved, gave me around 2 CS/IO. It also appeared that when I added the second initiator, the requests from the two initiators to a single target were getting coalesced, which improved performance.
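If the gain really does come from which core each thread lands on, one way to make that less dependent on the scheduler's mood would be to pin the threads explicitly. Just a sketch - I haven't actually tried this yet, and the PIDs below are placeholders for whatever the scst threads and I/O processes really are:

# Pin each worker to its own core so the placement doesn't depend on what
# the scheduler of the day decides. The PIDs are placeholders; substitute
# the real scst thread / initiator process IDs.
import os

pinning = {
    1234: {0},   # hypothetical PID of one scst thread -> core 0
    1235: {1},   # hypothetical PID of another scst thread -> core 1
}

for pid, cores in pinning.items():
    os.sched_setaffinity(pid, cores)          # same effect as taskset -cp
    print(pid, "->", sorted(os.sched_getaffinity(pid)))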
So, I wouldn't overestimate your results. As I already wrote, the only real fix is to remove all the unneeded context switches between threads during command processing. That fix would work not only on carefully tuned artificial workloads, but on real-life ones too. Having 5-10 threads participate in processing a single command reminds me of the famous family of jokes about how many people of some kind it takes to change a burnt-out light bulb ;)
Nice analogy :). I wish I knew how to eradicate the extra context switches. I'll try Bart's trick and see if I can get more info.
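I don't know yet exactly what Bart had in mind, but in the meantime one way to see which threads are doing the switching is to watch the voluntary/nonvoluntary counters in /proc/<pid>/status. A rough sketch (the thread-name filter is just a guess at what the SCST/SRPT threads are called on a given kernel):

# Dump cumulative per-process context switch counters from /proc/<pid>/status.
# The counters only ever grow, so sample twice and diff to get rates.
import glob

NAME_FILTER = ("scst", "srpt")    # guess at thread name substrings; adjust

for path in glob.glob("/proc/[0-9]*/status"):
    try:
        fields = dict(line.split(":\t", 1) for line in open(path) if ":\t" in line)
    except IOError:               # the process may have exited in the meantime
        continue
    name = fields.get("Name", "").strip()
    if any(s in name for s in NAME_FILTER):
        print(name, path,
              "voluntary:", fields.get("voluntary_ctxt_switches", "?").strip(),
              "nonvoluntary:", fields.get("nonvoluntary_ctxt_switches", "?").strip())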