Bart Van Assche, on 03/01/2010 11:38 PM wrote:
On Mon, Mar 1, 2010 at 9:12 PM, Vladislav Bolkhovitin <[email protected]> wrote:

    [ ... ]
    It's good if my impression was wrong. But you've got suspiciously
    low IOPS numbers. On your hardware you should have much more. Seems
    you experienced a bottleneck on the initiator somewhere above the
    drivers level (fio? sg engine? IRQs or context switches count?), so
    your results could be not really related to the topic. Oprofile and
    lockstat output can shed more light on this.


The number of IOPS I obtained is really high considering that I used the sg I/O engine. This means that no buffering has been used and none of the I/O requests were combined into larger requests. I chose the sg I/O engine on purpose in order to bypass the block layer. I was not interested in record IOPS numbers but in a test where most of the time is spent in the SRP / iSER initiator instead of the block layer.

116K IOPS isn't high; it's pretty low for QDR IB. Even 4 Gb/s FC can outperform it. Remember, Microsoft has managed to get 1 million IOPS out of 10 GbE, and your card should be much faster. This is why I have a strong suspicion that the test is incorrect.

Let's estimate how much your IB card can achieve. It has 1 us latency on 1-byte packets, so it can perform at least 1 million ops/sec. This is an upper-bound estimate, because (1) if the card has a multi-core setup, this number can be several times bigger, and (2) it includes data transfers. On the other hand, you can read data via your card at 2.9 GB/s. If we assume that transferring a 512-byte packet has 100% overhead (an upper-bound estimate too, because I can't believe that such a low-latency HPC interconnect has such a huge data transfer overhead), the card can transfer 2.9e9 / (512 * 2) ≈ 2.8 million IOPS. So your IB hardware should be capable of at least 1 million I/O transfers per second, which is 10 times more than you measured.
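The back-of-envelope bound above can be reproduced in a few lines of shell. The 1 us latency, 2.9 GB/s bandwidth, and 100% per-packet overhead figures are the assumptions stated above, not measured values:

```shell
# Upper-bound IOPS estimates from the assumed hardware figures.
LAT_US=1                      # ~1 us per 1-byte packet (assumed)
BW_BYTES=2900000000           # 2.9 GB/s read bandwidth (assumed)
PKT=512                       # I/O size in bytes
OVERHEAD=2                    # 100% per-packet overhead -> 2x bytes per transfer

latency_bound=$((1000000 / LAT_US))               # ops/s from latency alone
bandwidth_bound=$((BW_BYTES / (PKT * OVERHEAD)))  # ops/s from bandwidth alone
echo "latency bound:   ${latency_bound} IOPS"
echo "bandwidth bound: ${bandwidth_bound} IOPS"
```

Both bounds come out around or above a million ops/sec, an order of magnitude above the 116K measured, which is the gap that needs explaining.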

So, you definitely need to find the bottleneck. I would start by checking:

1. fio might be implemented inefficiently. This can be checked using the null ioengine.

2. You may have only one outstanding command at a time (queue depth 1). You can check this during the test either using iostat on the initiator, or (better) on the SCST target in the /proc/scsi_tgt/sessions and /proc/scsi_tgt/sgv files.

3. The sg engine may be used by fio in indirect mode, i.e. transferring data between user and kernel space with a data copy. This can be checked by looking at fio's sources or using oprofile.
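Checks 1 and 2 could look roughly like this (a sketch only: the job parameters and the /dev/sdX device name are placeholders, and exact option spellings may vary between fio versions):

```shell
# Check 1: measure fio's own per-command overhead. The null ioengine
# completes requests immediately without touching any device, so the
# reported IOPS is an upper bound on what fio itself can drive.
fio --name=null-test --ioengine=null --rw=randread --bs=512 \
    --size=1G --iodepth=32 --runtime=30 --time_based

# Check 2: watch the effective queue depth on the initiator while the
# real test runs; an avgqu-sz close to 1 means only one command is
# outstanding at a time.
iostat -x 1 /dev/sdX

# ...or look at the session state on the SCST target instead:
cat /proc/scsi_tgt/sessions
```

If the null-engine run tops out near the 116K you measured over the wire, the bottleneck is on the initiator side rather than in the SRP/iSER path.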

Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
