Re: [PATCH] IB/srp: use multiple CPU cores more effectively

Vladislav Bolkhovitin Mon, 02 Aug 2010 12:10:35 -0700

Bart Van Assche, on 08/02/2010 10:40 PM wrote:

On Mon, Aug 2, 2010 at 8:36 PM, David Dillow<[email protected]>  wrote:


On Mon, 2010-08-02 at 22:16 +0400, Vladislav Bolkhovitin wrote:

Bart Van Assche, on 08/02/2010 07:57 PM wrote:


block size  number of      IOPS       IOPS      IOPS
 in bytes    threads     without      with      with
  ($bs)     ($numjobs)  this patch  thread=n  thread=y
     512         1        25,400     25,400    23,100
     512       128       122,000    122,000   153,000
    4096         1        25,000     25,000    22,700
    4096       128       122,000    121,000   157,000
   65536         1        14,300     14,400    13,600
   65536         4        36,700     36,700    36,600
  524288         1         3,470      3,430     3,420
  524288         4         5,020      5,020     4,990

I'm interested to see how much your changes affected processing latency,
i.e. to measure execution latency before and after the changes. You can't
do that with several threads, because latency = 1/bandwidth only if you
always have only one command at a time. So, all those sophisticated
measurements can't substitute for a plain old:


If my assumption that --numjobs=1 puts fio into a single-threaded mode
is correct, it seems that using this patch hurts individual command
latency, at least in a gross sense. The table listed above shows a ~9%
hit for single-threaded 0.5 KB and 4 KB requests, ~4.8% for 64 KB
requests, and ~1.4% for 512 KB requests. It seems to win with many
threads and small block sizes, but still seems to hurt performance at
larger request sizes, though those were tested with smaller thread
counts.
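
For reference, the percentages above can be rechecked from the table with a few lines of Python. This is only a sketch: the helper names (percent_drop, latency_us, summarize) are made up for illustration, and the per-command latency figure rests on the same assumption stated above, namely that --numjobs=1 leaves exactly one command in flight, so that latency is roughly 1/IOPS.

```python
# IOPS for the single-threaded (numjobs=1) rows of the table above:
# block size in bytes -> (IOPS without the patch, IOPS with thread=y)
single_thread_iops = {
    512: (25_400, 23_100),
    4096: (25_000, 22_700),
    65536: (14_300, 13_600),
    524288: (3_470, 3_420),
}


def percent_drop(before, after):
    """Relative IOPS loss, in percent, going from 'before' to 'after'."""
    return 100.0 * (before - after) / before


def latency_us(iops):
    """Approximate per-command latency in microseconds.

    Assumption: exactly one command is outstanding at any time, so
    latency is simply the reciprocal of the IOPS rate.
    """
    return 1_000_000.0 / iops


def summarize(rows):
    """Map block size -> (percent drop, latency before, latency after)."""
    return {
        bs: (percent_drop(before, after), latency_us(before), latency_us(after))
        for bs, (before, after) in rows.items()
    }


for bs, (drop, lat_before, lat_after) in summarize(single_thread_iops).items():
    print(f"{bs:>7} B: {drop:4.1f}% fewer IOPS, "
          f"~{lat_before:6.1f} us -> ~{lat_after:6.1f} us per command")
```

With more than one thread the 1/IOPS shortcut no longer holds, which is why only the numjobs=1 rows are used here.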

I've not reviewed the patch yet, but that's how I read the table above.
I'm assuming latency is hurt by the need to schedule the kernel thread,
but the batching helps increase the IOPS for low request sizes.


Please note that the user has to enable mode thread=y explicitly. The
default mode is thread=n and in that mode neither latency nor
throughput is affected by this patch.

Bart, you could also try xdd as a benchmark tool.


I'm familiar with xdd. However, I consider fio both more powerful
and easier to use than xdd.

Bart, you simply can't measure your link/processing latency with it in a
trustworthy manner. In my experience, it's too heavyweight to measure
such small quantities, i.e. its internal overhead is >= the measured
value. In scientific terms, it means that you have an instrumental error
of tens to hundreds of percent.


Vlad
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
