> -----Original Message-----
> From: [email protected] [mailto:linux-block-
> [email protected]] On Behalf Of Hannes Reinecke
> Sent: Thursday, November 10, 2016 10:05 AM
> To: Jens Axboe <[email protected]>; Christoph Hellwig <[email protected]>
> Cc: SCSI Mailing List <[email protected]>; linux-
> [email protected]
> Subject: Reduced latency is killing performance
>
> Hi all,
>
> this really feels like a follow-up to the discussion we've had in
> Santa Fe, but finally I'm able to substantiate it with some numbers.
>
> I've made a patch to enable the megaraid_sas driver for multiqueue.
> While this is pretty straightforward (I'll be sending the patchset
> later on), the results are ... interesting.
>
> I've run the 'ssd-test.fio' script from Jens' repository; here are the
> results for MQ vs SQ ('-' is mq, '+' is sq):
>
> Run status group 0 (all jobs): [4 KiB sequential reads]
> - READ: io=10641MB, aggrb=181503KB/s
> + READ: io=18370MB, aggrb=312572KB/s
>
> Run status group 1 (all jobs): [4 KiB random reads]
> - READ: io=441444KB, aggrb=7303KB/s
> + READ: io=223108KB, aggrb=3707KB/s
>
> Run status group 2 (all jobs): [4 KiB sequential writes]
> - WRITE: io=22485MB, aggrb=383729KB/s
> + WRITE: io=47421MB, aggrb=807581KB/s
>
> Run status group 3 (all jobs): [4 KiB random writes]
> - WRITE: io=489852KB, aggrb=8110KB/s
> + WRITE: io=489748KB, aggrb=8134KB/s
>
> Disk stats (read/write):
> - sda: ios=2834412/5878578, merge=0/0
> + sda: ios=205278/2680329, merge=4552593/9580622
[deleted minb, maxb, mint, maxt, ticks, in_queue, and util above]
>
> As you can see, we're really losing performance in the multiqueue
> case.
> And the main reason for that is that we submit about _10 times_ as
> much I/O as we do for the single-queue case.
That script is running:
0) 4 KiB sequential reads
1) 4 KiB random reads
2) 4 KiB sequential writes
3) 4 KiB random writes
I think you're just seeing a lack of merges for the tiny sequential
workloads. Those are the ones where mq has lower aggrb results.
Check the value in /sys/block/sda/queue/nomerges. The values are:
  0 = search for both fast and slower merges
  1 = only attempt fast merges
  2 = don't attempt any merges
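As a quick sketch (assuming a device named sda; adjust for your system),
the setting can be inspected and changed from the shell:

```shell
DEV=sda                                   # hypothetical device name
NOMERGES=/sys/block/$DEV/queue/nomerges
# Show the current merge policy (0, 1, or 2), if the device exists:
if [ -r "$NOMERGES" ]; then
  cat "$NOMERGES"
fi
# Re-enable all merge attempts (requires root):
# echo 0 > "$NOMERGES"
```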
The SNIA Enterprise Solid State Storage Performance Test Specification
(SSS PTS-E) only measures 128 KiB and 1 MiB sequential IOs - it doesn't
test tiny sequential IOs. Applications may do anything, but I think
most understand that fewer, bigger transfers are more efficient
throughout the IO stack. A blocksize of 128 KiB instead of 4 KiB would
cut the number of IOs by a factor of 32 - about 97%.
For hpsa, we often turned merges off to avoid the overhead while running
applications that generate decent-sized IOs on their own.
Note that the random read aggrb value doubled with mq, and random
writes showed no impact.
You might also want to set
cpus_allowed_policy=split
to keep threads from wandering across CPUs (and thus changing queues).
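For illustration, a hypothetical fio job fragment using that option (the
job names, sizes, and CPU set here are made up, not taken from
ssd-test.fio):

```ini
; Illustrative fio job fragment (not the actual ssd-test.fio contents):
[global]
bs=4k
ioengine=libaio
direct=1
cpus_allowed=0-3
cpus_allowed_policy=split  ; divide the CPU set, one CPU per job

[seq-read]
rw=read
numjobs=4
```

With policy=split each job is pinned to its own CPU from cpus_allowed,
so its IOs keep hitting the same hardware queue instead of bouncing
between queues as the scheduler migrates threads.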
> So I guess having an I/O scheduler is critical, even for the scsi-mq
> case.
blk-mq still supports merges without any scheduler.
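Those merges show up in the per-device counters; one way to watch them
(again assuming a device named sda) is:

```shell
DEV=sda                       # hypothetical device name
STAT=/sys/block/$DEV/stat
# Fields 2 and 6 of the stat file are the cumulative read and write
# merge counts - the same numbers fio reports as "merge=r/w":
if [ -r "$STAT" ]; then
  awk '{print "read merges:", $2, "  write merges:", $6}' "$STAT"
fi
```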
---
Robert Elliott, HPE Persistent Memory