I just installed and configured my brand new LSI 3008-8i. This server had one
SAS expander connected to two backplanes (8 disks on the first backplane, none
on the second). After some testing I found the SAS expander was a bottleneck,
so I removed it and connected the first backplane directly to the controller.
The following results are from 4k 100% random reads (QD=32) run in parallel on
each SSD:
Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
ServeRaid m5110e (with SAS expander) [numjob=1]
read : io=5111.2MB, bw=87227KB/s, iops=21806, runt= 60002msec
read : io=4800.6MB, bw=81927KB/s, iops=20481, runt= 60002msec
read : io=4997.6MB, bw=85288KB/s, iops=21322, runt= 60002msec
read : io=4796.2MB, bw=81853KB/s, iops=20463, runt= 60001msec
read : io=5062.6MB, bw=86400KB/s, iops=21599, runt= 60001msec
read : io=4989.6MB, bw=85154KB/s, iops=21288, runt= 60001msec
Total read iops: 126,959 ( ~ 21,160 iops/disk)
Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander)
[numjob=1]
read : io=15032MB, bw=256544KB/s, iops=64136, runt= 60001msec
read : io=16679MB, bw=284656KB/s, iops=71163, runt= 60001msec
read : io=15046MB, bw=256779KB/s, iops=64194, runt= 60001msec
read : io=16667MB, bw=284444KB/s, iops=71111, runt= 60001msec
read : io=16692MB, bw=284867KB/s, iops=71216, runt= 60001msec
read : io=15149MB, bw=258534KB/s, iops=64633, runt= 60002msec
Total read iops: 406,453 ( ~ 67,742 iops/disk)
That's roughly 3.2x the aggregate IOPS (about a 220% improvement).
I chose 4k at QD=32 because it should deliver the maximum IOPS and clearly
show whether the I/O path is properly configured.
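For reference, each per-disk result above comes from an invocation roughly
like this, with one fio process per SSD launched in parallel (a minimal
sketch; /dev/sdb and the job name are placeholders, not the exact command I
used):

fio --name=randread_4k \
    --filename=/dev/sdb \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
    --runtime=60 --time_based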
I didn't bother testing the embedded m5110e without the SAS expander because
it would certainly be slower.
> You might need to increase the number of jobs here. The primary reason for
> this parameter is to improve scaling when you’re single thread CPU bound.
> With numjob=1 FIO will use only a single thread and there’s only so much a
> single CPU core can do.
At numjobs=1 the HBA was only slightly faster than the m5110e while the SAS
expander was still in place, and only slightly faster again after removing the
expander. Then I tried increasing numjobs from 1 to 16 (I also tried 12, 18,
20, 24 and 32, but 16 gave the highest iops) and the benchmarks returned the
expected results. I wonder how this relates to Postgres: probably
effective_io_concurrency, as suggested by Merlin Moncure, is the counterpart
of fio's numjobs?
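The numjobs sweep was just a shell loop along these lines (again a sketch;
the device and job names are placeholders):

for nj in 1 12 16 18 20 24 32; do
    fio --name=randread_4k_nj${nj} \
        --filename=/dev/sdb \
        --ioengine=libaio --direct=1 \
        --rw=randread --bs=4k --iodepth=32 --numjobs=${nj} \
        --runtime=60 --time_based
done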
> I was a little unclear on the disk cache part. It's a setting, generally in
> the RAID controller / HBA. It can also be set at the OS level in Linux
> (hdparm) and Windows (somewhere in Device Manager?). The reason to disable
> the disk cache is that it's NOT protected against power loss on the MX300.
> So by disabling it you can ensure 100% write consistency at the cost of
> write performance. (Using fully power-protected drives lets you keep the
> disk cache enabled.)
I always kept the write cache enabled during my tests. I tried disabling it,
but performance was too poor. Those SSDs are consumer drives and don't have
any capacitors :(
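(For the record, the drive write cache was checked and toggled roughly like
this; /dev/sdb is a placeholder for each SSD:)

hdparm -W  /dev/sdb    # show the current volatile write cache state
hdparm -W0 /dev/sdb    # disable it (safe but slow on these MX300s)
hdparm -W1 /dev/sdb    # re-enable it (what I kept for the tests)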
> > Why 64k and QD=4? I thought of 8k and a larger QD. I will test as soon as
> > possible and report the results here :)
>
> It's more representative of what you'll see at the application level. (If
> you've got a running system, you can just use iostat to see what your
> average QD is: run iostat -x 10 and look at the avgqu-sz column. Change the
> 10-second interval to whatever works best for your environment.)
I tried your suggestion (64k, 70/30 random r/w, QD=4) on RAID0 and RAID10
(mdadm) with the new controller, and the results are quite good considering
that the underlying SSDs are consumer drives with stock firmware
(overprovisioned at 25%).
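The arrays and the mixed workload were set up more or less like this (a
sketch, not the exact command lines; the device names match the disk stats
below):

# RAID0 over the six SSDs (RAID10 was built the same way with --level=10)
mdadm --create /dev/md127 --level=0 --raid-devices=6 \
      /dev/sda /dev/sdb /dev/sde /dev/sdf /dev/sdg /dev/sdh

# 64k blocks, 70% random reads / 30% random writes, QD=4, 16 jobs
fio --name=randrw_64k \
    --filename=/dev/md127 \
    --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=70 --bs=64k --iodepth=4 --numjobs=16 \
    --runtime=60 --time_based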
RAID10 is about 22% slower than RAID0 in both reads and writes, at least over
a 1-minute run. The totals and averages were calculated from the full fio log
output using the per-job iops.
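Roughly, the summing was done by pulling the per-job iops= fields out of the
saved output, something like this (assuming GNU grep and an output file named
fio_raid0.log, both placeholders):

grep 'read :' fio_raid0.log | grep -oP 'iops=\K[0-9]+' | \
    awk '{ s += $1 } END { print "total read iops:", s }'
grep 'write:' fio_raid0.log | grep -oP 'iops=\K[0-9]+' | \
    awk '{ s += $1 } END { print "total write iops:", s }'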
These are the results:
############################################################################
mdadm RAID0 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander)
[numjob=16]
############################################################################
Run status group 0 (all jobs):
READ: io=75943MB, aggrb=1265.7MB/s, minb=80445KB/s, maxb=81576KB/s,
mint=60001msec, maxt=60004msec
WRITE: io=32585MB, aggrb=556072KB/s, minb=34220KB/s, maxb=35098KB/s,
mint=60001msec, maxt=60004msec
Disk stats (read/write):
md127: ios=1213256/520566, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=202541/86892, aggrmerge=0/0, aggrticks=490418/137398,
aggrin_queue=628566, aggrutil=99.20%
sdf: ios=202557/86818, merge=0/0, ticks=450384/131512, in_queue=582528,
util=98.58%
sdb: ios=202626/87184, merge=0/0, ticks=573448/177336, in_queue=751784,
util=99.20%
sdg: ios=202391/86810, merge=0/0, ticks=463644/137084, in_queue=601272,
util=98.46%
sde: ios=202462/86551, merge=0/0, ticks=470028/121424, in_queue=592500,
util=98.79%
sda: ios=202287/86697, merge=0/0, ticks=473312/121192, in_queue=595044,
util=98.95%
sdh: ios=202928/87293, merge=0/0, ticks=511696/135840, in_queue=648272,
util=99.14%
Total read iops: 20,242 ( ~ 3,374 iops/disk)
Total write iops: 8,679 ( ~ 1,447 iops/disk)
############################################################################
mdadm RAID10 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander)
[numjob=16]
############################################################################
Run status group 0 (all jobs):
READ: io=58624MB, aggrb=976.11MB/s, minb=62125KB/s, maxb=62814KB/s,
mint=60001msec, maxt=60005msec
WRITE: io=25190MB, aggrb=429874KB/s, minb=26446KB/s, maxb=27075KB/s,
mint=60001msec, maxt=60005msec
Disk stats (read/write):
md127: ios=936349/402381, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
aggrios=156357/134348, aggrmerge=0/0, aggrticks=433286/262226,
aggrin_queue=696052, aggrutil=99.41%
sdf: ios=150239/134315, merge=0/0, ticks=298268/168472, in_queue=466852,
util=95.31%
sdb: ios=153088/133664, merge=0/0, ticks=329160/188060, in_queue=517432,
util=96.81%
sdg: ios=157361/135065, merge=0/0, ticks=658208/459168, in_queue=1118588,
util=99.16%
sde: ios=161361/134315, merge=0/0, ticks=476388/278628, in_queue=756056,
util=97.61%
sda: ios=160431/133664, merge=0/0, ticks=548620/329708, in_queue=878708,
util=99.41%
sdh: ios=155667/135065, merge=0/0, ticks=289072/149324, in_queue=438680,
util=96.71%
Total read iops: 15,625 ( ~ 2,604 iops/disk)
Total write iops: 6,709 ( ~ 1,118 iops/disk)
> > Do you have an HBA card to suggest? What do you think of the LSI SAS3008?
> > I think it's the same as the 3108 without the RAID-on-Chip feature. I will
> > probably buy a Lenovo HBA card with that chip. It seems blazing fast (1M
> > IOPS) compared to the current embedded RAID controller (LSI 2008).
>
> I’ve been able to consistently get the same performance out of any of the LSI
> based cards. The 3008 and 3108 both work great, regardless of vendor. Just
> test or read up on the different configuration parameters (read ahead, write
> back vs write through, disk cache)
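With the 3008 flashed to IT mode I mostly look at those from the OS side;
this is roughly where I check them (a sketch; sdb is a placeholder, and the
write_cache attribute needs a reasonably recent kernel):

cat /sys/block/sdb/queue/read_ahead_kb   # block-layer read-ahead in KB
cat /sys/block/sdb/queue/write_cache     # "write back" or "write through"
hdparm -W /dev/sdb                       # volatile write cache on the drive itself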
Do you have any suggestions for fine-tuning this controller? I'm referring to
parameters like nr_requests, queue_depth, etc.
Also, is there any way to optimize the various mdadm parameters available
under /sys/block/mdX/? I disabled the internal bitmap and write performance
improved.
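For context, these are the knobs I've been poking at so far (a sketch only;
sdb and md127 are placeholders and the values are just examples, not
recommendations):

# per-device block layer tuning
echo 1024 > /sys/block/sdb/queue/nr_requests   # block-layer queue size
echo noop > /sys/block/sdb/queue/scheduler     # or "none" on blk-mq kernels
cat /sys/block/sdb/device/queue_depth          # device/HBA queue depth

# md array: list the available tunables and drop the internal write-intent bitmap
ls /sys/block/md127/md/
mdadm --grow --bitmap=none /dev/md127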
Thank you
Pietro Pugni