I just ran your test on a cluster with 5 hosts, each with 2x Intel Xeon Gold
6130, 12x Samsung 860 Evo 2TB SSDs (6 per SAS3008 HBA), and 2x bonded 10Gb
NICs, behind 2x Arista switches.

Pool with 3x replication

rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_dc1-kube-01_3458991
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16      5090      5074   19.7774   19.8203  0.00312568  0.00315352
    2      16     10441     10425   20.3276   20.9023  0.00332591  0.00307105
    3      16     15548     15532    20.201   19.9492  0.00337573  0.00309134
    4      16     20906     20890   20.3826   20.9297  0.00282902  0.00306437
    5      16     26107     26091   20.3686   20.3164  0.00269844  0.00306698
    6      16     31246     31230   20.3187   20.0742  0.00339814  0.00307462
    7      16     36372     36356   20.2753   20.0234  0.00286653   0.0030813
    8      16     41470     41454   20.2293   19.9141  0.00272051  0.00308839
    9      16     46815     46799   20.3011   20.8789  0.00284063  0.00307738
Total time run:         10.0035
Total writes made:      51918
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     20.2734
Stddev Bandwidth:       0.464082
Max bandwidth (MB/sec): 20.9297
Min bandwidth (MB/sec): 19.8203
Average IOPS:           5189
Stddev IOPS:            118
Max IOPS:               5358
Min IOPS:               5074
Average Latency(s):     0.00308195
Stddev Latency(s):      0.00142825
Max latency(s):         0.0267947
Min latency(s):         0.00217364
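
As a sanity check on the units: at a 4 KiB object size, 20.2734 MB/s works out
to 20.2734 * 1024 / 4 ≈ 5190 IOPS, matching the reported average.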

rados bench -p scbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      15     39691     39676    154.95   154.984  0.00027022 0.000395993
    2      16     83701     83685   163.416    171.91 0.000318949 0.000375363
    3      15    129218    129203   168.199   177.805 0.000300898 0.000364647
    4      15    173733    173718   169.617   173.887 0.000311723  0.00036156
    5      15    216073    216058   168.769   165.391 0.000407594 0.000363371
    6      16    260381    260365   169.483   173.074 0.000323371 0.000361829
    7      15    306838    306823   171.193   181.477 0.000284247 0.000358199
    8      15    353675    353660   172.661   182.957 0.000338128 0.000355139
    9      15    399221    399206   173.243   177.914 0.000422527  0.00035393
Total time run:       10.0003
Total reads made:     446353
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   174.351
Average IOPS:         44633
Stddev IOPS:          2220
Max IOPS:             46837
Min IOPS:             39676
Average Latency(s):   0.000351679
Max latency(s):       0.00530195
Min latency(s):       0.000135292
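
The same arithmetic holds for the reads: 174.351 MB/s * 256 ≈ 44,634 IOPS.
Both runs use the default of 16 concurrent ops; rados bench also takes a -t
flag if you want to see how far more client concurrency pushes it, e.g.
(a sketch, not run on this cluster):

rados bench -p scbench 10 rand -t 64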

On Thu, Feb 7, 2019 at 2:17 AM <jes...@krogh.cc> wrote:

> Hi List
>
> We are in the process of moving to the next use case for our Ceph cluster.
> Bulk, cheap, slow, erasure-coded CephFS storage was the first - and
> that works fine.
>
> We're currently on Luminous / BlueStore; if upgrading is deemed likely to
> change what we're seeing, then please let us know.
>
> We have 6 OSD hosts, each with a single 1TB Intel S4510 SSD, connected
> through an H700 MegaRAID PERC BBWC (each disk as a single-drive RAID0), with
> the scheduler set to deadline, nomerges = 1, and rotational = 0.
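>
> For reference, settings like these are typically applied per device via
> sysfs (sdX is a placeholder for each data disk):
>
> $ echo deadline | sudo tee /sys/block/sdX/queue/scheduler
> $ echo 1 | sudo tee /sys/block/sdX/queue/nomerges
> $ echo 0 | sudo tee /sys/block/sdX/queue/rotational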
>
> Each disk "should" give approximately 36K IOPS of random write, and roughly
> double that for random read.
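>
> Raw per-disk figures like these can be sanity-checked directly with fio (a
> sketch - /dev/sdX is a placeholder, and random-write tests are destructive,
> so only point them at an unused disk):
>
> $ sudo fio --name=randwrite --filename=/dev/sdX --direct=1 \
>     --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
>     --runtime=30 --time_based --group_reporting
>
> (and --rw=randread for the read side)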
>
> The pool is set up with 3x replication. We would like a "scale-out" setup of
> well-performing SSD block devices - potentially to host databases and
> things like that. I read through this nice document [0]; I know the
> HW is radically different from mine, but I still think I'm at the
> very low end of what 6x S4510 should be capable of.
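>
> For reference, a minimal sketch of creating such a pool (the PG count of 128
> is only an assumption for a cluster this size):
>
> $ sudo ceph osd pool create scbench 128 128 replicated
> $ sudo ceph osd pool set scbench size 3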
>
> Since it is IOPS I care about, I have lowered the block size to 4096 - a 4M
> block size nicely saturates the NICs in both directions.
>
>
> $ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_torsk2_11207
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0       0         0         0         0         0           -           0
>     1      16      5857      5841   22.8155   22.8164  0.00238437  0.00273434
>     2      15     11768     11753   22.9533   23.0938   0.0028559  0.00271944
>     3      16     17264     17248   22.4564   21.4648  0.00246666  0.00278101
>     4      16     22857     22841   22.3037   21.8477    0.002716  0.00280023
>     5      16     28462     28446   22.2213   21.8945  0.00220186    0.002811
>     6      16     34216     34200   22.2635   22.4766  0.00234315  0.00280552
>     7      16     39616     39600   22.0962   21.0938  0.00290661  0.00282718
>     8      16     45510     45494   22.2118   23.0234   0.0033541  0.00281253
>     9      16     50995     50979   22.1243   21.4258  0.00267282  0.00282371
>    10      16     56745     56729   22.1577   22.4609  0.00252583   0.0028193
> Total time run:         10.002668
> Total writes made:      56745
> Write size:             4096
> Object size:            4096
> Bandwidth (MB/sec):     22.1601
> Stddev Bandwidth:       0.712297
> Max bandwidth (MB/sec): 23.0938
> Min bandwidth (MB/sec): 21.0938
> Average IOPS:           5672
> Stddev IOPS:            182
> Max IOPS:               5912
> Min IOPS:               5400
> Average Latency(s):     0.00281953
> Stddev Latency(s):      0.00190771
> Max latency(s):         0.0834767
> Min latency(s):         0.00120945
>
> Min latency is fine - but a max latency of 83ms? And an average of only
> 5672 IOPS?
>
> $ sudo rados bench -p scbench  10 rand
> hints = 1
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>     0       0         0         0         0         0           -           0
>     1      15     23329     23314   91.0537   91.0703 0.000349856 0.000679074
>     2      16     48555     48539   94.7884   98.5352 0.000499159 0.000652067
>     3      16     76193     76177   99.1747   107.961 0.000443877 0.000622775
>     4      15    103923    103908   101.459   108.324 0.000678589 0.000609182
>     5      15    132720    132705   103.663   112.488 0.000741734 0.000595998
>     6      15    161811    161796   105.323   113.637 0.000333166 0.000586323
>     7      15    190196    190181   106.115   110.879 0.000612227 0.000582014
>     8      15    221155    221140   107.966   120.934 0.000471219 0.000571944
>     9      16    251143    251127   108.984   117.137 0.000267528 0.000566659
> Total time run:       10.000640
> Total reads made:     282097
> Read size:            4096
> Object size:          4096
> Bandwidth (MB/sec):   110.187
> Average IOPS:         28207
> Stddev IOPS:          2357
> Max IOPS:             30959
> Min IOPS:             23314
> Average Latency(s):   0.000560402
> Max latency(s):       0.109804
> Min latency(s):       0.000212671
>
> This is also quite far from what I expected. I have 12GB of memory for the
> OSD daemon for caching on each host, on a close-to-idle cluster - thus 50GB+
> for caching with a working set of < 6GB. In this case it should not really
> be bound by the underlying SSDs. But if it were:
>
> IOPS/disk * num disks / replication => 95K * 6 / 3 => 190K, or 6x off?
>
> There is no measurable service time in iostat when running the tests, so I
> have concluded that it has to be either the client side, the network path,
> or the OSD daemon that delivers the increased latency / decreased IOPS.
>
> Are there any suggestions on how to get more insight into that?
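>
> (For example, something beyond the standard per-OSD latency and slow-op
> views - osd.0 below is just an example ID, and the daemon command must run
> on that OSD's host:)
>
> $ sudo ceph osd perf                        # per-OSD commit/apply latency
> $ sudo ceph daemon osd.0 dump_historic_ops  # slowest recent ops on one OSD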
>
> Has anyone replicated anything close to the numbers Micron is reporting on NVMe?
>
> Thanks a lot.
>
> [0]
>
> https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
