Hi, Alexandre.

Very good work!
Do you have an RPM file?
Thanks.

2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER <[email protected]>:

> Hi,
>
> I have tested qemu with the latest tcmalloc 2.4, and the improvement is huge
> with iothread: 50k iops (+45%)!
>
>
>
> qemu : no-iothread : glibc : iops=33395
> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%)
> qemu : no-iothread : jemalloc : iops=42226 (+26%)
> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%)
>
>
> qemu : iothread : glibc : iops=34516
> qemu : iothread : tcmalloc (2.2.1) : iops=38676 (+12%)
> qemu : iothread : jemalloc : iops=28023 (-19%)
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
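The percentages above can be reproduced from the raw iops numbers; a minimal sketch, taking the glibc result of each group as the baseline (the email rounds to whole percents, e.g. 45.7 appears as +45):

```python
# Percentage gain of each allocator over the glibc baseline of its group.
def gain(iops, baseline):
    """Percent change relative to the baseline, rounded to one decimal."""
    return round((iops - baseline) / baseline * 100, 1)

# no-iothread group (baseline: glibc, 33395 iops)
print(gain(34516, 33395))  # tcmalloc 2.2.1 -> 3.4
print(gain(42226, 33395))  # jemalloc      -> 26.4
print(gain(35974, 33395))  # tcmalloc 2.4  -> 7.7

# iothread group (baseline: glibc, 34516 iops)
print(gain(38676, 34516))  # tcmalloc      -> 12.1
print(gain(28023, 34516))  # jemalloc      -> -18.8
print(gain(50276, 34516))  # tcmalloc 2.4  -> 45.7
```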
>
>
>
>
>
> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%)
> ------------------------------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10
> 05:54:24 2015
>   read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
>     slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
>     clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
>      lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
>     clat percentiles (usec):
>      |  1.00th=[  318],  5.00th=[  378], 10.00th=[  418], 20.00th=[  474],
>      | 30.00th=[  516], 40.00th=[  564], 50.00th=[  612], 60.00th=[  652],
>      | 70.00th=[  700], 80.00th=[  756], 90.00th=[  860], 95.00th=[  980],
>      | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
>      | 99.99th=[ 3760]
>     bw (KB  /s): min=145608, max=249688, per=100.00%, avg=201108.00,
> stdev=21718.87
>     lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
>     lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
>   cpu          : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>      issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s,
> mint=26070msec, maxt=26070msec
>
> Disk stats (read/write):
>   vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840,
> util=99.73%
>
>
>
>
>
>
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10
> 06:05:06 2015
>   read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec
>     slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35
>     clat (usec): min=191, max=4740, avg=884.66, stdev=315.65
>      lat (usec): min=289, max=4743, avg=888.31, stdev=315.51
>     clat percentiles (usec):
>      |  1.00th=[  462],  5.00th=[  516], 10.00th=[  548], 20.00th=[  596],
>      | 30.00th=[  652], 40.00th=[  764], 50.00th=[  868], 60.00th=[  940],
>      | 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416],
>      | 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640],
>      | 99.99th=[ 3632]
>     bw (KB  /s): min=98352, max=177328, per=99.91%, avg=143772.11,
> stdev=21782.39
>     lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01%
>     lat (msec) : 2=29.74%, 4=1.07%, 10=0.01%
>   cpu          : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>      issued    : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s,
> mint=36435msec, maxt=36435msec
>
> Disk stats (read/write):
>   vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716,
> util=99.85%
>
>
> ----- Original Message -----
> From: "aderumier" <[email protected]>
> To: "Robert LeBlanc" <[email protected]>
> Cc: "Mark Nelson" <[email protected]>, "ceph-devel" <
> [email protected]>, "pushpesh sharma" <[email protected]>,
> "ceph-users" <[email protected]>
> Sent: Tuesday, June 9, 2015 18:47:27
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi Robert,
>
> >>What I found was that Ceph OSDs performed well with either
> >>tcmalloc or jemalloc (except when RocksDB was built with jemalloc
> >>instead of tcmalloc, I'm still working to dig into why that might be
> >>the case).
> Yes, from my tests, for the OSD tcmalloc is a little faster (but only very
> little) than jemalloc.
>
>
>
> >>However, I found that tcmalloc with QEMU/KVM was very detrimental to
> >>small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much
> >>better for QEMU/KVM in the tests that we ran. [1]
>
>
> Just did a qemu test (4k randread, rbd_cache=off); I don't see a speed
> regression with tcmalloc.
> With qemu iothread, tcmalloc shows a speed increase over glibc;
> with qemu iothread, jemalloc shows a speed decrease.
>
> Without iothread, jemalloc shows a big speed increase.
>
> This is with:
> - qemu 2.3
> - tcmalloc 2.2.1
> - jemalloc 3.6
> - libc6 2.19
>
>
> qemu : no-iothread : glibc : iops=33395
> qemu : no-iothread : tcmalloc : iops=34516 (+3%)
> qemu : no-iothread : jemalloc : iops=42226 (+26%)
>
> qemu : iothread : glibc : iops=34516
> qemu : iothread : tcmalloc : iops=38676 (+12%)
> qemu : iothread : jemalloc : iops=28023 (-19%)
>
>
> (The benefit of iothreads is that we can scale with more disks in one VM.)
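For reference, a minimal sketch of how an iothread is attached to a virtio-blk disk on the qemu command line; the object/drive ids and the rbd path here are made-up placeholders:

```shell
# One dedicated iothread per virtio-blk disk (hypothetical ids and image path).
qemu-system-x86_64 \
  -object iothread,id=iothread0 \
  -drive file=rbd:rbd/vm-disk-1,if=none,id=drive0,cache=none \
  -device virtio-blk-pci,drive=drive0,iothread=iothread0
```

To scale with more disks, repeat the pattern with iothread1/drive1, and so on.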
>
>
> fio results:
> ------------
>
> qemu : iothread : tcmalloc : iops=38676
> -----------------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=0): [r(1)] [100.0% done] [123.5MB/0KB/0KB /s] [31.6K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=1265: Tue Jun 9
> 18:16:53 2015
> read : io=5120.0MB, bw=154707KB/s, iops=38676, runt= 33889msec
> slat (usec): min=1, max=715, avg= 3.63, stdev= 3.42
> clat (usec): min=152, max=5736, avg=822.12, stdev=289.34
> lat (usec): min=231, max=5740, avg=826.10, stdev=289.08
> clat percentiles (usec):
> | 1.00th=[ 402], 5.00th=[ 466], 10.00th=[ 510], 20.00th=[ 572],
> | 30.00th=[ 636], 40.00th=[ 716], 50.00th=[ 780], 60.00th=[ 852],
> | 70.00th=[ 932], 80.00th=[ 1020], 90.00th=[ 1160], 95.00th=[ 1352],
> | 99.00th=[ 1800], 99.50th=[ 1944], 99.90th=[ 2256], 99.95th=[ 2448],
> | 99.99th=[ 3888]
> bw (KB /s): min=123888, max=198584, per=100.00%, avg=154824.40,
> stdev=16978.03
> lat (usec) : 250=0.01%, 500=8.91%, 750=36.44%, 1000=32.63%
> lat (msec) : 2=21.65%, 4=0.37%, 10=0.01%
> cpu : usr=8.29%, sys=19.76%, ctx=55882, majf=0, minf=39
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=154707KB/s, minb=154707KB/s, maxb=154707KB/s,
> mint=33889msec, maxt=33889msec
>
> Disk stats (read/write):
> vdb: ios=1302739/0, merge=0/0, ticks=934444/0, in_queue=934096, util=99.77%
>
>
>
> qemu : no-iothread : tcmalloc : iops=34516
> ---------------------------------------------
> Jobs: 1 (f=1): [r(1)] [100.0% done] [163.2MB/0KB/0KB /s] [41.8K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=896: Tue Jun 9
> 18:19:08 2015
> read : io=5120.0MB, bw=138065KB/s, iops=34516, runt= 37974msec
> slat (usec): min=1, max=708, avg= 3.98, stdev= 3.57
> clat (usec): min=208, max=11858, avg=921.43, stdev=333.61
> lat (usec): min=266, max=11862, avg=925.77, stdev=333.40
> clat percentiles (usec):
> | 1.00th=[ 434], 5.00th=[ 510], 10.00th=[ 564], 20.00th=[ 652],
> | 30.00th=[ 732], 40.00th=[ 812], 50.00th=[ 876], 60.00th=[ 940],
> | 70.00th=[ 1020], 80.00th=[ 1112], 90.00th=[ 1320], 95.00th=[ 1576],
> | 99.00th=[ 1992], 99.50th=[ 2128], 99.90th=[ 2736], 99.95th=[ 3248],
> | 99.99th=[ 4320]
> bw (KB /s): min=77312, max=185576, per=99.74%, avg=137709.88,
> stdev=16883.77
> lat (usec) : 250=0.01%, 500=4.36%, 750=27.61%, 1000=35.60%
> lat (msec) : 2=31.49%, 4=0.92%, 10=0.02%, 20=0.01%
> cpu : usr=7.19%, sys=19.52%, ctx=55903, majf=0, minf=38
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=138064KB/s, minb=138064KB/s, maxb=138064KB/s,
> mint=37974msec, maxt=37974msec
>
> Disk stats (read/write):
> vdb: ios=1309902/0, merge=0/0, ticks=1068768/0, in_queue=1068396,
> util=99.86%
>
>
>
> qemu : iothread : glibc : iops=34516
> -------------------------------------
>
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [133.4MB/0KB/0KB /s] [34.2K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=876: Tue Jun 9
> 18:24:01 2015
> read : io=5120.0MB, bw=137786KB/s, iops=34446, runt= 38051msec
> slat (usec): min=1, max=496, avg= 3.88, stdev= 3.66
> clat (usec): min=283, max=7515, avg=923.34, stdev=300.28
> lat (usec): min=286, max=7519, avg=927.58, stdev=300.02
> clat percentiles (usec):
> | 1.00th=[ 506], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652],
> | 30.00th=[ 724], 40.00th=[ 804], 50.00th=[ 884], 60.00th=[ 964],
> | 70.00th=[ 1048], 80.00th=[ 1144], 90.00th=[ 1304], 95.00th=[ 1448],
> | 99.00th=[ 1896], 99.50th=[ 2096], 99.90th=[ 2480], 99.95th=[ 2640],
> | 99.99th=[ 3984]
> bw (KB /s): min=102680, max=171112, per=100.00%, avg=137877.78,
> stdev=15521.30
> lat (usec) : 500=0.84%, 750=32.97%, 1000=30.82%
> lat (msec) : 2=34.65%, 4=0.71%, 10=0.01%
> cpu : usr=7.42%, sys=19.47%, ctx=52455, majf=0, minf=38
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=137785KB/s, minb=137785KB/s, maxb=137785KB/s,
> mint=38051msec, maxt=38051msec
>
> Disk stats (read/write):
> vdb: ios=1307426/0, merge=0/0, ticks=1051416/0, in_queue=1050972,
> util=99.85%
>
>
>
> qemu : no iothread : glibc : iops=33395
> -----------------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [125.4MB/0KB/0KB /s] [32.9K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=886: Tue Jun 9
> 18:27:18 2015
> read : io=5120.0MB, bw=133583KB/s, iops=33395, runt= 39248msec
> slat (usec): min=1, max=1054, avg= 3.86, stdev= 4.29
> clat (usec): min=139, max=12635, avg=952.85, stdev=335.51
> lat (usec): min=303, max=12638, avg=957.01, stdev=335.29
> clat percentiles (usec):
> | 1.00th=[ 516], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652],
> | 30.00th=[ 724], 40.00th=[ 820], 50.00th=[ 924], 60.00th=[ 996],
> | 70.00th=[ 1080], 80.00th=[ 1176], 90.00th=[ 1336], 95.00th=[ 1528],
> | 99.00th=[ 2096], 99.50th=[ 2320], 99.90th=[ 2672], 99.95th=[ 2928],
> | 99.99th=[ 4832]
> bw (KB /s): min=98136, max=171624, per=100.00%, avg=133682.64,
> stdev=19121.91
> lat (usec) : 250=0.01%, 500=0.57%, 750=32.57%, 1000=26.98%
> lat (msec) : 2=38.59%, 4=1.28%, 10=0.01%, 20=0.01%
> cpu : usr=9.24%, sys=15.92%, ctx=51219, majf=0, minf=38
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=133583KB/s, minb=133583KB/s, maxb=133583KB/s,
> mint=39248msec, maxt=39248msec
>
> Disk stats (read/write):
> vdb: ios=1304526/0, merge=0/0, ticks=1075020/0, in_queue=1074536,
> util=99.84%
>
>
>
> qemu : iothread : jemalloc : iops=28023
> ----------------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [97.9% done] [155.2MB/0KB/0KB /s] [39.1K/0/0 iops]
> [eta 00m:01s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=899: Tue Jun 9
> 18:30:26 2015
> read : io=5120.0MB, bw=112094KB/s, iops=28023, runt= 46772msec
> slat (usec): min=1, max=467, avg= 4.33, stdev= 4.77
> clat (usec): min=253, max=11307, avg=1135.63, stdev=346.55
> lat (usec): min=256, max=11309, avg=1140.39, stdev=346.22
> clat percentiles (usec):
> | 1.00th=[ 510], 5.00th=[ 628], 10.00th=[ 700], 20.00th=[ 820],
> | 30.00th=[ 924], 40.00th=[ 1032], 50.00th=[ 1128], 60.00th=[ 1224],
> | 70.00th=[ 1320], 80.00th=[ 1416], 90.00th=[ 1560], 95.00th=[ 1688],
> | 99.00th=[ 2096], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2832],
> | 99.99th=[ 3760]
> bw (KB /s): min=91792, max=174416, per=99.90%, avg=111985.27,
> stdev=17381.70
> lat (usec) : 500=0.80%, 750=13.10%, 1000=23.33%
> lat (msec) : 2=61.30%, 4=1.46%, 10=0.01%, 20=0.01%
> cpu : usr=7.12%, sys=17.43%, ctx=54507, majf=0, minf=38
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=112094KB/s, minb=112094KB/s, maxb=112094KB/s,
> mint=46772msec, maxt=46772msec
>
> Disk stats (read/write):
> vdb: ios=1309169/0, merge=0/0, ticks=1305796/0, in_queue=1305376,
> util=98.68%
>
>
>
> qemu : no-iothread : jemalloc : iops=42226
> --------------------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=libaio, iodepth=32
> fio-2.1.11
> Starting 1 process
> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.2MB/0KB/0KB /s] [43.9K/0/0 iops]
> [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=892: Tue Jun 9
> 18:34:11 2015
> read : io=5120.0MB, bw=177130KB/s, iops=44282, runt= 29599msec
> slat (usec): min=1, max=527, avg= 3.80, stdev= 3.74
> clat (usec): min=174, max=3841, avg=717.08, stdev=237.53
> lat (usec): min=210, max=3844, avg=721.23, stdev=237.22
> clat percentiles (usec):
> | 1.00th=[ 354], 5.00th=[ 422], 10.00th=[ 462], 20.00th=[ 516],
> | 30.00th=[ 572], 40.00th=[ 628], 50.00th=[ 684], 60.00th=[ 740],
> | 70.00th=[ 804], 80.00th=[ 884], 90.00th=[ 1004], 95.00th=[ 1128],
> | 99.00th=[ 1544], 99.50th=[ 1672], 99.90th=[ 1928], 99.95th=[ 2064],
> | 99.99th=[ 2608]
> bw (KB /s): min=138120, max=230816, per=100.00%, avg=177192.14,
> stdev=23440.79
> lat (usec) : 250=0.01%, 500=16.24%, 750=45.93%, 1000=27.46%
> lat (msec) : 2=10.30%, 4=0.07%
> cpu : usr=10.14%, sys=23.84%, ctx=60938, majf=0, minf=39
> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
> READ: io=5120.0MB, aggrb=177130KB/s, minb=177130KB/s, maxb=177130KB/s,
> mint=29599msec, maxt=29599msec
>
> Disk stats (read/write):
> vdb: ios=1303992/0, merge=0/0, ticks=798008/0, in_queue=797636, util=99.80%
>
>
>
> ----- Original Message -----
> From: "Robert LeBlanc" <[email protected]>
> To: "aderumier" <[email protected]>
> Cc: "Mark Nelson" <[email protected]>, "ceph-devel" <
> [email protected]>, "pushpesh sharma" <[email protected]>,
> "ceph-users" <[email protected]>
> Sent: Tuesday, June 9, 2015 18:00:29
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
>
> I also saw a similar performance increase by using alternative memory
> allocators. What I found was that Ceph OSDs performed well with either
> tcmalloc or jemalloc (except when RocksDB was built with jemalloc
> instead of tcmalloc, I'm still working to dig into why that might be
> the case).
>
> However, I found that tcmalloc with QEMU/KVM was very detrimental to
> small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much
> better for QEMU/KVM in the tests that we ran. [1]
>
> I'm currently looking into I/O bottlenecks around the 16KB range and
> I'm seeing a lot of time in thread creation and destruction, the
> memory allocators are quite a bit down the list (both fio with
> ioengine rbd and on the OSDs). I wonder what the difference can be.
> I've tried using the async messenger but there wasn't a huge
> difference. [2]
>
> Further down the rabbit hole....
>
> [1] https://www.mail-archive.com/[email protected]/msg20197.html
> [2] https://www.mail-archive.com/[email protected]/msg23982.html
> ----------------
> Robert LeBlanc
> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Tue, Jun 9, 2015 at 6:02 AM, Alexandre DERUMIER <[email protected]>
> wrote:
> >>>Frankly, I'm a little impressed that without RBD cache we can hit 80K
> >>>IOPS from 1 VM!
> >
> > Note that these results are not in a VM (fio-rbd on the host), so in a VM
> we'll have some overhead.
> > (I'm planning to send results in qemu soon)
> >
> >>>How fast are the SSDs in those 3 OSDs?
> >
> > These results are with data in the buffer memory of the OSD nodes.
> >
> > When reading fully from SSD (Intel S3500),
> >
> > For 1 client,
> >
> > I'm around 33k iops without cache and 32k iops with cache, with 1 osd.
> > I'm around 55k iops without cache and 38k iops with cache, with 3 osd.
> >
> > With multiple client jobs, I can reach around 70k iops per OSD, and 250k
> iops per OSD when data is in the buffer cache.
> >
> > (server/client CPUs are 2x 10-core 3.1GHz E5 Xeons)
> >
> >
> >
> > Small tip: I'm using tcmalloc with fio-rbd or rados bench to improve
> latencies by around 20%:
> >
> > LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ...
> > LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ...
> >
> > as a lot of time is spent in malloc/free
> >
> >
> > (qemu has also supported tcmalloc for a few months now; I'll bench it too:
> > https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html)
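A sketch of enabling that at build time; the `--enable-tcmalloc` switch is what the linked qemu 2.3-era patch series adds, but treat the exact flag and target list as assumptions against your qemu tree:

```shell
# Build qemu against tcmalloc instead of the glibc allocator
# (configure switch from the qemu tcmalloc patch series linked above).
./configure --target-list=x86_64-softmmu --enable-tcmalloc
make -j"$(nproc)"
```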
> >
> >
> >
> > I'll try to send full bench results soon, from 1 to 18 SSD OSDs.
> >
> >
> >
> >
> > ----- Original Message -----
> > From: "Mark Nelson" <[email protected]>
> > To: "aderumier" <[email protected]>, "pushpesh sharma" <
> [email protected]>
> > Cc: "ceph-devel" <[email protected]>, "ceph-users" <
> [email protected]>
> > Sent: Tuesday, June 9, 2015 13:36:31
> > Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
> >
> > Hi All,
> >
> > In the past we've hit some performance issues with RBD cache that we've
> > fixed, but we've never really tried pushing a single VM beyond 40+K read
> > IOPS in testing (or at least I never have). I suspect there are a couple
> > of possibilities as to why it might be slower, but perhaps joshd can
> > chime in as he's more familiar with what that code looks like.
> >
> > Frankly, I'm a little impressed that without RBD cache we can hit 80K
> > IOPS from 1 VM! How fast are the SSDs in those 3 OSDs?
> >
> > Mark
> >
> > On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:
> >> It seems that the limit mainly appears at high queue depths (roughly > 16).
> >>
> >> Here are the results in iops with 1 client, 4k randread, 3 OSDs, at
> different queue depth sizes.
> >> rbd_cache is almost the same as without cache at queue depths < 16.
> >>
> >>
> >> cache
> >> -----
> >> qd1: 1651
> >> qd2: 3482
> >> qd4: 7958
> >> qd8: 17912
> >> qd16: 36020
> >> qd32: 42765
> >> qd64: 46169
> >>
> >> no cache
> >> --------
> >> qd1: 1748
> >> qd2: 3570
> >> qd4: 8356
> >> qd8: 17732
> >> qd16: 41396
> >> qd32: 78633
> >> qd64: 79063
> >> qd128: 79550
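The divergence in the lists above is easy to see as a ratio; a quick sketch over those numbers (cache keeps up through qd8, then falls to roughly half of no-cache at qd32):

```python
# iops with rbd_cache=true vs rbd_cache=false at each queue depth.
cache    = {1: 1651, 2: 3482, 4: 7958, 8: 17912, 16: 36020, 32: 42765, 64: 46169}
no_cache = {1: 1748, 2: 3570, 4: 8356, 8: 17732, 16: 41396, 32: 78633, 64: 79063}

for qd in cache:
    ratio = cache[qd] / no_cache[qd]
    print(f"qd{qd}: cache reaches {ratio:.0%} of no-cache")
```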
> >>
> >>
> >> ----- Original Message -----
> >> From: "aderumier" <[email protected]>
> >> To: "pushpesh sharma" <[email protected]>
> >> Cc: "ceph-devel" <[email protected]>, "ceph-users" <
> [email protected]>
> >> Sent: Tuesday, June 9, 2015 09:28:21
> >> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
> >>
> >> Hi,
> >>
> >>>> We tried adding more RBDs to single VM, but no luck.
> >>
> >> If you want to scale with more disks in a single qemu VM, you need to
> use the iothread feature of qemu and assign one iothread per disk (works
> with virtio-blk).
> >> It's working for me; I can scale by adding more disks.
> >>
> >>
> >> My benchmarks here are done with fio-rbd on the host.
> >> I can scale up to 400k iops with 10 clients and rbd_cache=off on a single
> host, and around 250k iops with 10 clients and rbd_cache=on.
> >>
> >>
> >> I just wonder why I don't see the performance decrease around 30k iops
> with 1 OSD.
> >>
> >> I'm going to see if this tracker
> >> http://tracker.ceph.com/issues/11056
> >>
> >> could be the cause.
> >>
> >> (My master build was done some week ago)
> >>
> >>
> >>
> >> ----- Original Message -----
> >> From: "pushpesh sharma" <[email protected]>
> >> To: "aderumier" <[email protected]>
> >> Cc: "ceph-devel" <[email protected]>, "ceph-users" <
> [email protected]>
> >> Sent: Tuesday, June 9, 2015 09:21:04
> >> Subject: Re: rbd_cache, limiting read on high iops around 40k
> >>
> >> Hi Alexandre,
> >>
> >> We have also seen something very similar on Hammer (0.94-1). We were
> doing some benchmarking for VMs hosted on a hypervisor (QEMU-KVM,
> openstack-juno). Each Ubuntu VM has an RBD as root disk, and 1 RBD as
> additional storage. For some strange reason we were not able to scale 4K RR
> iops on each VM beyond 35-40k. We tried adding more RBDs to a single VM, but
> no luck. However, increasing the number of VMs on a single hypervisor to 4
> did scale to some extent. Beyond this there was not much benefit from
> adding more VMs.
> >>
> >> Here is the trend we have seen; the x-axis is the number of hypervisors,
> each hypervisor has 4 VMs, each VM has 1 RBD:
> >>
> >> VDbench was used as the benchmarking tool. We were not saturating the
> network or the CPUs at the OSD nodes. We were also not able to saturate the
> CPUs at the hypervisors, which is why we suspected some throttling effect.
> However, we haven't set any such limits from the nova or kvm end. We tried
> some CPU pinning and other KVM-related tuning as well, but no luck.
> >>
> >> We tried the same experiment on bare metal. The 4K RR IOPS scaled
> from 40K (1 RBD) to 180K (4 RBDs). But beyond that point, rather than
> scaling further, the numbers were actually degrading. (Single pipe, more
> congestion effect.)
> >>
> >> We never suspected that enabling rbd cache could be detrimental to
> performance. It would be nice to root-cause the problem if that is the case.
> >>
> >> On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER <
> [email protected] > wrote:
> >>
> >>
> >> Hi,
> >>
> >> I'm doing benchmarks (ceph master branch) with randread 4k qdepth=32,
> >> and rbd_cache=true seems to limit the iops to around 40k.
> >>
> >>
> >> no cache
> >> --------
> >> 1 client - rbd_cache=false - 1osd : 38300 iops
> >> 1 client - rbd_cache=false - 2osd : 69073 iops
> >> 1 client - rbd_cache=false - 3osd : 78292 iops
> >>
> >>
> >> cache
> >> -----
> >> 1 client - rbd_cache=true - 1osd : 38100 iops
> >> 1 client - rbd_cache=true - 2osd : 42457 iops
> >> 1 client - rbd_cache=true - 3osd : 45823 iops
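Expressed as scaling factors relative to the 1-OSD case, the two lists above show the effect clearly; a quick sketch:

```python
# How one client's iops scale with OSD count, relative to the 1-OSD result.
no_cache = {1: 38300, 2: 69073, 3: 78292}  # rbd_cache=false
cache    = {1: 38100, 2: 42457, 3: 45823}  # rbd_cache=true

for osds in (2, 3):
    print(f"{osds} osd: no-cache x{no_cache[osds] / no_cache[1]:.2f}, "
          f"cache x{cache[osds] / cache[1]:.2f}")
```

Without cache the client scales near-linearly to 2 OSDs (about x1.80) while with cache it barely moves (about x1.11), which matches the ~40k ceiling described above.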
> >>
> >>
> >>
> >> Is this expected?
> >>
> >>
> >>
> >> fio result rbd_cache=false 3 osd
> >> --------------------------------
> >> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> >> fio-2.1.11
> >> Starting 1 process
> >> rbd engine: RBD version: 0.1.9
> >> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0
> iops] [eta 00m:00s]
> >> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9
> 07:48:42 2015
> >> read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec
> >> slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
> >> clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
> >> lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
> >> clat percentiles (usec):
> >> | 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262],
> >> | 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346],
> >> | 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506],
> >> | 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948],
> >> | 99.99th=[ 1176]
> >> bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34,
> stdev=25196.21
> >> lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
> >> lat (msec) : 2=0.03%, 4=0.01%
> >> cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
> >> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%,
> >=64=0.0%
> >> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >> complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0%
> >> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
> >> latency : target=0, window=0, percentile=100.00%, depth=32
> >>
> >> Run status group 0 (all jobs):
> >> READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s,
> mint=32698msec, maxt=32698msec
> >>
> >> Disk stats (read/write):
> >> dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%,
> aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
> >> sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%
> >>
> >>
> >>
> >>
> >> fio result rbd_cache=true 3osd
> >> ------------------------------
> >>
> >> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K,
> ioengine=rbd, iodepth=32
> >> fio-2.1.11
> >> Starting 1 process
> >> rbd engine: RBD version: 0.1.9
> >> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0
> iops] [eta 00m:00s]
> >> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9
> 07:47:30 2015
> >> read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec
> >> slat (usec): min=7, max=805, avg=21.26, stdev=15.84
> >> clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
> >> lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
> >> clat percentiles (usec):
> >> | 1.00th=[ 227], 5.00th=[ 274], 10.00th=[ 306], 20.00th=[ 350],
> >> | 30.00th=[ 390], 40.00th=[ 430], 50.00th=[ 470], 60.00th=[ 506],
> >> | 70.00th=[ 548], 80.00th=[ 596], 90.00th=[ 660], 95.00th=[ 724],
> >> | 99.00th=[ 844], 99.50th=[ 908], 99.90th=[ 1112], 99.95th=[ 1288],
> >> | 99.99th=[ 2192]
> >> bw (KB /s): min=115280, max=204416, per=100.00%, avg=183315.10,
> stdev=15079.93
> >> lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28%
> >> lat (msec) : 2=0.19%, 4=0.01%, 10=0.01%
> >> cpu : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100
> >> IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%,
> >=64=0.0%
> >> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >> complete : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0%
> >> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
> >> latency : target=0, window=0, percentile=100.00%, depth=32
> >>
> >> Run status group 0 (all jobs):
> >> READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s,
> mint=55866msec, maxt=55866msec
> >>
> >> Disk stats (read/write):
> >> dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%,
> aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01%
> >> sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01%
> >>
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Best regards, Irek Fasikhov
Mobile: +79229045757
