Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Alexandre DERUMIER Tue, 16 Jun 2015 09:39:45 -0700

Hi,

Some news about qemu with tcmalloc vs jemalloc.


I'm testing with multiple disks (with iothreads) in 1 qemu guest.

Although tcmalloc is a little faster than jemalloc,

I have hit the tcmalloc::ThreadCache::ReleaseToCentralCache bug many times.

Increasing TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES doesn't help.


With multiple disks, I'm around 200k iops with tcmalloc (before hitting the bug) 
and 350k iops with jemalloc.

The problem is that when I hit the malloc bug, I drop to around 4000-10000 iops, and the only 
way to fix it is to restart qemu ...
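
For reference, a small sketch of how the allocator environment could be prepared before launching a process. The helper name allocator_env and the 128 MiB value are illustrative assumptions, not settings I tested; the library path is the one used for fio further down in this thread. The LD_PRELOAD part only matters for binaries that are not already linked against the allocator.

```python
import os


def allocator_env(preload_lib, thread_cache_bytes=None, base_env=None):
    """Return a copy of the environment with an allocator library preloaded.

    allocator_env is a hypothetical helper name used only for this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Prepend our library so it comes first in any existing LD_PRELOAD list.
    existing = env.get("LD_PRELOAD", "")
    env["LD_PRELOAD"] = preload_lib if not existing else f"{preload_lib}:{existing}"
    if thread_cache_bytes is not None:
        # tcmalloc reads this variable at startup; the value is in bytes.
        env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"] = str(thread_cache_bytes)
    return env


# 128 MiB is an arbitrary example value, not a recommendation.
env = allocator_env("/usr/lib/libtcmalloc_minimal.so.4", 128 * 1024 * 1024, base_env={})
print(env)
# The dict can then be passed to a launcher, e.g.
# subprocess.run(["fio", "job.fio"], env=env)   # job.fio is a placeholder name
```

As said above, raising the thread cache limit did not help in my case, so this only shows where the knob lives.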



----- Original message -----
From: "pushpesh sharma" <[email protected]>
To: "aderumier" <[email protected]>
Cc: "Somnath Roy" <[email protected]>, "Irek Fasikhov" 
<[email protected]>, "ceph-devel" <[email protected]>, "ceph-users" 
<[email protected]>
Sent: Friday, 12 June 2015 08:58:21
Subject: Re: rbd_cache, limiting read on high iops around 40k

Thanks, posted the question in openstack list. Hopefully will get some 
expert opinion. 

On Fri, Jun 12, 2015 at 11:33 AM, Alexandre DERUMIER 
<[email protected]> wrote: 
> Hi, 
> 
> here is a libvirt xml sample from the libvirt source 
> 
> (you need to define the <iothreads> number, then assign them in the disks). 
> 
> I don't use openstack, so I really don't know how it works with it. 
> 
> 
> <domain type='qemu'> 
> <name>QEMUGuest1</name> 
> <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> 
> <memory unit='KiB'>219136</memory> 
> <currentMemory unit='KiB'>219136</currentMemory> 
> <vcpu placement='static'>2</vcpu> 
> <iothreads>2</iothreads> 
> <os> 
> <type arch='i686' machine='pc'>hvm</type> 
> <boot dev='hd'/> 
> </os> 
> <clock offset='utc'/> 
> <on_poweroff>destroy</on_poweroff> 
> <on_reboot>restart</on_reboot> 
> <on_crash>destroy</on_crash> 
> <devices> 
> <emulator>/usr/bin/qemu</emulator> 
> <disk type='file' device='disk'> 
> <driver name='qemu' type='raw' iothread='1'/> 
> <source file='/var/lib/libvirt/images/iothrtest1.img'/> 
> <target dev='vdb' bus='virtio'/> 
> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> 
> </disk> 
> <disk type='file' device='disk'> 
> <driver name='qemu' type='raw' iothread='2'/> 
> <source file='/var/lib/libvirt/images/iothrtest2.img'/> 
> <target dev='vdc' bus='virtio'/> 
> </disk> 
> <controller type='usb' index='0'/> 
> <controller type='ide' index='0'/> 
> <controller type='pci' index='0' model='pci-root'/> 
> <memballoon model='none'/> 
> </devices> 
> </domain> 
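
A minimal sketch of those two steps (define the <iothreads> count, then assign one iothread to each virtio disk), using only Python's standard xml.etree module. The function name assign_iothreads and the sample domain are made up for illustration; this is not a libvirt or nova API, it only rewrites an XML string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain used only to exercise the function.
SAMPLE_DOMAIN = """<domain type='qemu'>
  <name>demo-guest</name>
  <vcpu placement='static'>2</vcpu>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/disk1.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/disk2.img'/>
      <target dev='vdc' bus='virtio'/>
    </disk>
  </devices>
</domain>"""


def assign_iothreads(domain_xml):
    """Give every virtio disk its own iothread and set <iothreads> to match."""
    root = ET.fromstring(domain_xml)

    # Collect the disks attached to the virtio bus.
    virtio_disks = []
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None and target.get("bus") == "virtio":
            virtio_disks.append(disk)

    # iothread ids start at 1, one per disk.
    for number, disk in enumerate(virtio_disks, start=1):
        driver = disk.find("driver")
        if driver is None:
            driver = ET.SubElement(disk, "driver", {"name": "qemu"})
        driver.set("iothread", str(number))

    # Create <iothreads> right after <vcpu> if it is missing, then set the count.
    iothreads = root.find("iothreads")
    if iothreads is None:
        iothreads = ET.Element("iothreads")
        vcpu = root.find("vcpu")
        position = list(root).index(vcpu) + 1 if vcpu is not None else 0
        root.insert(position, iothreads)
    iothreads.text = str(len(virtio_disks))

    return ET.tostring(root, encoding="unicode")


print(assign_iothreads(SAMPLE_DOMAIN))
```

The output still has to be loaded into libvirt by whatever means the platform allows.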
> 
> 
> ----- Original message ----- 
> From: "pushpesh sharma" <[email protected]> 
> To: "aderumier" <[email protected]> 
> Cc: "Somnath Roy" <[email protected]>, "Irek Fasikhov" 
> <[email protected]>, "ceph-devel" <[email protected]>, "ceph-users" 
> <[email protected]> 
> Sent: Friday, 12 June 2015 07:52:41 
> Subject: Re: rbd_cache, limiting read on high iops around 40k 
> 
> Hi Alexandre, 
> 
> I agree with your rationale of one iothread per disk. CPU consumed in 
> IOwait is pretty high in each VM. But I am not finding a way to set 
> the same on a nova instance. I am using openstack Juno with QEMU+KVM. 
> As per the libvirt documentation for setting iothreads, I can edit 
> domain.xml directly and achieve the same effect. However, in an 
> openstack env the domain xml is created by nova with some additional 
> metadata, so editing the domain xml using 'virsh edit' does not seem 
> to work (I agree, it is not a very cloud way of doing things, but a 
> hack). Changes made there vanish after saving them, because 
> libvirt validation fails on the same. 
> 
> #virsh dumpxml instance-000000c5 > vm.xml 
> #virt-xml-validate vm.xml 
> Relax-NG validity error : Extra element cpu in interleave 
> vm.xml:1: element domain: Relax-NG validity error : Element domain 
> failed to validate content 
> vm.xml fails to validate 
> 
> The second approach I took was setting QoS in volume types. But there 
> is no option to set iothreads per volume; there are only parameters related 
> to max_read/write ops/bytes. 
> 
> Thirdly, editing the Nova flavor and providing extra specs like 
> hw:cpu_socket/thread/core can change the guest CPU topology, however again 
> there is no way to set iothreads. It does accept hw_disk_iothreads (no type check 
> in place, I believe), but cannot pass the same into domain.xml. 
> 
> Could you suggest a way to set the same? 
> 
> -Pushpesh 
> 
> On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER 
> <[email protected]> wrote: 
>>>>I need to try out the performance on qemu soon and may come back to you if 
>>>>I need some qemu setting trick :-) 
>> 
>> Sure no problem. 
>> 
>> (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 
>> iothread per disk) 
>> 
>> 
>> ----- Original message ----- 
>> From: "Somnath Roy" <[email protected]> 
>> To: "aderumier" <[email protected]>, "Irek Fasikhov" <[email protected]> 
>> Cc: "ceph-devel" <[email protected]>, "pushpesh sharma" 
>> <[email protected]>, "ceph-users" <[email protected]> 
>> Sent: Wednesday, 10 June 2015 09:06:32 
>> Subject: RE: rbd_cache, limiting read on high iops around 40k 
>> 
>> Hi Alexandre, 
>> Thanks for sharing the data. 
>> I need to try out the performance on qemu soon and may come back to you if I 
>> need some qemu setting trick :-) 
>> 
>> Regards 
>> Somnath 
>> 
>> -----Original Message----- 
>> From: ceph-users [mailto:[email protected]] On Behalf Of 
>> Alexandre DERUMIER 
>> Sent: Tuesday, June 09, 2015 10:42 PM 
>> To: Irek Fasikhov 
>> Cc: ceph-devel; pushpesh sharma; ceph-users 
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>> 
>>>>Very good work! 
>>>>Do you have a rpm-file? 
>>>>Thanks. 
>> no sorry, I compiled it manually (and I'm using debian jessie as 
>> client) 
>> 
>> 
>> 
>> ----- Original message ----- 
>> From: "Irek Fasikhov" <[email protected]> 
>> To: "aderumier" <[email protected]> 
>> Cc: "Robert LeBlanc" <[email protected]>, "ceph-devel" 
>> <[email protected]>, "pushpesh sharma" <[email protected]>, 
>> "ceph-users" <[email protected]> 
>> Sent: Wednesday, 10 June 2015 07:21:42 
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>> 
>> Hi, Alexandre. 
>> 
>> Very good work! 
>> Do you have a rpm-file? 
>> Thanks. 
>> 
>> 2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER < [email protected] > : 
>> 
>> 
>> Hi, 
>> 
>> I have tested qemu with the latest tcmalloc 2.4, and the improvement is huge with 
>> iothread: 50k iops (+45%) ! 
>> 
>> 
>> 
>> qemu : no-iothread : glibc : iops=33395 
>> qemu : no-iothread : tcmalloc (2.2.1) : iops=34516 (+3%) 
>> qemu : no-iothread : jemalloc : iops=42226 (+26%) 
>> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 
>> 
>> 
>> qemu : iothread : glibc : iops=34516 
>> qemu : iothread : tcmalloc : iops=38676 (+12%) 
>> qemu : iothread : jemalloc : iops=28023 (-19%) 
>> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
>> 
>> 
>> 
>> 
>> 
>> qemu : iothread : tcmalloc (2.4) : iops=50276 (+45%) 
>> ------------------------------------------------------ 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 
>> 05:54:24 2015 
>> read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec 
>> slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58 
>> clat (usec): min=128, max=6262, avg=631.41, stdev=197.71 
>> lat (usec): min=149, max=6265, avg=635.27, stdev=197.40 
>> clat percentiles (usec): 
>> | 1.00th=[ 318], 5.00th=[ 378], 10.00th=[ 418], 20.00th=[ 474], 
>> | 30.00th=[ 516], 40.00th=[ 564], 50.00th=[ 612], 60.00th=[ 652], 
>> | 70.00th=[ 700], 80.00th=[ 756], 90.00th=[ 860], 95.00th=[ 980], 
>> | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896], 
>> | 99.99th=[ 3760] 
>> bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, 
>> stdev=21718.87 
>> lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63% 
>> lat (msec) : 2=4.46%, 4=0.03%, 10=0.01% 
>> cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=201107KB/s, minb=201107KB/s, maxb=201107KB/s, 
>> mint=26070msec, maxt=26070msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1302555/0, merge=0/0, ticks=715176/0, in_queue=714840, util=99.73% 
>> 
>> 
>> 
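>> 
>> qemu : no-iothread : tcmalloc (2.4) : iops=35974 (+7%) 
>> ------------------------------------------------------ 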
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [158.7MB/0KB/0KB /s] [40.6K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=889: Wed Jun 10 
>> 06:05:06 2015 
>> read : io=5120.0MB, bw=143897KB/s, iops=35974, runt= 36435msec 
>> slat (usec): min=1, max=710, avg= 3.31, stdev= 3.35 
>> clat (usec): min=191, max=4740, avg=884.66, stdev=315.65 
>> lat (usec): min=289, max=4743, avg=888.31, stdev=315.51 
>> clat percentiles (usec): 
>> | 1.00th=[ 462], 5.00th=[ 516], 10.00th=[ 548], 20.00th=[ 596], 
>> | 30.00th=[ 652], 40.00th=[ 764], 50.00th=[ 868], 60.00th=[ 940], 
>> | 70.00th=[ 1004], 80.00th=[ 1096], 90.00th=[ 1256], 95.00th=[ 1416], 
>> | 99.00th=[ 2024], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2640], 
>> | 99.99th=[ 3632] 
>> bw (KB /s): min=98352, max=177328, per=99.91%, avg=143772.11, stdev=21782.39 
>> lat (usec) : 250=0.01%, 500=3.48%, 750=35.69%, 1000=30.01% 
>> lat (msec) : 2=29.74%, 4=1.07%, 10=0.01% 
>> cpu : usr=7.10%, sys=16.90%, ctx=54855, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=143896KB/s, minb=143896KB/s, maxb=143896KB/s, 
>> mint=36435msec, maxt=36435msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1301357/0, merge=0/0, ticks=1033036/0, in_queue=1032716, 
>> util=99.85% 
>> 
>> 
>> ----- Original message ----- 
>> From: "aderumier" < [email protected] > 
>> To: "Robert LeBlanc" < [email protected] > 
>> Cc: "Mark Nelson" < [email protected] >, "ceph-devel" < 
>> [email protected] >, "pushpesh sharma" < [email protected] >, 
>> "ceph-users" < [email protected] > 
>> Sent: Tuesday, 9 June 2015 18:47:27 
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>> 
>> Hi Robert, 
>> 
>>>>What I found was that Ceph OSDs performed well with either tcmalloc or 
>>>>jemalloc (except when RocksDB was built with jemalloc instead of 
>>>>tcmalloc, I'm still working to dig into why that might be the case). 
>> yes, from my tests, tcmalloc is slightly faster than jemalloc for the osd 
>> (but only by very little). 
>> 
>> 
>> 
>>>>However, I found that tcmalloc with QEMU/KVM was very detrimental to 
>>>>small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
>>>>better for QEMU/KVM in the tests that we ran. [1] 
>> 
>> 
>> I have just done a qemu test (4k randread - rbd_cache=off), and I don't see a speed 
>> regression with tcmalloc. 
>> with qemu iothread, tcmalloc has a speed increase over glibc 
>> with qemu iothread, jemalloc has a speed decrease 
>> 
>> without iothread, jemalloc has a big speed increase 
>> 
>> this is with 
>> -qemu 2.3 
>> -tcmalloc 2.2.1 
>> -jemalloc 3.6 
>> -libc6 2.19 
>> 
>> 
>> qemu : no-iothread : glibc : iops=33395 
>> qemu : no-iothread : tcmalloc : iops=34516 (+3%) 
>> qemu : no-iothread : jemalloc : iops=42226 (+26%) 
>> 
>> qemu : iothread : glibc : iops=34516 
>> qemu : iothread : tcmalloc : iops=38676 (+12%) 
>> qemu : iothread : jemalloc : iops=28023 (-19%) 
>> 
>> 
>> (The benefit of iothreads is that we can scale with more disks in 1vm) 
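
The percentages in the summary above are relative to the glibc run of the same mode. A short sketch of that arithmetic, with the iops values copied from the summary lines; the function name percent_change is just for illustration.

```python
def percent_change(iops, baseline):
    """Relative change of iops against a baseline, rounded to a whole percent."""
    return round((iops - baseline) / baseline * 100)


# iops values as listed in the summary, keyed by mode then allocator.
results = {
    "no-iothread": {"glibc": 33395, "tcmalloc": 34516, "jemalloc": 42226},
    "iothread": {"glibc": 34516, "tcmalloc": 38676, "jemalloc": 28023},
}

for mode, by_allocator in results.items():
    baseline = by_allocator["glibc"]  # glibc is the reference in each mode
    for allocator, iops in by_allocator.items():
        change = percent_change(iops, baseline)
        print(f"qemu : {mode} : {allocator} : iops={iops} ({change:+d}%)")
```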
>> 
>> 
>> fio results: 
>> ------------ 
>> 
>> qemu : iothread : tcmalloc : iops=38676 
>> ----------------------------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=0): [r(1)] [100.0% done] [123.5MB/0KB/0KB /s] [31.6K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=1265: Tue Jun 9 
>> 18:16:53 2015 
>> read : io=5120.0MB, bw=154707KB/s, iops=38676, runt= 33889msec 
>> slat (usec): min=1, max=715, avg= 3.63, stdev= 3.42 
>> clat (usec): min=152, max=5736, avg=822.12, stdev=289.34 
>> lat (usec): min=231, max=5740, avg=826.10, stdev=289.08 
>> clat percentiles (usec): 
>> | 1.00th=[ 402], 5.00th=[ 466], 10.00th=[ 510], 20.00th=[ 572], 
>> | 30.00th=[ 636], 40.00th=[ 716], 50.00th=[ 780], 60.00th=[ 852], 
>> | 70.00th=[ 932], 80.00th=[ 1020], 90.00th=[ 1160], 95.00th=[ 1352], 
>> | 99.00th=[ 1800], 99.50th=[ 1944], 99.90th=[ 2256], 99.95th=[ 2448], 
>> | 99.99th=[ 3888] 
>> bw (KB /s): min=123888, max=198584, per=100.00%, avg=154824.40, 
>> stdev=16978.03 
>> lat (usec) : 250=0.01%, 500=8.91%, 750=36.44%, 1000=32.63% 
>> lat (msec) : 2=21.65%, 4=0.37%, 10=0.01% 
>> cpu : usr=8.29%, sys=19.76%, ctx=55882, majf=0, minf=39 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=154707KB/s, minb=154707KB/s, maxb=154707KB/s, 
>> mint=33889msec, maxt=33889msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1302739/0, merge=0/0, ticks=934444/0, in_queue=934096, util=99.77% 
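
A quick sanity check on the run above, assuming the queue stays full for the whole test: by Little's law the average latency should be close to iodepth / iops. The function name is illustrative; the numbers are taken from the fio output just above (iodepth=32, iops=38676, lat avg=826.10 usec).

```python
def expected_latency_usec(iodepth, iops):
    """Average per-request latency implied by Little's law, in microseconds."""
    return iodepth / iops * 1_000_000


iodepth = 32
iops = 38676
reported_lat_usec = 826.10  # "lat (usec): ... avg=826.10" from the fio output

estimate = expected_latency_usec(iodepth, iops)
difference = abs(estimate - reported_lat_usec) / reported_lat_usec
print(f"estimated {estimate:.2f} usec, reported {reported_lat_usec:.2f} usec, "
      f"difference {difference:.2%}")
```

When the two figures agree this closely, the guest is keeping 32 requests in flight, so higher iops can only come from lower per-request latency.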
>> 
>> 
>> 
>> qemu : no-iothread : tcmalloc : iops=34516 
>> --------------------------------------------- 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [163.2MB/0KB/0KB /s] [41.8K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=896: Tue Jun 9 18:19:08 
>> 2015 
>> read : io=5120.0MB, bw=138065KB/s, iops=34516, runt= 37974msec 
>> slat (usec): min=1, max=708, avg= 3.98, stdev= 3.57 
>> clat (usec): min=208, max=11858, avg=921.43, stdev=333.61 
>> lat (usec): min=266, max=11862, avg=925.77, stdev=333.40 
>> clat percentiles (usec): 
>> | 1.00th=[ 434], 5.00th=[ 510], 10.00th=[ 564], 20.00th=[ 652], 
>> | 30.00th=[ 732], 40.00th=[ 812], 50.00th=[ 876], 60.00th=[ 940], 
>> | 70.00th=[ 1020], 80.00th=[ 1112], 90.00th=[ 1320], 95.00th=[ 1576], 
>> | 99.00th=[ 1992], 99.50th=[ 2128], 99.90th=[ 2736], 99.95th=[ 3248], 
>> | 99.99th=[ 4320] 
>> bw (KB /s): min=77312, max=185576, per=99.74%, avg=137709.88, stdev=16883.77 
>> lat (usec) : 250=0.01%, 500=4.36%, 750=27.61%, 1000=35.60% 
>> lat (msec) : 2=31.49%, 4=0.92%, 10=0.02%, 20=0.01% 
>> cpu : usr=7.19%, sys=19.52%, ctx=55903, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=138064KB/s, minb=138064KB/s, maxb=138064KB/s, 
>> mint=37974msec, maxt=37974msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1309902/0, merge=0/0, ticks=1068768/0, in_queue=1068396, 
>> util=99.86% 
>> 
>> 
>> 
>> qemu : iothread : glibc : iops=34516 
>> ------------------------------------- 
>> 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [133.4MB/0KB/0KB /s] [34.2K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=876: Tue Jun 9 18:24:01 
>> 2015 
>> read : io=5120.0MB, bw=137786KB/s, iops=34446, runt= 38051msec 
>> slat (usec): min=1, max=496, avg= 3.88, stdev= 3.66 
>> clat (usec): min=283, max=7515, avg=923.34, stdev=300.28 
>> lat (usec): min=286, max=7519, avg=927.58, stdev=300.02 
>> clat percentiles (usec): 
>> | 1.00th=[ 506], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652], 
>> | 30.00th=[ 724], 40.00th=[ 804], 50.00th=[ 884], 60.00th=[ 964], 
>> | 70.00th=[ 1048], 80.00th=[ 1144], 90.00th=[ 1304], 95.00th=[ 1448], 
>> | 99.00th=[ 1896], 99.50th=[ 2096], 99.90th=[ 2480], 99.95th=[ 2640], 
>> | 99.99th=[ 3984] 
>> bw (KB /s): min=102680, max=171112, per=100.00%, avg=137877.78, 
>> stdev=15521.30 
>> lat (usec) : 500=0.84%, 750=32.97%, 1000=30.82% 
>> lat (msec) : 2=34.65%, 4=0.71%, 10=0.01% 
>> cpu : usr=7.42%, sys=19.47%, ctx=52455, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=137785KB/s, minb=137785KB/s, maxb=137785KB/s, 
>> mint=38051msec, maxt=38051msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1307426/0, merge=0/0, ticks=1051416/0, in_queue=1050972, 
>> util=99.85% 
>> 
>> 
>> 
>> qemu : no iothread : glibc : iops=33395 
>> ----------------------------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [125.4MB/0KB/0KB /s] [32.9K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=886: Tue Jun 9 18:27:18 
>> 2015 
>> read : io=5120.0MB, bw=133583KB/s, iops=33395, runt= 39248msec 
>> slat (usec): min=1, max=1054, avg= 3.86, stdev= 4.29 
>> clat (usec): min=139, max=12635, avg=952.85, stdev=335.51 
>> lat (usec): min=303, max=12638, avg=957.01, stdev=335.29 
>> clat percentiles (usec): 
>> | 1.00th=[ 516], 5.00th=[ 564], 10.00th=[ 596], 20.00th=[ 652], 
>> | 30.00th=[ 724], 40.00th=[ 820], 50.00th=[ 924], 60.00th=[ 996], 
>> | 70.00th=[ 1080], 80.00th=[ 1176], 90.00th=[ 1336], 95.00th=[ 1528], 
>> | 99.00th=[ 2096], 99.50th=[ 2320], 99.90th=[ 2672], 99.95th=[ 2928], 
>> | 99.99th=[ 4832] 
>> bw (KB /s): min=98136, max=171624, per=100.00%, avg=133682.64, 
>> stdev=19121.91 
>> lat (usec) : 250=0.01%, 500=0.57%, 750=32.57%, 1000=26.98% 
>> lat (msec) : 2=38.59%, 4=1.28%, 10=0.01%, 20=0.01% 
>> cpu : usr=9.24%, sys=15.92%, ctx=51219, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=133583KB/s, minb=133583KB/s, maxb=133583KB/s, 
>> mint=39248msec, maxt=39248msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1304526/0, merge=0/0, ticks=1075020/0, in_queue=1074536, 
>> util=99.84% 
>> 
>> 
>> 
>> qemu : iothread : jemalloc : iops=28023 
>> ---------------------------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [97.9% done] [155.2MB/0KB/0KB /s] [39.1K/0/0 iops] 
>> [eta 00m:01s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=899: Tue Jun 9 18:30:26 
>> 2015 
>> read : io=5120.0MB, bw=112094KB/s, iops=28023, runt= 46772msec 
>> slat (usec): min=1, max=467, avg= 4.33, stdev= 4.77 
>> clat (usec): min=253, max=11307, avg=1135.63, stdev=346.55 
>> lat (usec): min=256, max=11309, avg=1140.39, stdev=346.22 
>> clat percentiles (usec): 
>> | 1.00th=[ 510], 5.00th=[ 628], 10.00th=[ 700], 20.00th=[ 820], 
>> | 30.00th=[ 924], 40.00th=[ 1032], 50.00th=[ 1128], 60.00th=[ 1224], 
>> | 70.00th=[ 1320], 80.00th=[ 1416], 90.00th=[ 1560], 95.00th=[ 1688], 
>> | 99.00th=[ 2096], 99.50th=[ 2224], 99.90th=[ 2544], 99.95th=[ 2832], 
>> | 99.99th=[ 3760] 
>> bw (KB /s): min=91792, max=174416, per=99.90%, avg=111985.27, stdev=17381.70 
>> lat (usec) : 500=0.80%, 750=13.10%, 1000=23.33% 
>> lat (msec) : 2=61.30%, 4=1.46%, 10=0.01%, 20=0.01% 
>> cpu : usr=7.12%, sys=17.43%, ctx=54507, majf=0, minf=38 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=112094KB/s, minb=112094KB/s, maxb=112094KB/s, 
>> mint=46772msec, maxt=46772msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1309169/0, merge=0/0, ticks=1305796/0, in_queue=1305376, 
>> util=98.68% 
>> 
>> 
>> 
>> qemu : no-iothread : jemalloc : iops=42226 
>> -------------------------------------------- 
>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>> ioengine=libaio, iodepth=32 
>> fio-2.1.11 
>> Starting 1 process 
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.2MB/0KB/0KB /s] [43.9K/0/0 iops] 
>> [eta 00m:00s] 
>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=892: Tue Jun 9 18:34:11 
>> 2015 
>> read : io=5120.0MB, bw=177130KB/s, iops=44282, runt= 29599msec 
>> slat (usec): min=1, max=527, avg= 3.80, stdev= 3.74 
>> clat (usec): min=174, max=3841, avg=717.08, stdev=237.53 
>> lat (usec): min=210, max=3844, avg=721.23, stdev=237.22 
>> clat percentiles (usec): 
>> | 1.00th=[ 354], 5.00th=[ 422], 10.00th=[ 462], 20.00th=[ 516], 
>> | 30.00th=[ 572], 40.00th=[ 628], 50.00th=[ 684], 60.00th=[ 740], 
>> | 70.00th=[ 804], 80.00th=[ 884], 90.00th=[ 1004], 95.00th=[ 1128], 
>> | 99.00th=[ 1544], 99.50th=[ 1672], 99.90th=[ 1928], 99.95th=[ 2064], 
>> | 99.99th=[ 2608] 
>> bw (KB /s): min=138120, max=230816, per=100.00%, avg=177192.14, 
>> stdev=23440.79 
>> lat (usec) : 250=0.01%, 500=16.24%, 750=45.93%, 1000=27.46% 
>> lat (msec) : 2=10.30%, 4=0.07% 
>> cpu : usr=10.14%, sys=23.84%, ctx=60938, majf=0, minf=39 
>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% 
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% 
>> issued : total=r=1310720/w=0/d=0, short=r=0/w=0/d=0 
>> latency : target=0, window=0, percentile=100.00%, depth=32 
>> 
>> Run status group 0 (all jobs): 
>> READ: io=5120.0MB, aggrb=177130KB/s, minb=177130KB/s, maxb=177130KB/s, 
>> mint=29599msec, maxt=29599msec 
>> 
>> Disk stats (read/write): 
>> vdb: ios=1303992/0, merge=0/0, ticks=798008/0, in_queue=797636, util=99.80% 
>> 
>> 
>> 
>> ----- Original message ----- 
>> From: "Robert LeBlanc" < [email protected] > 
>> To: "aderumier" < [email protected] > 
>> Cc: "Mark Nelson" < [email protected] >, "ceph-devel" < 
>> [email protected] >, "pushpesh sharma" < [email protected] >, 
>> "ceph-users" < [email protected] > 
>> Sent: Tuesday, 9 June 2015 18:00:29 
>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>> 
>> -----BEGIN PGP SIGNED MESSAGE----- 
>> Hash: SHA256 
>> 
>> I also saw a similar performance increase by using alternative memory 
>> allocators. What I found was that Ceph OSDs performed well with either 
>> tcmalloc or jemalloc (except when RocksDB was built with jemalloc 
>> instead of tcmalloc, I'm still working to dig into why that might be 
>> the case). 
>> 
>> However, I found that tcmalloc with QEMU/KVM was very detrimental to 
>> small I/O, but provided huge gains in I/O >=1MB. Jemalloc was much 
>> better for QEMU/KVM in the tests that we ran. [1] 
>> 
>> I'm currently looking into I/O bottlenecks around the 16KB range and 
>> I'm seeing a lot of time in thread creation and destruction, the 
>> memory allocators are quite a bit down the list (both fio with 
>> ioengine rbd and on the OSDs). I wonder what the difference can be. 
>> I've tried using the async messenger but there wasn't a huge 
>> difference. [2] 
>> 
>> Further down the rabbit hole.... 
>> 
>> [1] https://www.mail-archive.com/[email protected]/msg20197.html 
>> [2] https://www.mail-archive.com/[email protected]/msg23982.html 
>> -----BEGIN PGP SIGNATURE----- 
>> Version: Mailvelope v0.13.1 
>> Comment: https://www.mailvelope.com 
>> 
>> wsFcBAEBCAAQBQJVdw2ZCRDmVDuy+mK58QAA4MwP/1vt65cvTyyVGGSGRrE8 
>> unuWjafMHzl486XH+EaVrDVTXFVFOoncJ6kugSpD7yavtCpZNdhsIaTRZguU 
>> YpfAppNAJU5biSwNv9QPI7kPP2q2+I7Z8ZkvhcVnkjIythoeNnSjV7zJrw87 
>> afq46GhPHqEXdjp3rOB4RRPniOMnub5oU6QRnKn3HPW8Dx9ZqTeCofRDnCY2 
>> S695Dt1gzt0ERUOgrUUkt0FQJdkkV6EURcUschngjtEd5727VTLp02HivVl3 
>> vDYWxQHPK8oS6Xe8GOW0JjulwiqlYotSlrqSU5FMU5gozbk9zMFPIUW1e+51 
>> 9ART8Ta2ItMhPWtAhRwwvxgy51exCy9kBc+m+ptKW5XRUXOImGcOQxszPGOO 
>> qIIOG1vVG/GBmo/0i6tliqBFYdXmw1qFV7tFiIbisZRH7Q/1NahjYTHqHhu3 
>> Dv61T6WrerD+9N6S1Lrz1QYe2Fqa56BHhHSXM82NE86SVxEvUkoGegQU+c7b 
>> 6rY1JvuJHJzva7+M2XHApYCchCs4a1Yyd1qWB7yThJD57RIyX1TOg0+siV13 
>> R+v6wxhQU0vBovH+5oAWmCZaPNT+F0Uvs3xWAxxaIR9r83wMj9qQeBZTKVzQ 
>> 1aFIi15KqAwOp12yWCmrqKTeXhjwYQNd8viCQCGN7AQyPglmzfbuEHalVjz4 
>> oSJX 
>> =k281 
>> -----END PGP SIGNATURE----- 
>> ---------------- 
>> Robert LeBlanc 
>> GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 
>> 
>> 
>> On Tue, Jun 9, 2015 at 6:02 AM, Alexandre DERUMIER < [email protected] > 
>> wrote: 
>>>>>Frankly, I'm a little impressed that without RBD cache we can hit 80K 
>>>>>IOPS from 1 VM! 
>>> 
>>> Note that these results are not in a vm (fio-rbd on host), so in a vm we'll 
>>> have overhead. 
>>> (I'm planning to send qemu results soon) 
>>> 
>>>>>How fast are the SSDs in those 3 OSDs? 
>>> 
>>> These results are with the data in the buffer memory of the osd nodes. 
>>> 
>>> When reading fully from ssd (intel s3500), 
>>> 
>>> for 1 client, 
>>> 
>>> I'm around 33k iops without cache and 32k iops with cache, with 1 osd. 
>>> I'm around 55k iops without cache and 38k iops with cache, with 3 osd. 
>>> 
>>> with multiple client jobs, I can reach around 70k iops per osd, and 250k 
>>> iops per osd when the data is in buffer. 
>>> 
>>> (cpus servers/clients are 2x 10 cores 3,1ghz e5 xeon) 
>>> 
>>> 
>>> 
>>> small tip : 
>>> I'm using tcmalloc for fio-rbd or rados bench to improve latencies by 
>>> around 20% 
>>> 
>>> LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ... 
>>> LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ... 
>>> 
>>> as a lot of time is spent in malloc/free 
>>> 
>>> 
>>> (qemu has also supported tcmalloc for some months, I'll bench it too 
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html ) 
>>> 
>>> 
>>> 
>>> I'll try to send full bench results soon, from 1 to 18 ssd osd. 
>>> 
>>> 
>>> 
>>> 
>>> ----- Original message ----- 
>>> From: "Mark Nelson" < [email protected] > 
>>> To: "aderumier" < [email protected] >, "pushpesh sharma" < 
>>> [email protected] > 
>>> Cc: "ceph-devel" < [email protected] >, "ceph-users" < 
>>> [email protected] > 
>>> Sent: Tuesday, 9 June 2015 13:36:31 
>>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>>> 
>>> Hi All, 
>>> 
>>> In the past we've hit some performance issues with RBD cache that we've 
>>> fixed, but we've never really tried pushing a single VM beyond 40+K read 
>>> IOPS in testing (or at least I never have). I suspect there's a couple 
>>> of possibilities as to why it might be slower, but perhaps joshd can 
>>> chime in as he's more familiar with what that code looks like. 
>>> 
>>> Frankly, I'm a little impressed that without RBD cache we can hit 80K 
>>> IOPS from 1 VM! How fast are the SSDs in those 3 OSDs? 
>>> 
>>> Mark 
>>> 
>>> On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote: 
>>>> It seems that the limit mainly appears at high queue depth (+- > 16) 
>>>> 
>>>> Here are the results in iops with 1 client - 4k randread - 3 osd - with different 
>>>> queue depths. 
>>>> With rbd_cache, results are almost the same as without cache for queue depth < 16 
>>>> 
>>>> 
>>>> cache 
>>>> ----- 
>>>> qd1: 1651 
>>>> qd2: 3482 
>>>> qd4: 7958 
>>>> qd8: 17912 
>>>> qd16: 36020 
>>>> qd32: 42765 
>>>> qd64: 46169 
>>>> 
>>>> no cache 
>>>> -------- 
>>>> qd1: 1748 
>>>> qd2: 3570 
>>>> qd4: 8356 
>>>> qd8: 17732 
>>>> qd16: 41396 
>>>> qd32: 78633 
>>>> qd64: 79063 
>>>> qd128: 79550 
>>>> 
>>>> 
>>>> ----- Original message ----- 
>>>> From: "aderumier" < [email protected] > 
>>>> To: "pushpesh sharma" < [email protected] > 
>>>> Cc: "ceph-devel" < [email protected] >, "ceph-users" < 
>>>> [email protected] > 
>>>> Sent: Tuesday, 9 June 2015 09:28:21 
>>>> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k 
>>>> 
>>>> Hi, 
>>>> 
>>>>>> We tried adding more RBDs to single VM, but no luck. 
>>>> 
>>>> If you want to scale with more disks in a single qemu vm, you need to use 
>>>> the iothread feature of qemu and assign 1 iothread per disk (works with 
>>>> virtio-blk). 
>>>> It's working for me, I can scale by adding more disks. 
>>>> 
>>>> 
>>>> My benchmarks here are done with fio-rbd on the host. 
>>>> I can scale up to 400k iops with 10clients-rbd_cache=off on a single host 
>>>> and around 250k iops with 10clients-rbd_cache=on. 
>>>> 
>>>> 
>>>> I just wonder why I don't see a performance decrease around 30k iops with 
>>>> 1 osd. 
>>>> 
>>>> I'm going to see if this tracker 
>>>> http://tracker.ceph.com/issues/11056 
>>>> 
>>>> could be the cause. 
>>>> 
>>>> (My master build was done some weeks ago) 
>>>> 
>>>> 
>>>> 
>>>> ----- Original message ----- 
>>>> From: "pushpesh sharma" < [email protected] > 
>>>> To: "aderumier" < [email protected] > 
>>>> Cc: "ceph-devel" < [email protected] >, "ceph-users" < 
>>>> [email protected] > 
>>>> Sent: Tuesday, 9 June 2015 09:21:04 
>>>> Subject: Re: rbd_cache, limiting read on high iops around 40k 
>>>> 
>>>> Hi Alexandre, 
>>>> 
>>>> We have also seen something very similar on Hammer (0.94-1). We were doing 
>>>> some benchmarking for VMs hosted on a hypervisor (QEMU-KVM, openstack-juno). 
>>>> Each Ubuntu VM has an RBD as root disk, and 1 RBD as additional storage. 
>>>> For some strange reason it was not able to scale 4K RR iops on each VM 
>>>> beyond 35-40k. We tried adding more RBDs to a single VM, but no luck. 
>>>> However, increasing the number of VMs to 4 on a single hypervisor did scale to 
>>>> some extent. After this, we did not get much benefit from adding more 
>>>> VMs. 
>>>> 
>>>> Here is the trend we have seen; the x-axis is the number of hypervisors, each 
>>>> hypervisor has 4 VMs, each VM has 1 RBD:- 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> VDbench is used as the benchmarking tool. We were not saturating the network or 
>>>> the CPUs at the OSD nodes. We were not able to saturate the CPUs at the hypervisors, and 
>>>> that is where we suspected some throttling effect. However, we 
>>>> haven't set any such limits from the nova or kvm end. We tried some CPU 
>>>> pinning and other KVM-related tuning as well, but no luck. 
>>>> 
>>>> We tried the same experiment on bare metal. There, 4K RR IOPS were 
>>>> scaling from 40K (1 RBD) to 180K (4 RBDs). But after that, rather than 
>>>> scaling beyond that point, the numbers were actually degrading. (Single 
>>>> pipe, more congestion effect) 
>>>> 
>>>> We never suspected that enabling rbd cache could be detrimental to 
>>>> performance. It would be nice to root-cause the problem if that is the case. 
>>>> 
>>>> On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER < [email protected] 
>>>> > wrote: 
>>>> 
>>>> 
>>>> Hi, 
>>>> 
>>>> I'm running a benchmark (ceph master branch) with randread 4k qdepth=32, 
>>>> and rbd_cache=true seems to limit the iops to around 40k. 
>>>> 
>>>> 
>>>> no cache 
>>>> -------- 
>>>> 1 client - rbd_cache=false - 1osd : 38300 iops 
>>>> 1 client - rbd_cache=false - 2osd : 69073 iops 
>>>> 1 client - rbd_cache=false - 3osd : 78292 iops 
>>>> 
>>>> 
>>>> cache 
>>>> ----- 
>>>> 1 client - rbd_cache=true - 1osd : 38100 iops 
>>>> 1 client - rbd_cache=true - 2osd : 42457 iops 
>>>> 1 client - rbd_cache=true - 3osd : 45823 iops 
>>>> 
>>>> 
>>>> 
>>>> Is this expected? 
>>>> 
>>>> 
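To put the table above in relative terms, here is a minimal Python sketch that computes how each configuration scales against its own 1-OSD run, and how many iops are lost when rbd_cache=true. The iops figures are copied from the table; the function names `scaling` and `cache_penalty` are only illustrative and are not part of any Ceph or fio tooling.

```python
# IOPS figures copied from the benchmark table (4k randread, qdepth=32).
# Outer key: rbd_cache setting; inner key: number of OSDs.
results = {
    False: {1: 38300, 2: 69073, 3: 78292},  # rbd_cache=false
    True: {1: 38100, 2: 42457, 3: 45823},   # rbd_cache=true
}


def scaling(iops_by_osd):
    """Return IOPS relative to the 1-OSD run for each OSD count."""
    base = iops_by_osd[1]
    return {n: iops / base for n, iops in sorted(iops_by_osd.items())}


def cache_penalty(results):
    """Return the fractional IOPS loss with rbd_cache=true per OSD count."""
    return {
        n: 1 - results[True][n] / results[False][n]
        for n in sorted(results[False])
    }


if __name__ == "__main__":
    for cache, iops_by_osd in results.items():
        for n, factor in scaling(iops_by_osd).items():
            print(f"rbd_cache={str(cache).lower()} {n} osd: "
                  f"{factor:.2f}x of the 1-osd run")
    for n, loss in cache_penalty(results).items():
        print(f"{n} osd: {loss:.1%} fewer iops with rbd_cache=true")
```

With these numbers, the uncached runs roughly double from 1 to 3 OSDs, while the cached runs gain only about 20%, which is the plateau being asked about.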
>>>> 
>>>> fio result rbd_cache=false 3 osd 
>>>> -------------------------------- 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 
>>>> fio-2.1.11 
>>>> Starting 1 process 
>>>> rbd engine: RBD version: 0.1.9 
>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] 
>>>> [eta 00m:00s] 
>>>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 
>>>> 07:48:42 2015 
>>>> read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec 
>>>> slat (usec): min=5, max=530, avg=11.77, stdev= 6.77 
>>>> clat (usec): min=70, max=2240, avg=336.08, stdev=94.82 
>>>> lat (usec): min=101, max=2247, avg=347.84, stdev=95.49 
>>>> clat percentiles (usec): 
>>>> | 1.00th=[ 173], 5.00th=[ 209], 10.00th=[ 231], 20.00th=[ 262], 
>>>> | 30.00th=[ 282], 40.00th=[ 302], 50.00th=[ 322], 60.00th=[ 346], 
>>>> | 70.00th=[ 370], 80.00th=[ 402], 90.00th=[ 454], 95.00th=[ 506], 
>>>> | 99.00th=[ 628], 99.50th=[ 692], 99.90th=[ 860], 99.95th=[ 948], 
>>>> | 99.99th=[ 1176] 
>>>> bw (KB /s): min=238856, max=360448, per=100.00%, avg=313402.34, 
>>>> stdev=25196.21 
>>>> lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23% 
>>>> lat (msec) : 2=0.03%, 4=0.01% 
>>>> cpu : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452 
>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0% 
>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>> complete : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0% 
>>>> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
>>>> latency : target=0, window=0, percentile=100.00%, depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, 
>>>> mint=32698msec, maxt=32698msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
>>>> aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, 
>>>> aggrutil=0.00% 
>>>> sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00% 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> fio result rbd_cache=true 3osd 
>>>> ------------------------------ 
>>>> 
>>>> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, 
>>>> ioengine=rbd, iodepth=32 
>>>> fio-2.1.11 
>>>> Starting 1 process 
>>>> rbd engine: RBD version: 0.1.9 
>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] 
>>>> [eta 00m:00s] 
>>>> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 
>>>> 07:47:30 2015 
>>>> read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec 
>>>> slat (usec): min=7, max=805, avg=21.26, stdev=15.84 
>>>> clat (usec): min=101, max=4602, avg=478.55, stdev=143.73 
>>>> lat (usec): min=123, max=4669, avg=499.80, stdev=146.03 
>>>> clat percentiles (usec): 
>>>> | 1.00th=[ 227], 5.00th=[ 274], 10.00th=[ 306], 20.00th=[ 350], 
>>>> | 30.00th=[ 390], 40.00th=[ 430], 50.00th=[ 470], 60.00th=[ 506], 
>>>> | 70.00th=[ 548], 80.00th=[ 596], 90.00th=[ 660], 95.00th=[ 724], 
>>>> | 99.00th=[ 844], 99.50th=[ 908], 99.90th=[ 1112], 99.95th=[ 1288], 
>>>> | 99.99th=[ 2192] 
>>>> bw (KB /s): min=115280, max=204416, per=100.00%, avg=183315.10, 
>>>> stdev=15079.93 
>>>> lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28% 
>>>> lat (msec) : 2=0.19%, 4=0.01%, 10=0.01% 
>>>> cpu : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100 
>>>> IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0% 
>>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% 
>>>> complete : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0% 
>>>> issued : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0 
>>>> latency : target=0, window=0, percentile=100.00%, depth=32 
>>>> 
>>>> Run status group 0 (all jobs): 
>>>> READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, 
>>>> mint=55866msec, maxt=55866msec 
>>>> 
>>>> Disk stats (read/write): 
>>>> dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, 
>>>> aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, 
>>>> aggrutil=0.01% 
>>>> sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01% 
>>>> 
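As a sanity check on the two fio summaries above, the sketch below recomputes iops from the reported bandwidth and applies Little's law (mean outstanding I/Os = throughput x mean latency) to estimate how much of the configured iodepth=32 was actually in flight. All input figures are copied from the fio output; the helper names are hypothetical, and this is only a cross-check, not a diagnosis of the rbd_cache behaviour.

```python
# Figures copied from the two fio summaries (3 OSDs, bs=4K, iodepth=32).
runs = {
    "rbd_cache=false": {"bw_kb_s": 313169, "iops": 78292, "lat_avg_us": 347.84},
    "rbd_cache=true": {"bw_kb_s": 183296, "iops": 45823, "lat_avg_us": 499.80},
}
BLOCK_KB = 4   # bs=4K in the fio job line
IODEPTH = 32   # iodepth=32 in the fio job line


def iops_from_bandwidth(bw_kb_s, block_kb=BLOCK_KB):
    """Bandwidth divided by block size should match the reported iops."""
    return bw_kb_s / block_kb


def mean_in_flight(iops, lat_avg_us):
    """Little's law: mean outstanding I/Os = throughput x mean latency."""
    return iops * lat_avg_us / 1_000_000


if __name__ == "__main__":
    for name, run in runs.items():
        derived = iops_from_bandwidth(run["bw_kb_s"])
        depth = mean_in_flight(run["iops"], run["lat_avg_us"])
        print(f"{name}: reported {run['iops']} iops, "
              f"bw/bs gives {derived:.0f}, "
              f"mean in flight {depth:.1f} of {IODEPTH}")
```

Both runs average fewer than 32 outstanding I/Os (roughly 27 without the cache and 23 with it), which agrees with the "IO depths" lines in the fio output, where most samples fall in the 16 bucket rather than the 32 bucket.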
>>> _______________________________________________ 
>>> ceph-users mailing list 
>>> [email protected] 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
>> -- 
>> Best regards, Фасихов Ирек Нургаязович 
>> Mobile: +79229045757 
>> 
> 
> 
> 
> -- 
> -Pushpesh 
> 
> 
> 



-- 
-Pushpesh 


_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
