>>Frankly, I'm a little impressed that without RBD cache we can hit 80K IOPS from 1 VM!
Note that these results are not from inside a VM (fio-rbd directly on the host), so inside a VM we'll have extra overhead. (I'm planning to send qemu results soon.)

>>How fast are the SSDs in those 3 OSDs?

These results are with the data in the OSD nodes' buffer memory (page cache).

When reading entirely from SSD (Intel S3500), with 1 client:
1 OSD : around 33K iops without cache, 32K iops with cache
3 OSDs: around 55K iops without cache, 38K iops with cache

With multiple client jobs, I can reach around 70K iops per OSD, and around 250K iops per OSD when the data is in buffer memory.
(Server/client CPUs are 2x 10-core 3.1GHz Xeon E5.)

Small tip: I'm using tcmalloc for fio-rbd and rados bench, which improves latencies by around 20%, since a lot of time is spent in malloc/free:

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ...
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ...

(qemu has also supported tcmalloc for some months now; I'll bench that too:
https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html)

I'll try to send full bench results soon, from 1 to 18 SSD OSDs.
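For reference, the invocations look roughly like this (just a sketch; the pool and image names are placeholders, not my actual test setup):

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 \
    fio --name=rbd_iodepth32-test --ioengine=rbd \
        --clientname=admin --pool=rbd --rbdname=testimg \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1

# rand mode needs objects to read, so run a prior
# 'rados bench -p rbd <secs> write --no-cleanup' first
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 \
    rados bench -p rbd 60 rand -t 32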
----- Original Message -----
From: "Mark Nelson" <mnel...@redhat.com>
To: "aderumier" <aderum...@odiso.com>, "pushpesh sharma" <pushpesh....@gmail.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
Sent: Tuesday, June 9, 2015 13:36:31
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi All,

In the past we've hit some performance issues with RBD cache that we've fixed, but we've never really tried pushing a single VM beyond 40+K read IOPS in testing (or at least I never have). I suspect there are a couple of possibilities as to why it might be slower, but perhaps joshd can chime in as he's more familiar with what that code looks like.

Frankly, I'm a little impressed that without RBD cache we can hit 80K IOPS from 1 VM! How fast are the SSDs in those 3 OSDs?

Mark

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:
> It seems that the limit mainly appears at high queue depths (around 16 and above).
>
> Here are the results in iops with 1 client, 4K randread, 3 OSDs, at different queue depths.
> rbd_cache gives almost the same results as no cache at queue depth < 16.
>
> cache
> -----
> qd1: 1651
> qd2: 3482
> qd4: 7958
> qd8: 17912
> qd16: 36020
> qd32: 42765
> qd64: 46169
>
> no cache
> --------
> qd1: 1748
> qd2: 3570
> qd4: 8356
> qd8: 17732
> qd16: 41396
> qd32: 78633
> qd64: 79063
> qd128: 79550
>
>
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "pushpesh sharma" <pushpesh....@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
> Sent: Tuesday, June 9, 2015 09:28:21
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi,
>
>>> We tried adding more RBDs to single VM, but no luck.
>
> If you want to scale with more disks in a single qemu VM, you need to use the qemu iothread feature and assign 1 iothread per disk (works with virtio-blk).
> It's working for me; I can scale by adding more disks.
>
> My benches here are done with fio-rbd on the host.
> I can scale up to 400K iops with 10 clients and rbd_cache=off on a single host, and around 250K iops with 10 clients and rbd_cache=on.
>
> I just wonder why I don't see a performance decrease around 30K iops with 1 OSD.
>
> I'm going to check whether this tracker
> http://tracker.ceph.com/issues/11056
> could be the cause.
> (My master build was done some weeks ago.)
>
>
> ----- Original Message -----
> From: "pushpesh sharma" <pushpesh....@gmail.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
> Sent: Tuesday, June 9, 2015 09:21:04
> Subject: Re: rbd_cache, limiting read on high iops around 40k
>
> Hi Alexandre,
>
> We have also seen something very similar on Hammer (0.94-1). We were doing some benchmarking for VMs hosted on a hypervisor (QEMU-KVM, OpenStack Juno). Each Ubuntu VM has an RBD as root disk and 1 RBD as additional storage. For some strange reason we were not able to scale 4K random-read iops on each VM beyond 35-40K. We tried adding more RBDs to a single VM, but no luck. However, increasing the number of VMs to 4 on a single hypervisor did scale to some extent; after that there was not much benefit from adding more VMs.
>
> Here is the trend we have seen; the x-axis is the number of hypervisors, each hypervisor has 4 VMs, each VM has 1 RBD:
>
> [chart not preserved in the plain-text archive]
>
> VDbench was used as the benchmarking tool. We were not saturating the network or the CPUs at the OSD nodes. We were also not able to saturate the CPUs at the hypervisors, which is where we suspected some throttling effect; however, we haven't set any such limits on the nova or KVM side. We tried some CPU pinning and other KVM-related tuning as well, but no luck.
>
> We tried the same experiment on bare metal: 4K RR IOPS scaled from 40K (1 RBD) to 180K (4 RBDs), but beyond that point the numbers actually degraded rather than scaling further (single pipe, more congestion).
>
> We never suspected that enabling rbd cache could be detrimental to performance. It would be nice to root-cause the problem if that is the case.
>
> On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> Hi,
>
> I'm running benchmarks (ceph master branch) with 4K randread, qdepth=32, and rbd_cache=true seems to limit the iops to around 40K.
>
> no cache
> --------
> 1 client - rbd_cache=false - 1 osd : 38300 iops
> 1 client - rbd_cache=false - 2 osd : 69073 iops
> 1 client - rbd_cache=false - 3 osd : 78292 iops
>
> cache
> -----
> 1 client - rbd_cache=true - 1 osd : 38100 iops
> 1 client - rbd_cache=true - 2 osd : 42457 iops
> 1 client - rbd_cache=true - 3 osd : 45823 iops
>
> Is this expected?
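(For reference, and as an assumption about how the toggle was applied here: librbd reads the rbd_cache setting from the [client] section of ceph.conf, so the two fio runs below effectively differ by something like the following, with the size value shown being the documented default:)

[client]
    rbd cache = true            # set to false for the "no cache" runs
    rbd cache size = 33554432   # 32 MB, the default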
> fio result rbd_cache=false 3 osd
> --------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 2015
>   read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec
>     slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
>     clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
>      lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
>     clat percentiles (usec):
>      |  1.00th=[  173],  5.00th=[  209], 10.00th=[  231], 20.00th=[  262],
>      | 30.00th=[  282], 40.00th=[  302], 50.00th=[  322], 60.00th=[  346],
>      | 70.00th=[  370], 80.00th=[  402], 90.00th=[  454], 95.00th=[  506],
>      | 99.00th=[  628], 99.50th=[  692], 99.90th=[  860], 99.95th=[  948],
>      | 99.99th=[ 1176]
>     bw (KB  /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21
>     lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
>     lat (msec) : 2=0.03%, 4=0.01%
>   cpu          : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0%
>      issued    : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, mint=32698msec, maxt=32698msec
>
> Disk stats (read/write):
>     dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%
>
>
> fio result rbd_cache=true 3 osd
> -------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 07:47:30 2015
>   read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec
>     slat (usec): min=7, max=805, avg=21.26, stdev=15.84
>     clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
>      lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
>     clat percentiles (usec):
>      |  1.00th=[  227],  5.00th=[  274], 10.00th=[  306], 20.00th=[  350],
>      | 30.00th=[  390], 40.00th=[  430], 50.00th=[  470], 60.00th=[  506],
>      | 70.00th=[  548], 80.00th=[  596], 90.00th=[  660], 95.00th=[  724],
>      | 99.00th=[  844], 99.50th=[  908], 99.90th=[ 1112], 99.95th=[ 1288],
>      | 99.99th=[ 2192]
>     bw (KB  /s): min=115280, max=204416, per=100.00%, avg=183315.10, stdev=15079.93
>     lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28%
>     lat (msec) : 2=0.19%, 4=0.01%, 10=0.01%
>   cpu          : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, mint=55866msec, maxt=55866msec
>
> Disk stats (read/write):
>     dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01%
>   sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01%
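Regarding the iothread-per-disk suggestion quoted above: a minimal sketch of the qemu command-line form, with placeholder pool/image names and other options omitted, looks like this:

qemu-system-x86_64 ... \
  -object iothread,id=iothread1 \
  -object iothread,id=iothread2 \
  -drive file=rbd:rbd/vm-disk1,format=raw,if=none,id=drive1,cache=none \
  -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
  -drive file=rbd:rbd/vm-disk2,format=raw,if=none,id=drive2,cache=none \
  -device virtio-blk-pci,drive=drive2,iothread=iothread2

Note that with the rbd driver, qemu's cache=none on the -drive line corresponds to rbd_cache=false, while cache=writeback enables the RBD cache discussed in this thread.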