>>Frankly, I'm a little impressed that without RBD cache we can hit 80K IOPS from 1 VM!
Note that these results are not from inside a VM (fio-rbd directly on the host), so inside a VM we'll have extra overhead. (I'm planning to send qemu results soon.)

>>How fast are the SSDs in those 3 OSDs?

These results are with the data in the OSD nodes' buffer memory (page cache).

When reading entirely from SSD (Intel S3500), with 1 client:
1 OSD : around 33K iops without cache, 32K iops with cache
3 OSDs: around 55K iops without cache, 38K iops with cache

With multiple client jobs, I can reach around 70K iops per OSD, and around 250K iops per OSD when the data is in buffer memory.
(Server/client CPUs are 2x 10-core 3.1GHz Xeon E5.)

Small tip: I'm using tcmalloc for fio-rbd and rados bench, which improves latencies by around 20%, since a lot of time is spent in malloc/free:

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 fio ...
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 rados bench ...

(qemu has also supported tcmalloc for some months now; I'll bench that too:
https://lists.gnu.org/archive/html/qemu-devel/2015-03/msg05372.html)

I'll try to send full bench results soon, from 1 to 18 SSD OSDs.
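For reference, the invocations look roughly like this (just a sketch; the pool and image names are placeholders, not my actual test setup):

LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 \
    fio --name=rbd_iodepth32-test --ioengine=rbd \
        --clientname=admin --pool=rbd --rbdname=testimg \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1

# rand mode needs objects to read, so run a prior
# 'rados bench -p rbd <secs> write --no-cleanup' first
LD_PRELOAD=/usr/lib/libtcmalloc_minimal.so.4 \
    rados bench -p rbd 60 rand -t 32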
----- Original Message -----
From: "Mark Nelson" <mnel...@redhat.com>
To: "aderumier" <aderum...@odiso.com>, "pushpesh sharma" <pushpesh....@gmail.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
Sent: Tuesday, June 9, 2015 13:36:31
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k

Hi All,

In the past we've hit some performance issues with RBD cache that we've fixed, but we've never really tried pushing a single VM beyond 40+K read IOPS in testing (or at least I never have). I suspect there are a couple of possibilities as to why it might be slower, but perhaps joshd can chime in as he's more familiar with what that code looks like.

Frankly, I'm a little impressed that without RBD cache we can hit 80K IOPS from 1 VM! How fast are the SSDs in those 3 OSDs?

Mark

On 06/09/2015 03:36 AM, Alexandre DERUMIER wrote:
> It seems that the limit mainly appears at high queue depths (around 16 and above).
>
> Here are the results in iops with 1 client, 4K randread, 3 OSDs, at different queue depths.
> rbd_cache gives almost the same results as no cache at queue depth < 16.
>
> cache
> -----
> qd1: 1651
> qd2: 3482
> qd4: 7958
> qd8: 17912
> qd16: 36020
> qd32: 42765
> qd64: 46169
>
> no cache
> --------
> qd1: 1748
> qd2: 3570
> qd4: 8356
> qd8: 17732
> qd16: 41396
> qd32: 78633
> qd64: 79063
> qd128: 79550
>
>
> ----- Original Message -----
> From: "aderumier" <aderum...@odiso.com>
> To: "pushpesh sharma" <pushpesh....@gmail.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
> Sent: Tuesday, June 9, 2015 09:28:21
> Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
>
> Hi,
>
>>> We tried adding more RBDs to single VM, but no luck.
>
> If you want to scale with more disks in a single qemu VM, you need to use the qemu iothread feature and assign 1 iothread per disk (works with virtio-blk).
> It's working for me; I can scale by adding more disks.
>
> My benches here are done with fio-rbd on the host.
> I can scale up to 400K iops with 10 clients and rbd_cache=off on a single host, and around 250K iops with 10 clients and rbd_cache=on.
>
> I just wonder why I don't see a performance decrease around 30K iops with 1 OSD.
>
> I'm going to check whether this tracker
> http://tracker.ceph.com/issues/11056
> could be the cause.
> (My master build was done some weeks ago.)
>
>
> ----- Original Message -----
> From: "pushpesh sharma" <pushpesh....@gmail.com>
> To: "aderumier" <aderum...@odiso.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "ceph-users" <ceph-us...@lists.ceph.com>
> Sent: Tuesday, June 9, 2015 09:21:04
> Subject: Re: rbd_cache, limiting read on high iops around 40k
>
> Hi Alexandre,
>
> We have also seen something very similar on Hammer (0.94-1). We were doing some benchmarking for VMs hosted on a hypervisor (QEMU-KVM, OpenStack Juno). Each Ubuntu VM has an RBD as root disk and 1 RBD as additional storage. For some strange reason we were not able to scale 4K random-read iops on each VM beyond 35-40K. We tried adding more RBDs to a single VM, but no luck. However, increasing the number of VMs to 4 on a single hypervisor did scale to some extent; after that there was not much benefit from adding more VMs.
>
> Here is the trend we have seen; the x-axis is the number of hypervisors, each hypervisor has 4 VMs, each VM has 1 RBD:
>
> [chart not preserved in the plain-text archive]
>
> VDbench was used as the benchmarking tool. We were not saturating the network or the CPUs at the OSD nodes. We were also not able to saturate the CPUs at the hypervisors, which is where we suspected some throttling effect; however, we haven't set any such limits on the nova or KVM side. We tried some CPU pinning and other KVM-related tuning as well, but no luck.
>
> We tried the same experiment on bare metal: 4K RR IOPS scaled from 40K (1 RBD) to 180K (4 RBDs), but beyond that point the numbers actually degraded rather than scaling further (single pipe, more congestion).
>
> We never suspected that enabling rbd cache could be detrimental to performance. It would be nice to root-cause the problem if that is the case.
>
> On Tue, Jun 9, 2015 at 11:21 AM, Alexandre DERUMIER <aderum...@odiso.com> wrote:
>
> Hi,
>
> I'm running benchmarks (ceph master branch) with 4K randread, qdepth=32, and rbd_cache=true seems to limit the iops to around 40K.
>
> no cache
> --------
> 1 client - rbd_cache=false - 1 osd : 38300 iops
> 1 client - rbd_cache=false - 2 osd : 69073 iops
> 1 client - rbd_cache=false - 3 osd : 78292 iops
>
> cache
> -----
> 1 client - rbd_cache=true - 1 osd : 38100 iops
> 1 client - rbd_cache=true - 2 osd : 42457 iops
> 1 client - rbd_cache=true - 3 osd : 45823 iops
>
> Is this expected?
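(For reference, and as an assumption about how the toggle was applied here: librbd reads the rbd_cache setting from the [client] section of ceph.conf, so the two fio runs below effectively differ by something like the following, with the size value shown being the documented default:)

[client]
    rbd cache = true            # set to false for the "no cache" runs
    rbd cache size = 33554432   # 32 MB, the default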
> fio result rbd_cache=false 3 osd
> --------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [307.5MB/0KB/0KB /s] [78.8K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113548: Tue Jun 9 07:48:42 2015
>   read : io=10000MB, bw=313169KB/s, iops=78292, runt= 32698msec
>     slat (usec): min=5, max=530, avg=11.77, stdev= 6.77
>     clat (usec): min=70, max=2240, avg=336.08, stdev=94.82
>      lat (usec): min=101, max=2247, avg=347.84, stdev=95.49
>     clat percentiles (usec):
>      |  1.00th=[  173],  5.00th=[  209], 10.00th=[  231], 20.00th=[  262],
>      | 30.00th=[  282], 40.00th=[  302], 50.00th=[  322], 60.00th=[  346],
>      | 70.00th=[  370], 80.00th=[  402], 90.00th=[  454], 95.00th=[  506],
>      | 99.00th=[  628], 99.50th=[  692], 99.90th=[  860], 99.95th=[  948],
>      | 99.99th=[ 1176]
>     bw (KB  /s): min=238856, max=360448, per=100.00%, avg=313402.34, stdev=25196.21
>     lat (usec) : 100=0.01%, 250=15.94%, 500=78.60%, 750=5.19%, 1000=0.23%
>     lat (msec) : 2=0.03%, 4=0.01%
>   cpu          : usr=74.48%, sys=13.25%, ctx=703225, majf=0, minf=12452
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.8%, 16=87.0%, 32=12.1%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=91.6%, 8=3.4%, 16=4.5%, 32=0.4%, 64=0.0%, >=64=0.0%
>      issued    : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=10000MB, aggrb=313169KB/s, minb=313169KB/s, maxb=313169KB/s, mint=32698msec, maxt=32698msec
>
> Disk stats (read/write):
>     dm-0: ios=0/45, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=0/24, aggrmerge=0/21, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
>   sda: ios=0/24, merge=0/21, ticks=0/0, in_queue=0, util=0.00%
>
>
> fio result rbd_cache=true 3 osd
> -------------------------------
> rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.1.11
> Starting 1 process
> rbd engine: RBD version: 0.1.9
> Jobs: 1 (f=1): [r(1)] [100.0% done] [171.6MB/0KB/0KB /s] [43.1K/0/0 iops] [eta 00m:00s]
> rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=113389: Tue Jun 9 07:47:30 2015
>   read : io=10000MB, bw=183296KB/s, iops=45823, runt= 55866msec
>     slat (usec): min=7, max=805, avg=21.26, stdev=15.84
>     clat (usec): min=101, max=4602, avg=478.55, stdev=143.73
>      lat (usec): min=123, max=4669, avg=499.80, stdev=146.03
>     clat percentiles (usec):
>      |  1.00th=[  227],  5.00th=[  274], 10.00th=[  306], 20.00th=[  350],
>      | 30.00th=[  390], 40.00th=[  430], 50.00th=[  470], 60.00th=[  506],
>      | 70.00th=[  548], 80.00th=[  596], 90.00th=[  660], 95.00th=[  724],
>      | 99.00th=[  844], 99.50th=[  908], 99.90th=[ 1112], 99.95th=[ 1288],
>      | 99.99th=[ 2192]
>     bw (KB  /s): min=115280, max=204416, per=100.00%, avg=183315.10, stdev=15079.93
>     lat (usec) : 250=2.42%, 500=55.61%, 750=38.48%, 1000=3.28%
>     lat (msec) : 2=0.19%, 4=0.01%, 10=0.01%
>   cpu          : usr=60.27%, sys=12.01%, ctx=2995393, majf=0, minf=14100
>   IO depths    : 1=0.1%, 2=0.1%, 4=0.2%, 8=13.5%, 16=81.0%, 32=5.3%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=95.0%, 8=0.1%, 16=1.0%, 32=4.0%, 64=0.0%, >=64=0.0%
>      issued    : total=r=2560000/w=0/d=0, short=r=0/w=0/d=0
>      latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>    READ: io=10000MB, aggrb=183295KB/s, minb=183295KB/s, maxb=183295KB/s, mint=55866msec, maxt=55866msec
>
> Disk stats (read/write):
>     dm-0: ios=0/61, merge=0/0, ticks=0/8, in_queue=8, util=0.01%, aggrios=0/29, aggrmerge=0/32, aggrticks=0/8, aggrin_queue=8, aggrutil=0.01%
>   sda: ios=0/29, merge=0/32, ticks=0/8, in_queue=8, util=0.01%
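Regarding the iothread-per-disk suggestion quoted above: a minimal sketch of the qemu command-line form, with placeholder pool/image names and other options omitted, looks like this:

qemu-system-x86_64 ... \
  -object iothread,id=iothread1 \
  -object iothread,id=iothread2 \
  -drive file=rbd:rbd/vm-disk1,format=raw,if=none,id=drive1,cache=none \
  -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
  -drive file=rbd:rbd/vm-disk2,format=raw,if=none,id=drive2,cache=none \
  -device virtio-blk-pci,drive=drive2,iothread=iothread2

Note that with the rbd driver, qemu's cache=none on the -drive line corresponds to rbd_cache=false, while cache=writeback enables the RBD cache discussed in this thread.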