Check these:

http://www.spinics.net/lists/ceph-users/msg16294.html

http://tracker.ceph.com/issues/9344

Thanks & Regards
Somnath

From: ceph-users [mailto:[email protected]] On Behalf Of Bill 
Sanders
Sent: Friday, September 11, 2015 11:17 AM
To: Jan Schermer
Cc: Rafael Lopez; [email protected]; Nick Fisk
Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO

Is there a thread on the mailing list (or LKML?) with some background about 
tcp_low_latency and TCP_NODELAY?
Bill

On Fri, Sep 11, 2015 at 2:30 AM, Jan Schermer 
<[email protected]<mailto:[email protected]>> wrote:
Can you try

echo 1 > /proc/sys/net/ipv4/tcp_low_latency

And see if it improves things? I remember there being an option to disable
Nagle completely, but it's apparently gone.
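
A quick sketch of checking and persisting that setting with sysctl, in case the
echo above gets lost on reboot (the /etc/sysctl.conf path is only the usual
default; adjust for your distro):

# check the current value
sysctl net.ipv4.tcp_low_latency

# set it at runtime (same effect as the echo above)
sysctl -w net.ipv4.tcp_low_latency=1

# persist it across reboots
echo "net.ipv4.tcp_low_latency = 1" >> /etc/sysctl.conf
sysctl -p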

Jan

> On 11 Sep 2015, at 10:43, Nick Fisk <[email protected]>
> wrote:
>
>
>
>
>
>> -----Original Message-----
>> From: ceph-users 
>> [mailto:[email protected]]
>>  On Behalf Of
>> Somnath Roy
>> Sent: 11 September 2015 06:23
>> To: Rafael Lopez <[email protected]>
>> Cc: [email protected]
>> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
>>
>> That’s probably because the krbd version you are using doesn’t have the
>> TCP_NODELAY patch. We have submitted it (and you can build it from the latest
>> rbd source), but I am not sure when it will land in the Linux mainline.
>
> From memory it landed in 3.19, but there are also several issues with max IO
> size, max nr_requests and readahead. For testing, I would suggest trying one of
> these:
>
> http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/ra-bring-back/
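>
> For the readahead / nr_requests side, it is also worth checking what the mapped
> device is actually set to before the krbd run. A rough sketch, assuming the image
> is mapped as rbd0 (the values below are just examples to test with, not tuned
> recommendations):
>
> cat /sys/block/rbd0/queue/read_ahead_kb
> cat /sys/block/rbd0/queue/nr_requests
> cat /sys/block/rbd0/queue/max_sectors_kb
>
> # bump them for a test run, e.g.
> echo 4096 > /sys/block/rbd0/queue/read_ahead_kb
> echo 1024 > /sys/block/rbd0/queue/nr_requests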
>
>
>>
>> Thanks & Regards
>> Somnath
>>
>> From: Rafael Lopez 
>> [mailto:[email protected]<mailto:[email protected]>]
>> Sent: Thursday, September 10, 2015 10:12 PM
>> To: Somnath Roy
>> Cc: [email protected]<mailto:[email protected]>
>> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
>>
>> OK, I ran the two tests again with direct=1, a smaller block size (4k) and a
>> smaller total io (100m), and disabled the cache on the client side in ceph.conf
>> by adding:
>>
>> [client]
>> rbd cache = false
>> rbd cache max dirty = 0
>> rbd cache size = 0
>> rbd cache target dirty = 0
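>>
>> (To double-check that the fio librbd run actually picked those up, I believe you
>> can also give the client an admin socket and query it the same way as an OSD --
>> the socket path below is just an example and will vary:)
>>
>> [client]
>> admin socket = /var/run/ceph/$name.$pid.asok
>>
>> # then, while fio is running:
>> ceph --admin-daemon /var/run/ceph/client.admin.<pid>.asok config show | grep rbd_cache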
>>
>>
>> The result seems to have swapped around: now the librbd job is running
>> ~50% faster than the krbd job!
>>
>> ####### krbd job:
>>
>> [root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
>> job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=16
>> fio-2.2.8
>> Starting 1 process
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/571KB/0KB /s] [0/142/0 iops] [eta
>> 00m:00s]
>> job1: (groupid=0, jobs=1): err= 0: pid=29095: Fri Sep 11 14:48:21 2015
>>  write: io=102400KB, bw=647137B/s, iops=157, runt=162033msec
>>    clat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
>>     lat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
>>    clat percentiles (usec):
>>     |  1.00th=[ 2896],  5.00th=[ 4320], 10.00th=[ 4768], 20.00th=[ 5536],
>>     | 30.00th=[ 5920], 40.00th=[ 6176], 50.00th=[ 6432], 60.00th=[ 6624],
>>     | 70.00th=[ 6816], 80.00th=[ 7136], 90.00th=[ 7584], 95.00th=[ 7968],
>>     | 99.00th=[ 9024], 99.50th=[ 9664], 99.90th=[15808], 99.95th=[17536],
>>     | 99.99th=[19328]
>>    bw (KB  /s): min=  506, max= 1171, per=100.00%, avg=632.22, stdev=104.77
>>    lat (msec) : 4=2.88%, 10=96.69%, 20=0.43%, 50=0.01%
>>  cpu          : usr=0.17%, sys=0.71%, ctx=25634, majf=0, minf=35
>>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>     latency   : target=0, window=0, percentile=100.00%, depth=16
>>
>> Run status group 0 (all jobs):
>>  WRITE: io=102400KB, aggrb=631KB/s, minb=631KB/s, maxb=631KB/s,
>> mint=162033msec, maxt=162033msec
>>
>> Disk stats (read/write):
>>  rbd0: ios=0/25638, merge=0/32, ticks=0/160765, in_queue=160745,
>> util=99.11%
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>> ###### librbd job:
>>
>> [root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
>> job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=16
>> fio-2.2.8
>> Starting 1 process
>> rbd engine: RBD version: 0.1.9
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/703KB/0KB /s] [0/175/0 iops] [eta
>> 00m:00s]
>> job1: (groupid=0, jobs=1): err= 0: pid=30568: Fri Sep 11 14:50:24 2015
>>  write: io=102400KB, bw=950141B/s, iops=231, runt=110360msec
>>    slat (usec): min=70, max=992, avg=115.05, stdev=30.07
>>    clat (msec): min=13, max=117, avg=67.91, stdev=24.93
>>     lat (msec): min=13, max=117, avg=68.03, stdev=24.93
>>    clat percentiles (msec):
>>     |  1.00th=[   19],  5.00th=[   26], 10.00th=[   38], 20.00th=[   40],
>>     | 30.00th=[   46], 40.00th=[   62], 50.00th=[   77], 60.00th=[   85],
>>     | 70.00th=[   88], 80.00th=[   91], 90.00th=[   95], 95.00th=[   99],
>>     | 99.00th=[  105], 99.50th=[  110], 99.90th=[  116], 99.95th=[  117],
>>     | 99.99th=[  118]
>>    bw (KB  /s): min=  565, max= 3174, per=100.00%, avg=935.74, stdev=407.67
>>    lat (msec) : 20=2.41%, 50=29.85%, 100=64.46%, 250=3.29%
>>  cpu          : usr=2.43%, sys=0.29%, ctx=7847, majf=0, minf=2750
>>  IO depths    : 1=6.2%, 2=12.5%, 4=25.0%, 8=50.0%, 16=6.2%, 32=0.0%, >=64=0.0%
>>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     complete  : 0=0.0%, 4=94.1%, 8=0.0%, 16=5.9%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>     latency   : target=0, window=0, percentile=100.00%, depth=16
>>
>> Run status group 0 (all jobs):
>>  WRITE: io=102400KB, aggrb=927KB/s, minb=927KB/s, maxb=927KB/s,
>> mint=110360msec, maxt=110360msec
>>
>> Disk stats (read/write):
>>    dm-1: ios=240/369, merge=0/0, ticks=742/40, in_queue=782, util=0.38%,
>> aggrios=240/379, aggrmerge=0/19, aggrticks=742/41, aggrin_queue=783,
>> aggrutil=0.39%
>>  sda: ios=240/379, merge=0/19, ticks=742/41, in_queue=783, util=0.39%
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>>
>>
>> Confirmed speed (at least for krbd) using dd:
>> [root@rcprsdc1r72-01-ac rafaell]# dd if=/mnt/ssd/random100g
>> of=/mnt/rbd/dd_io_test bs=4k count=10000 oflag=direct
>> 10000+0 records in
>> 10000+0 records out
>> 40960000 bytes (41 MB) copied, 64.9799 s, 630 kB/s
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>>
>> Back to FIO: it's worse for 1M block size (librbd gets about 100% better perf).
>> 1M librbd:
>> Run status group 0 (all jobs):
>>  WRITE: io=1024.0MB, aggrb=112641KB/s, minb=112641KB/s,
>> maxb=112641KB/s, mint=9309msec, maxt=9309msec
>>
>> 1M krbd:
>> Run status group 0 (all jobs):
>>  WRITE: io=1024.0MB, aggrb=49939KB/s, minb=49939KB/s, maxb=49939KB/s,
>> mint=20997msec, maxt=20997msec
>>
>> Raf
>>
>> On 11 September 2015 at 14:33, Somnath Roy 
>> <[email protected]<mailto:[email protected]>>
>> wrote:
>> Only changing the client-side ceph.conf and rerunning the tests is sufficient.
>>
>> Thanks & Regards
>> Somnath
>>
>> From: Rafael Lopez 
>> [mailto:[email protected]<mailto:[email protected]>]
>> Sent: Thursday, September 10, 2015 8:58 PM
>> To: Somnath Roy
>> Cc: [email protected]<mailto:[email protected]>
>> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
>>
>> Thanks for the quick reply Somnath, will give this a try.
>>
>> In order to set the rbd cache settings, is it a matter of updating the ceph.conf
>> file on the client only prior to running the test, or do I need to inject args
>> into all the OSDs?
>>
>> Raf
>>
>>
>> On 11 September 2015 at 13:39, Somnath Roy 
>> <[email protected]<mailto:[email protected]>>
>> wrote:
>> It may be due to the rbd cache effect.
>> Try the following:
>>
>> Run your test with direct=1 in both cases and rbd_cache = false (disable all
>> other rbd cache options as well). This should give you results similar to krbd.
>>
>> In the direct=1 case, we saw ~10-20% degradation when we set rbd_cache = true.
>> But in the direct=0 case, it could be more, as you are seeing.
>>
>> I think there is a performance delta (or a need to tune it properly) if you
>> want to use rbd cache.
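>>
>> If you do want to keep rbd cache on, the knobs to experiment with are the ones
>> already shown in your config dump (rbd cache size / max dirty / target dirty /
>> max dirty age). A purely illustrative example, not a recommendation:
>>
>> [client]
>> rbd cache = true
>> rbd cache size = 67108864          # e.g. 64MB instead of the 32MB default
>> rbd cache max dirty = 50331648     # example value only
>> rbd cache target dirty = 33554432  # example value only
>> rbd cache max dirty age = 5        # example value only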
>>
>> Thanks & Regards
>> Somnath
>>
>>
>>
>> From: ceph-users 
>> [mailto:[email protected]<mailto:[email protected]>]
>>  On Behalf Of
>> Rafael Lopez
>> Sent: Thursday, September 10, 2015 8:24 PM
>> To: [email protected]<mailto:[email protected]>
>> Subject: [ceph-users] bad perf for librbd vs krbd using FIO
>>
>> Hi all,
>>
>> I am seeing a big discrepancy between librbd and kRBD/ext4 performance
>> using FIO with a single RBD image. The RBD images come from the same RBD
>> pool, with the same size and settings for both. The librbd results are quite bad
>> by comparison, and in addition, if I scale up the kRBD FIO job with more
>> jobs/threads it increases to 3-4x the results below, but librbd doesn't seem to
>> scale much at all. I figured it should be close to the kRBD result for a
>> single job/thread before parallelism comes into play, though. RBD cache
>> settings are all default.
>>
>> I can see some obvious differences in the FIO output, but not being well versed
>> in FIO I'm not sure what to make of it or where to start diagnosing the
>> discrepancy. I've hunted around but haven't found anything useful; any
>> suggestions/insights would be appreciated.
>>
>> RBD cache settings:
>> [root@rcmktdc1r72-09-ac rafaell]# ceph --admin-daemon
>> /var/run/ceph/ceph-osd.659.asok config show | grep rbd_cache
>>    "rbd_cache": "true",
>>    "rbd_cache_writethrough_until_flush": "true",
>>    "rbd_cache_size": "33554432",
>>    "rbd_cache_max_dirty": "25165824",
>>    "rbd_cache_target_dirty": "16777216",
>>    "rbd_cache_max_dirty_age": "1",
>>    "rbd_cache_max_dirty_object": "0",
>>    "rbd_cache_block_writes_upfront": "false",
>> [root@rcmktdc1r72-09-ac rafaell]#
>>
>> This is the FIO job file for the kRBD job:
>>
>> [root@rcprsdc1r72-01-ac rafaell]# cat ext4_test
>> ; -- start job file --
>> [global]
>> rw=rw
>> size=100g
>> filename=/mnt/rbd/fio_test_file_ext4
>> rwmixread=0
>> rwmixwrite=100
>> percentage_random=0
>> bs=1024k
>> direct=0
>> iodepth=16
>> thread=1
>> numjobs=1
>> [job1]
>> ; -- end job file --
>>
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>> This is the FIO job file for the librbd job:
>>
>> [root@rcprsdc1r72-01-ac rafaell]# cat fio_rbd_test
>> ; -- start job file --
>> [global]
>> rw=rw
>> size=100g
>> rwmixread=0
>> rwmixwrite=100
>> percentage_random=0
>> bs=1024k
>> direct=0
>> iodepth=16
>> thread=1
>> numjobs=1
>> ioengine=rbd
>> rbdname=nas1-rds-stg31
>> pool=rbd
>> [job1]
>> ; -- end job file --
>>
>>
>> Here are the results:
>>
>> [root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
>> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=16
>> fio-2.2.8
>> Starting 1 thread
>> job1: Laying out IO file(s) (1 file(s) / 102400MB)
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/321.7MB/0KB /s] [0/321/0 iops] [eta
>> 00m:00s]
>> job1: (groupid=0, jobs=1): err= 0: pid=37981: Fri Sep 11 12:33:13 2015
>>  write: io=102400MB, bw=399741KB/s, iops=390, runt=262314msec
>>    clat (usec): min=411, max=574082, avg=2492.91, stdev=7316.96
>>     lat (usec): min=418, max=574113, avg=2520.12, stdev=7318.53
>>    clat percentiles (usec):
>>     |  1.00th=[  446],  5.00th=[  458], 10.00th=[  474], 20.00th=[  510],
>>     | 30.00th=[ 1064], 40.00th=[ 1096], 50.00th=[ 1160], 60.00th=[ 1320],
>>     | 70.00th=[ 1592], 80.00th=[ 2448], 90.00th=[ 7712], 95.00th=[ 7904],
>>     | 99.00th=[11072], 99.50th=[11712], 99.90th=[13120], 99.95th=[73216],
>>     | 99.99th=[464896]
>>    bw (KB  /s): min=  264, max=2156544, per=100.00%, avg=412986.27,
>> stdev=375092.66
>>    lat (usec) : 500=18.68%, 750=7.43%, 1000=2.11%
>>    lat (msec) : 2=48.89%, 4=4.35%, 10=16.79%, 20=1.67%, 50=0.03%
>>    lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
>>  cpu          : usr=1.24%, sys=45.38%, ctx=19298, majf=0, minf=974
>>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>     latency   : target=0, window=0, percentile=100.00%, depth=16
>>
>> Run status group 0 (all jobs):
>>  WRITE: io=102400MB, aggrb=399740KB/s, minb=399740KB/s,
>> maxb=399740KB/s, mint=262314msec, maxt=262314msec
>>
>> Disk stats (read/write):
>>  rbd0: ios=0/150890, merge=0/49, ticks=0/36117700, in_queue=36145277,
>> util=96.97%
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>> [root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
>> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=16
>> fio-2.2.8
>> Starting 1 thread
>> rbd engine: RBD version: 0.1.9
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/65405KB/0KB /s] [0/63/0 iops] [eta
>> 00m:00s]
>> job1: (groupid=0, jobs=1): err= 0: pid=43960: Fri Sep 11 12:54:25 2015
>>  write: io=102400MB, bw=121882KB/s, iops=119, runt=860318msec
>>    slat (usec): min=355, max=7300, avg=908.97, stdev=361.02
>>    clat (msec): min=11, max=1468, avg=129.59, stdev=130.68
>>     lat (msec): min=12, max=1468, avg=130.50, stdev=130.69
>>    clat percentiles (msec):
>>     |  1.00th=[   21],  5.00th=[   26], 10.00th=[   29], 20.00th=[   34],
>>     | 30.00th=[   37], 40.00th=[   40], 50.00th=[   44], 60.00th=[   63],
>>     | 70.00th=[  233], 80.00th=[  241], 90.00th=[  269], 95.00th=[  367],
>>     | 99.00th=[  553], 99.50th=[  652], 99.90th=[  832], 99.95th=[  848],
>>     | 99.99th=[ 1369]
>>    bw (KB  /s): min=20363, max=248543, per=100.00%, avg=124381.19,
>> stdev=42313.29
>>    lat (msec) : 20=0.95%, 50=55.27%, 100=5.55%, 250=24.83%, 500=12.28%
>>    lat (msec) : 750=0.89%, 1000=0.21%, 2000=0.01%
>>  cpu          : usr=9.58%, sys=1.15%, ctx=23883, majf=0, minf=2751023
>>  IO depths    : 1=1.2%, 2=3.0%, 4=9.7%, 8=68.3%, 16=17.8%, 32=0.0%, >=64=0.0%
>>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     complete  : 0=0.0%, 4=92.5%, 8=4.3%, 16=3.2%, 32=0.0%, 64=0.0%, >=64=0.0%
>>     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>     latency   : target=0, window=0, percentile=100.00%, depth=16
>>
>> Run status group 0 (all jobs):
>>  WRITE: io=102400MB, aggrb=121882KB/s, minb=121882KB/s,
>> maxb=121882KB/s, mint=860318msec, maxt=860318msec
>>
>> Disk stats (read/write):
>>    dm-1: ios=0/2072, merge=0/0, ticks=0/233, in_queue=233, util=0.01%,
>> aggrios=1/2249, aggrmerge=7/559, aggrticks=9/254, aggrin_queue=261,
>> aggrutil=0.01%
>>  sda: ios=1/2249, merge=7/559, ticks=9/254, in_queue=261, util=0.01%
>> [root@rcprsdc1r72-01-ac rafaell]#
>>
>> Cheers,
>> Raf
>>
>>
>> --
>> Rafael Lopez
>> Data Storage Administrator
>> Servers & Storage (eSolutions)
>>
>>
>>
>>
>>
>>
>> --
>> Rafael Lopez
>> Data Storage Administrator
>> Servers & Storage (eSolutions)
>> +61 3 990 59118
>>
>>
>>
>>
>>
>> --
>> Rafael Lopez
>> Data Storage Administrator
>> Servers & Storage (eSolutions)
>> +61 3 990 59118
>
>
>
>
>
>



_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
