That’s probably because the krbd version you are using doesn’t have the 
TCP_NODELAY patch. We have submitted it (and you can build it from the latest 
krbd source), but I am not sure when it will make it into the Linux mainline kernel.
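
A quick way to check which krbd you are on is the client's kernel version, since 
the rbd module ships with the kernel:

# standard commands; output varies by distro/kernel
uname -r
modinfo rbd | head -3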

Thanks & Regards
Somnath

From: Rafael Lopez [mailto:[email protected]]
Sent: Thursday, September 10, 2015 10:12 PM
To: Somnath Roy
Cc: [email protected]
Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO

OK, I ran the two tests again with direct=1, a smaller block size (4k), and a 
smaller total io (100m), and disabled the cache on the client side by adding 
this to ceph.conf:

[client]
rbd cache = false
rbd cache max dirty = 0
rbd cache size = 0
rbd cache target dirty = 0
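
As a sanity check that librbd actually picked these up (the config show against 
an OSD admin socket further down only reflects that OSD's own view), a client 
admin socket can be enabled; a minimal sketch, assuming the asok directory is 
writable by the fio process:

[client]
admin socket = /var/run/ceph/$cluster-$name.$pid.asok

# then, while the librbd fio job is running (asok file name will vary):
ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.asok config show | grep rbd_cache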


The results seem to have swapped around: now the librbd job is running ~50% 
faster than the krbd job!

####### krbd job:

[root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=16
fio-2.2.8
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/571KB/0KB /s] [0/142/0 iops] [eta 
00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=29095: Fri Sep 11 14:48:21 2015
  write: io=102400KB, bw=647137B/s, iops=157, runt=162033msec
    clat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
     lat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
    clat percentiles (usec):
     |  1.00th=[ 2896],  5.00th=[ 4320], 10.00th=[ 4768], 20.00th=[ 5536],
     | 30.00th=[ 5920], 40.00th=[ 6176], 50.00th=[ 6432], 60.00th=[ 6624],
     | 70.00th=[ 6816], 80.00th=[ 7136], 90.00th=[ 7584], 95.00th=[ 7968],
     | 99.00th=[ 9024], 99.50th=[ 9664], 99.90th=[15808], 99.95th=[17536],
     | 99.99th=[19328]
    bw (KB  /s): min=  506, max= 1171, per=100.00%, avg=632.22, stdev=104.77
    lat (msec) : 4=2.88%, 10=96.69%, 20=0.43%, 50=0.01%
  cpu          : usr=0.17%, sys=0.71%, ctx=25634, majf=0, minf=35
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400KB, aggrb=631KB/s, minb=631KB/s, maxb=631KB/s, 
mint=162033msec, maxt=162033msec

Disk stats (read/write):
  rbd0: ios=0/25638, merge=0/32, ticks=0/160765, in_queue=160745, util=99.11%
[root@rcprsdc1r72-01-ac rafaell]#

####### librbd job:

[root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=16
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/703KB/0KB /s] [0/175/0 iops] [eta 
00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=30568: Fri Sep 11 14:50:24 2015
  write: io=102400KB, bw=950141B/s, iops=231, runt=110360msec
    slat (usec): min=70, max=992, avg=115.05, stdev=30.07
    clat (msec): min=13, max=117, avg=67.91, stdev=24.93
     lat (msec): min=13, max=117, avg=68.03, stdev=24.93
    clat percentiles (msec):
     |  1.00th=[   19],  5.00th=[   26], 10.00th=[   38], 20.00th=[   40],
     | 30.00th=[   46], 40.00th=[   62], 50.00th=[   77], 60.00th=[   85],
     | 70.00th=[   88], 80.00th=[   91], 90.00th=[   95], 95.00th=[   99],
     | 99.00th=[  105], 99.50th=[  110], 99.90th=[  116], 99.95th=[  117],
     | 99.99th=[  118]
    bw (KB  /s): min=  565, max= 3174, per=100.00%, avg=935.74, stdev=407.67
    lat (msec) : 20=2.41%, 50=29.85%, 100=64.46%, 250=3.29%
  cpu          : usr=2.43%, sys=0.29%, ctx=7847, majf=0, minf=2750
  IO depths    : 1=6.2%, 2=12.5%, 4=25.0%, 8=50.0%, 16=6.2%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=94.1%, 8=0.0%, 16=5.9%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400KB, aggrb=927KB/s, minb=927KB/s, maxb=927KB/s, 
mint=110360msec, maxt=110360msec

Disk stats (read/write):
    dm-1: ios=240/369, merge=0/0, ticks=742/40, in_queue=782, util=0.38%, 
aggrios=240/379, aggrmerge=0/19, aggrticks=742/41, aggrin_queue=783, 
aggrutil=0.39%
  sda: ios=240/379, merge=0/19, ticks=742/41, in_queue=783, util=0.39%
[root@rcprsdc1r72-01-ac rafaell]#



Confirmed speed (at least for krbd) using dd:
[root@rcprsdc1r72-01-ac rafaell]# dd if=/mnt/ssd/random100g 
of=/mnt/rbd/dd_io_test bs=4k count=10000 oflag=direct
10000+0 records in
10000+0 records out
40960000 bytes (41 MB) copied, 64.9799 s, 630 kB/s
[root@rcprsdc1r72-01-ac rafaell]#
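
dd can't exercise the librbd path directly, but as a rough cross-check on that 
side, rbd bench-write drives the image through librbd and bypasses the kernel 
client entirely; flag names as per the rbd manpage of this vintage, so adjust 
for your version:

rbd bench-write nas1-rds-stg31 --io-size 4096 --io-threads 1 --io-total 100M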


Back to FIO: the discrepancy is even bigger at a 1M block size (librbd now gets roughly double the krbd throughput).
1M librbd:
Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=112641KB/s, minb=112641KB/s, maxb=112641KB/s, 
mint=9309msec, maxt=9309msec

1M krbd:
Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=49939KB/s, minb=49939KB/s, maxb=49939KB/s, 
mint=20997msec, maxt=20997msec

Raf

On 11 September 2015 at 14:33, Somnath Roy <[email protected]> wrote:
Only changing the client-side ceph.conf and rerunning the tests is sufficient; the rbd cache is implemented in librbd on the client, so there is nothing to inject into the OSDs.

Thanks & Regards
Somnath

From: Rafael Lopez [mailto:[email protected]]
Sent: Thursday, September 10, 2015 8:58 PM
To: Somnath Roy
Cc: [email protected]<mailto:[email protected]>
Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO

Thanks for the quick reply Somnath, will give this a try.

To change the rbd cache settings, is it just a matter of updating the ceph.conf 
file on the client before running the test, or do I need to inject args into 
all the OSDs?

Raf


On 11 September 2015 at 13:39, Somnath Roy <[email protected]> wrote:
It may be due to the rbd cache effect. Try the following:

Run your test with direct=1 in both cases and rbd_cache = false (disable all 
the other rbd cache options as well). This should give you results similar to 
krbd.

With direct=1 we saw a ~10-20% degradation when rbd_cache = true; with 
direct=0 the difference can be bigger, as you are seeing.

I think there is a performance delta (or a need for proper tuning) if you want 
to use the rbd cache.

Thanks & Regards
Somnath



From: ceph-users [mailto:[email protected]] On Behalf Of Rafael Lopez
Sent: Thursday, September 10, 2015 8:24 PM
To: [email protected]<mailto:[email protected]>
Subject: [ceph-users] bad perf for librbd vs krbd using FIO

Hi all,

I am seeing a big discrepancy between librbd and kRBD/ext4 performance using 
FIO with a single RBD image. The RBD images come from the same RBD pool and 
have the same size and settings in both cases. The librbd results are quite bad 
by comparison, and on top of that, if I scale up the kRBD FIO job with more 
jobs/threads it improves to 3-4x the results below, while librbd doesn't seem 
to scale much at all. I figured it should at least be close to the kRBD result 
for a single job/thread, before parallelism comes into play. RBD cache settings 
are all default.

I can see some obvious differences in the FIO output, but not being well versed 
with FIO I'm not sure what to make of them or where to start diagnosing the 
discrepancy. I've hunted around but haven't found anything useful; any 
suggestions/insights would be appreciated.

RBD cache settings:
[root@rcmktdc1r72-09-ac rafaell]# ceph --admin-daemon 
/var/run/ceph/ceph-osd.659.asok config show | grep rbd_cache
    "rbd_cache": "true",
    "rbd_cache_writethrough_until_flush": "true",
    "rbd_cache_size": "33554432",
    "rbd_cache_max_dirty": "25165824",
    "rbd_cache_target_dirty": "16777216",
    "rbd_cache_max_dirty_age": "1",
    "rbd_cache_max_dirty_object": "0",
    "rbd_cache_block_writes_upfront": "false",
[root@rcmktdc1r72-09-ac rafaell]#

This is the FIO job file for the kRBD job:

[root@rcprsdc1r72-01-ac rafaell]# cat ext4_test
; -- start job file --
[global]
rw=rw
size=100g
filename=/mnt/rbd/fio_test_file_ext4
rwmixread=0
rwmixwrite=100
percentage_random=0
bs=1024k
direct=0
iodepth=16
thread=1
numjobs=1
[job1]
; -- end job file --

[root@rcprsdc1r72-01-ac rafaell]#
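
One caveat with this job file: with no ioengine line, fio falls back to the 
synchronous engine, for which iodepth=16 has no effect (note the "IO depths : 
1=100.0%" lines in the krbd results), whereas the rbd engine is asynchronous 
and really queues up to 16 I/Os. A sketch of how the krbd job could drive a 
comparable queue depth, assuming libaio is installed on the client:

[global]
; async engine, so iodepth=16 is actually honored
ioengine=libaio
; buffered libaio on Linux can block at submit, so keep O_DIRECT
direct=1
iodepth=16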

This is the FIO job file for the librbd job:

[root@rcprsdc1r72-01-ac rafaell]# cat fio_rbd_test
; -- start job file --
[global]
rw=rw
size=100g
rwmixread=0
rwmixwrite=100
percentage_random=0
bs=1024k
direct=0
iodepth=16
thread=1
numjobs=1
ioengine=rbd
rbdname=nas1-rds-stg31
pool=rbd
[job1]
; -- end job file --


Here are the results:

[root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=16
fio-2.2.8
Starting 1 thread
job1: Laying out IO file(s) (1 file(s) / 102400MB)
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/321.7MB/0KB /s] [0/321/0 iops] [eta 
00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=37981: Fri Sep 11 12:33:13 2015
  write: io=102400MB, bw=399741KB/s, iops=390, runt=262314msec
    clat (usec): min=411, max=574082, avg=2492.91, stdev=7316.96
     lat (usec): min=418, max=574113, avg=2520.12, stdev=7318.53
    clat percentiles (usec):
     |  1.00th=[  446],  5.00th=[  458], 10.00th=[  474], 20.00th=[  510],
     | 30.00th=[ 1064], 40.00th=[ 1096], 50.00th=[ 1160], 60.00th=[ 1320],
     | 70.00th=[ 1592], 80.00th=[ 2448], 90.00th=[ 7712], 95.00th=[ 7904],
     | 99.00th=[11072], 99.50th=[11712], 99.90th=[13120], 99.95th=[73216],
     | 99.99th=[464896]
    bw (KB  /s): min=  264, max=2156544, per=100.00%, avg=412986.27, 
stdev=375092.66
    lat (usec) : 500=18.68%, 750=7.43%, 1000=2.11%
    lat (msec) : 2=48.89%, 4=4.35%, 10=16.79%, 20=1.67%, 50=0.03%
    lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
  cpu          : usr=1.24%, sys=45.38%, ctx=19298, majf=0, minf=974
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400MB, aggrb=399740KB/s, minb=399740KB/s, maxb=399740KB/s, 
mint=262314msec, maxt=262314msec

Disk stats (read/write):
  rbd0: ios=0/150890, merge=0/49, ticks=0/36117700, in_queue=36145277, 
util=96.97%
[root@rcprsdc1r72-01-ac rafaell]#

[root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=16
fio-2.2.8
Starting 1 thread
rbd engine: RBD version: 0.1.9
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/65405KB/0KB /s] [0/63/0 iops] [eta 
00m:00s]
job1: (groupid=0, jobs=1): err= 0: pid=43960: Fri Sep 11 12:54:25 2015
  write: io=102400MB, bw=121882KB/s, iops=119, runt=860318msec
    slat (usec): min=355, max=7300, avg=908.97, stdev=361.02
    clat (msec): min=11, max=1468, avg=129.59, stdev=130.68
     lat (msec): min=12, max=1468, avg=130.50, stdev=130.69
    clat percentiles (msec):
     |  1.00th=[   21],  5.00th=[   26], 10.00th=[   29], 20.00th=[   34],
     | 30.00th=[   37], 40.00th=[   40], 50.00th=[   44], 60.00th=[   63],
     | 70.00th=[  233], 80.00th=[  241], 90.00th=[  269], 95.00th=[  367],
     | 99.00th=[  553], 99.50th=[  652], 99.90th=[  832], 99.95th=[  848],
     | 99.99th=[ 1369]
    bw (KB  /s): min=20363, max=248543, per=100.00%, avg=124381.19, 
stdev=42313.29
    lat (msec) : 20=0.95%, 50=55.27%, 100=5.55%, 250=24.83%, 500=12.28%
    lat (msec) : 750=0.89%, 1000=0.21%, 2000=0.01%
  cpu          : usr=9.58%, sys=1.15%, ctx=23883, majf=0, minf=2751023
  IO depths    : 1=1.2%, 2=3.0%, 4=9.7%, 8=68.3%, 16=17.8%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=92.5%, 8=4.3%, 16=3.2%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: io=102400MB, aggrb=121882KB/s, minb=121882KB/s, maxb=121882KB/s, 
mint=860318msec, maxt=860318msec

Disk stats (read/write):
    dm-1: ios=0/2072, merge=0/0, ticks=0/233, in_queue=233, util=0.01%, 
aggrios=1/2249, aggrmerge=7/559, aggrticks=9/254, aggrin_queue=261, 
aggrutil=0.01%
  sda: ios=1/2249, merge=7/559, ticks=9/254, in_queue=261, util=0.01%
[root@rcprsdc1r72-01-ac rafaell]#

Cheers,
Raf


--
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)




--
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118



--
Rafael Lopez
Data Storage Administrator
Servers & Storage (eSolutions)
+61 3 990 59118