Is there a thread on the mailing list (or LKML?) with some background about
tcp_low_latency and TCP_NODELAY?
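
For anyone else following along, my rough understanding (happy to be corrected) is that TCP_NODELAY is a per-socket option that an application sets to disable Nagle's algorithm, whereas tcp_low_latency is a system-wide sysctl. Here is a minimal Python sketch of the per-socket side only. It is an illustration, not taken from the krbd or librbd code, and the helper names are made up:

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value disables Nagle, so small writes are sent right away
    # instead of being held back while earlier data is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Return True if TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = make_nodelay_socket()
    print("TCP_NODELAY set:", nodelay_enabled(s))
    s.close()
```

The socket is never connected here; the point is only that the option is chosen by whoever opens the socket, which is why a kernel client would need a patch to set it.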

Bill

On Fri, Sep 11, 2015 at 2:30 AM, Jan Schermer <[email protected]> wrote:

> Can you try
>
> echo 1 > /proc/sys/net/ipv4/tcp_low_latency
>
> And see if it improves things? I remember there being an option to disable
> Nagle completely, but apparently it's gone.
>
> Jan
>
> > On 11 Sep 2015, at 10:43, Nick Fisk <[email protected]> wrote:
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:[email protected]] On Behalf Of Somnath Roy
> >> Sent: 11 September 2015 06:23
> >> To: Rafael Lopez <[email protected]>
> >> Cc: [email protected]
> >> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
> >>
> >> That’s probably because the krbd version you are using doesn’t have the
> >> TCP_NODELAY patch. We have submitted it (and you can build it from the latest
> >> rbd source), but I am not sure when it will be in Linux mainline.
> >
> > From memory it landed in 3.19, but there are also several issues with
> > max IO size, max nr_requests and readahead. For testing, I would suggest
> > trying one of these:
> >
> > http://gitbuilder.ceph.com/kernel-deb-precise-x86_64-basic/ref/ra-bring-back/
> >
> >
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> From: Rafael Lopez [mailto:[email protected]]
> >> Sent: Thursday, September 10, 2015 10:12 PM
> >> To: Somnath Roy
> >> Cc: [email protected]
> >> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
> >>
> >> Ok I ran the two tests again with direct=1, a smaller block size (4k) and a
> >> smaller total io (100m), and disabled the cache on the client side in ceph.conf
> >> by adding:
> >>
> >> [client]
> >> rbd cache = false
> >> rbd cache max dirty = 0
> >> rbd cache size = 0
> >> rbd cache target dirty = 0
> >>
> >>
> >> The results seem to have swapped around: the librbd job is now running
> >> ~50% faster than the krbd job!
> >>
> >> ####### krbd job:
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
> >> job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=16
> >> fio-2.2.8
> >> Starting 1 process
> >> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/571KB/0KB /s] [0/142/0 iops] [eta 00m:00s]
> >> job1: (groupid=0, jobs=1): err= 0: pid=29095: Fri Sep 11 14:48:21 2015
> >>  write: io=102400KB, bw=647137B/s, iops=157, runt=162033msec
> >>    clat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
> >>     lat (msec): min=2, max=25, avg= 6.32, stdev= 1.21
> >>    clat percentiles (usec):
> >>     |  1.00th=[ 2896],  5.00th=[ 4320], 10.00th=[ 4768], 20.00th=[ 5536],
> >>     | 30.00th=[ 5920], 40.00th=[ 6176], 50.00th=[ 6432], 60.00th=[ 6624],
> >>     | 70.00th=[ 6816], 80.00th=[ 7136], 90.00th=[ 7584], 95.00th=[ 7968],
> >>     | 99.00th=[ 9024], 99.50th=[ 9664], 99.90th=[15808], 99.95th=[17536],
> >>     | 99.99th=[19328]
> >>    bw (KB  /s): min=  506, max= 1171, per=100.00%, avg=632.22, stdev=104.77
> >>    lat (msec) : 4=2.88%, 10=96.69%, 20=0.43%, 50=0.01%
> >>  cpu          : usr=0.17%, sys=0.71%, ctx=25634, majf=0, minf=35
> >>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> >>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>     latency   : target=0, window=0, percentile=100.00%, depth=16
> >>
> >> Run status group 0 (all jobs):
> >>  WRITE: io=102400KB, aggrb=631KB/s, minb=631KB/s, maxb=631KB/s, mint=162033msec, maxt=162033msec
> >>
> >> Disk stats (read/write):
> >>  rbd0: ios=0/25638, merge=0/32, ticks=0/160765, in_queue=160745, util=99.11%
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >> ###### librbd job:
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
> >> job1: (g=0): rw=rw, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=16
> >> fio-2.2.8
> >> Starting 1 process
> >> rbd engine: RBD version: 0.1.9
> >> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/703KB/0KB /s] [0/175/0 iops] [eta 00m:00s]
> >> job1: (groupid=0, jobs=1): err= 0: pid=30568: Fri Sep 11 14:50:24 2015
> >>  write: io=102400KB, bw=950141B/s, iops=231, runt=110360msec
> >>    slat (usec): min=70, max=992, avg=115.05, stdev=30.07
> >>    clat (msec): min=13, max=117, avg=67.91, stdev=24.93
> >>     lat (msec): min=13, max=117, avg=68.03, stdev=24.93
> >>    clat percentiles (msec):
> >>     |  1.00th=[   19],  5.00th=[   26], 10.00th=[   38], 20.00th=[   40],
> >>     | 30.00th=[   46], 40.00th=[   62], 50.00th=[   77], 60.00th=[   85],
> >>     | 70.00th=[   88], 80.00th=[   91], 90.00th=[   95], 95.00th=[   99],
> >>     | 99.00th=[  105], 99.50th=[  110], 99.90th=[  116], 99.95th=[  117],
> >>     | 99.99th=[  118]
> >>    bw (KB  /s): min=  565, max= 3174, per=100.00%, avg=935.74, stdev=407.67
> >>    lat (msec) : 20=2.41%, 50=29.85%, 100=64.46%, 250=3.29%
> >>  cpu          : usr=2.43%, sys=0.29%, ctx=7847, majf=0, minf=2750
> >>  IO depths    : 1=6.2%, 2=12.5%, 4=25.0%, 8=50.0%, 16=6.2%, 32=0.0%, >=64=0.0%
> >>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     complete  : 0=0.0%, 4=94.1%, 8=0.0%, 16=5.9%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     issued    : total=r=0/w=25600/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>     latency   : target=0, window=0, percentile=100.00%, depth=16
> >>
> >> Run status group 0 (all jobs):
> >>  WRITE: io=102400KB, aggrb=927KB/s, minb=927KB/s, maxb=927KB/s, mint=110360msec, maxt=110360msec
> >>
> >> Disk stats (read/write):
> >>    dm-1: ios=240/369, merge=0/0, ticks=742/40, in_queue=782, util=0.38%, aggrios=240/379, aggrmerge=0/19, aggrticks=742/41, aggrin_queue=783, aggrutil=0.39%
> >>  sda: ios=240/379, merge=0/19, ticks=742/41, in_queue=783, util=0.39%
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >>
> >>
> >> Confirmed speed (at least for krbd) using dd:
> >> [root@rcprsdc1r72-01-ac rafaell]# dd if=/mnt/ssd/random100g of=/mnt/rbd/dd_io_test bs=4k count=10000 oflag=direct
> >> 10000+0 records in
> >> 10000+0 records out
> >> 40960000 bytes (41 MB) copied, 64.9799 s, 630 kB/s
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >>
> >> Back to FIO, the gap is wider at 1M block size (librbd gets more than twice
> >> the throughput of krbd).
> >> 1M librbd:
> >> Run status group 0 (all jobs):
> >>  WRITE: io=1024.0MB, aggrb=112641KB/s, minb=112641KB/s, maxb=112641KB/s, mint=9309msec, maxt=9309msec
> >>
> >> 1M krbd:
> >> Run status group 0 (all jobs):
> >>  WRITE: io=1024.0MB, aggrb=49939KB/s, minb=49939KB/s, maxb=49939KB/s, mint=20997msec, maxt=20997msec
> >>
> >> Raf
> >>
> >> On 11 September 2015 at 14:33, Somnath Roy <[email protected]>
> >> wrote:
> >> Changing only the client-side ceph.conf and rerunning the tests is sufficient.
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >> From: Rafael Lopez [mailto:[email protected]]
> >> Sent: Thursday, September 10, 2015 8:58 PM
> >> To: Somnath Roy
> >> Cc: [email protected]
> >> Subject: Re: [ceph-users] bad perf for librbd vs krbd using FIO
> >>
> >> Thanks for the quick reply Somnath, will give this a try.
> >>
> >> In order to set the rbd cache settings, is it a matter of updating the ceph.conf
> >> file on the client only prior to running the test, or do I need to inject args
> >> into all OSDs?
> >>
> >> Raf
> >>
> >>
> >> On 11 September 2015 at 13:39, Somnath Roy <[email protected]>
> >> wrote:
> >> It may be due to the rbd cache effect.
> >> Try the following:
> >>
> >> Run your test with direct=1 in both cases and rbd_cache = false (disable all
> >> other rbd cache options as well). This should give you results similar to krbd.
> >>
> >> In the direct=1 case, we saw ~10-20% degradation with rbd_cache = true.
> >> In the direct=0 case it could be more, as you are seeing.
> >>
> >> I think there is a delta (or a need to tune properly) if you want to use the
> >> rbd cache.
> >>
> >> Thanks & Regards
> >> Somnath
> >>
> >>
> >>
> >> From: ceph-users [mailto:[email protected]] On Behalf Of Rafael Lopez
> >> Sent: Thursday, September 10, 2015 8:24 PM
> >> To: [email protected]
> >> Subject: [ceph-users] bad perf for librbd vs krbd using FIO
> >>
> >> Hi all,
> >>
> >> I am seeing a big discrepancy between librbd and kRBD/ext4 performance
> >> using FIO with a single RBD image. The RBD images come from the same RBD
> >> pool, with the same size and settings for both. The librbd results are quite
> >> bad by comparison. In addition, if I scale up the kRBD FIO job with more
> >> jobs/threads, throughput increases up to 3-4x the results below, but librbd
> >> doesn't seem to scale much at all. I figured that it should be close to the
> >> kRBD result for a single job/thread before parallelism comes into play, though.
> >> RBD cache settings are all default.
> >>
> >> I can see some obvious differences in the FIO output, but not being well versed
> >> with FIO I'm not sure what to make of them or where to start diagnosing the
> >> discrepancy. I hunted around but haven't found anything useful; any
> >> suggestions/insights would be appreciated.
> >>
> >> RBD cache settings:
> >> [root@rcmktdc1r72-09-ac rafaell]# ceph --admin-daemon /var/run/ceph/ceph-osd.659.asok config show | grep rbd_cache
> >>    "rbd_cache": "true",
> >>    "rbd_cache_writethrough_until_flush": "true",
> >>    "rbd_cache_size": "33554432",
> >>    "rbd_cache_max_dirty": "25165824",
> >>    "rbd_cache_target_dirty": "16777216",
> >>    "rbd_cache_max_dirty_age": "1",
> >>    "rbd_cache_max_dirty_object": "0",
> >>    "rbd_cache_block_writes_upfront": "false",
> >> [root@rcmktdc1r72-09-ac rafaell]#
> >>
> >> This is the FIO job file for the kRBD job:
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# cat ext4_test
> >> ; -- start job file --
> >> [global]
> >> rw=rw
> >> size=100g
> >> filename=/mnt/rbd/fio_test_file_ext4
> >> rwmixread=0
> >> rwmixwrite=100
> >> percentage_random=0
> >> bs=1024k
> >> direct=0
> >> iodepth=16
> >> thread=1
> >> numjobs=1
> >> [job1]
> >> ; -- end job file --
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >> This is the FIO job file for the librbd job:
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# cat fio_rbd_test
> >> ; -- start job file --
> >> [global]
> >> rw=rw
> >> size=100g
> >> rwmixread=0
> >> rwmixwrite=100
> >> percentage_random=0
> >> bs=1024k
> >> direct=0
> >> iodepth=16
> >> thread=1
> >> numjobs=1
> >> ioengine=rbd
> >> rbdname=nas1-rds-stg31
> >> pool=rbd
> >> [job1]
> >> ; -- end job file --
> >>
> >>
> >> Here are the results:
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# fio ext4_test
> >> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=sync, iodepth=16
> >> fio-2.2.8
> >> Starting 1 thread
> >> job1: Laying out IO file(s) (1 file(s) / 102400MB)
> >> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/321.7MB/0KB /s] [0/321/0 iops] [eta 00m:00s]
> >> job1: (groupid=0, jobs=1): err= 0: pid=37981: Fri Sep 11 12:33:13 2015
> >>  write: io=102400MB, bw=399741KB/s, iops=390, runt=262314msec
> >>    clat (usec): min=411, max=574082, avg=2492.91, stdev=7316.96
> >>     lat (usec): min=418, max=574113, avg=2520.12, stdev=7318.53
> >>    clat percentiles (usec):
> >>     |  1.00th=[  446],  5.00th=[  458], 10.00th=[  474], 20.00th=[  510],
> >>     | 30.00th=[ 1064], 40.00th=[ 1096], 50.00th=[ 1160], 60.00th=[ 1320],
> >>     | 70.00th=[ 1592], 80.00th=[ 2448], 90.00th=[ 7712], 95.00th=[ 7904],
> >>     | 99.00th=[11072], 99.50th=[11712], 99.90th=[13120], 99.95th=[73216],
> >>     | 99.99th=[464896]
> >>    bw (KB  /s): min=  264, max=2156544, per=100.00%, avg=412986.27, stdev=375092.66
> >>    lat (usec) : 500=18.68%, 750=7.43%, 1000=2.11%
> >>    lat (msec) : 2=48.89%, 4=4.35%, 10=16.79%, 20=1.67%, 50=0.03%
> >>    lat (msec) : 100=0.03%, 250=0.02%, 500=0.01%, 750=0.01%
> >>  cpu          : usr=1.24%, sys=45.38%, ctx=19298, majf=0, minf=974
> >>  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> >>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>     latency   : target=0, window=0, percentile=100.00%, depth=16
> >>
> >> Run status group 0 (all jobs):
> >>  WRITE: io=102400MB, aggrb=399740KB/s, minb=399740KB/s, maxb=399740KB/s, mint=262314msec, maxt=262314msec
> >>
> >> Disk stats (read/write):
> >>  rbd0: ios=0/150890, merge=0/49, ticks=0/36117700, in_queue=36145277, util=96.97%
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >> [root@rcprsdc1r72-01-ac rafaell]# fio fio_rbd_test
> >> job1: (g=0): rw=rw, bs=1M-1M/1M-1M/1M-1M, ioengine=rbd, iodepth=16
> >> fio-2.2.8
> >> Starting 1 thread
> >> rbd engine: RBD version: 0.1.9
> >> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/65405KB/0KB /s] [0/63/0 iops] [eta 00m:00s]
> >> job1: (groupid=0, jobs=1): err= 0: pid=43960: Fri Sep 11 12:54:25 2015
> >>  write: io=102400MB, bw=121882KB/s, iops=119, runt=860318msec
> >>    slat (usec): min=355, max=7300, avg=908.97, stdev=361.02
> >>    clat (msec): min=11, max=1468, avg=129.59, stdev=130.68
> >>     lat (msec): min=12, max=1468, avg=130.50, stdev=130.69
> >>    clat percentiles (msec):
> >>     |  1.00th=[   21],  5.00th=[   26], 10.00th=[   29], 20.00th=[   34],
> >>     | 30.00th=[   37], 40.00th=[   40], 50.00th=[   44], 60.00th=[   63],
> >>     | 70.00th=[  233], 80.00th=[  241], 90.00th=[  269], 95.00th=[  367],
> >>     | 99.00th=[  553], 99.50th=[  652], 99.90th=[  832], 99.95th=[  848],
> >>     | 99.99th=[ 1369]
> >>    bw (KB  /s): min=20363, max=248543, per=100.00%, avg=124381.19, stdev=42313.29
> >>    lat (msec) : 20=0.95%, 50=55.27%, 100=5.55%, 250=24.83%, 500=12.28%
> >>    lat (msec) : 750=0.89%, 1000=0.21%, 2000=0.01%
> >>  cpu          : usr=9.58%, sys=1.15%, ctx=23883, majf=0, minf=2751023
> >>  IO depths    : 1=1.2%, 2=3.0%, 4=9.7%, 8=68.3%, 16=17.8%, 32=0.0%, >=64=0.0%
> >>     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     complete  : 0=0.0%, 4=92.5%, 8=4.3%, 16=3.2%, 32=0.0%, 64=0.0%, >=64=0.0%
> >>     issued    : total=r=0/w=102400/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> >>     latency   : target=0, window=0, percentile=100.00%, depth=16
> >>
> >> Run status group 0 (all jobs):
> >>  WRITE: io=102400MB, aggrb=121882KB/s, minb=121882KB/s, maxb=121882KB/s, mint=860318msec, maxt=860318msec
> >>
> >> Disk stats (read/write):
> >>    dm-1: ios=0/2072, merge=0/0, ticks=0/233, in_queue=233, util=0.01%, aggrios=1/2249, aggrmerge=7/559, aggrticks=9/254, aggrin_queue=261, aggrutil=0.01%
> >>  sda: ios=1/2249, merge=7/559, ticks=9/254, in_queue=261, util=0.01%
> >> [root@rcprsdc1r72-01-ac rafaell]#
> >>
> >> Cheers,
> >> Raf
> >>
> >>
> >> --
> >> Rafael Lopez
> >> Data Storage Administrator
> >> Servers & Storage (eSolutions)
> >>
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
