Dear moderator, i subscribed to ceph list today, could you please post my
message?

---------- Forwarded message ----------
From: Sergey Kotov <grayc...@gmail.com>
Date: 2018-03-06 10:52 GMT+03:00
Subject: [ceph bad performance], can't find a bottleneck
To: ceph-users@lists.ceph.com
Cc: Житенев Алексей <zhitene...@wipline.ru>, Anna Anikina <anik...@gmail.com
>


Good day.

Can you please help us to find bottleneck in our ceph installations.
We have 3 SSD-only clusters with different HW, but situation is the same -
overall i/o operations between client & ceph lower than 1/6 of summary
performance all ssd.

For example -
One of our cluster has 4-nodes with ssd Toshiba 2Tb Enterprise drives,
installed on Ubuntu server 16.04.
Servers are connected to the 10G switches. Latency between modes is about
0.1ms. Ethernet utilisation is low.

# uname -a
Linux storage01 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC
2017 x86_64 x86_64 x86_64 GNU/Linux

# ceph osd versions
{
    "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
luminous (stable)": 55
}


When we map rbd image direct on the storage nodes via krbd, performance is
not good enough.
We use fio for testing. Even we run randwrite with 4k block size test in
multiple thread mode, our drives don't have utilisation higher then 30% and
lat is ok.

At the same time iostat tool displays 100% utilisation on /dev/rbdX.

Also we can't enable rbd_cache, because of using scst iscsi over rbd mapped
images.

How can we resolve the issue?

Ceph config:

[global]
fsid = beX482fX-6a91-46dX-ad22-21a8a2696abX
mon_initial_members = storage01, storage02, storage03
mon_host = X.Y.Z.1,X.Y.Z.2,X.Y.Z.3
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = X.Y.Z.0/24
filestore_xattr_use_omap = true
osd_pool_default_size = 2
osd_pool_default_min_size = 1
osd_pool_default_pg_num = 1024
osd_journal_size = 10240
osd_mkfs_type = xfs
filestore_op_threads = 16
filestore_wbthrottle_enable = False
throttler_perf_counter = False
osd crush update on start = false

[osd]
osd_scrub_begin_hour = 1
osd_scrub_end_hour = 6
osd_scrub_priority = 1

osd_enable_op_tracker = False
osd_max_backfills = 1
osd heartbeat grace = 20
osd heartbeat interval = 5
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1
osd backfill scan max = 16
osd backfill scan min = 4
osd max scrubs = 1
osd scrub interval randomize ratio = 1.0
osd disk thread ioprio class = idle
osd disk thread ioprio priority = 0
osd scrub chunk max = 1
osd scrub chunk min = 1
osd deep scrub stride = 1048576
osd scrub load threshold = 5.0
osd scrub sleep = 0.1

[client]
rbd_cache = false


Sample fio tests:

root@storage04:~# fio --name iops --rw randread --bs 4k --filename
/dev/rbd2 --numjobs 12 --ioengine=libaio --group_reporting
iops: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 12 processes
^Cbs: 12 (f=12): [r(12)] [1.2% done] [128.4MB/0KB/0KB /s] [32.9K/0/0 iops]
[eta 16m:49s]
fio: terminating on signal 2

iops: (groupid=0, jobs=12): err= 0: pid=29812: Sun Feb 11 23:59:19 2018
  read : io=1367.8MB, bw=126212KB/s, iops=31553, runt= 11097msec
    slat (usec): min=1, max=59700, avg=375.92, stdev=495.19
    clat (usec): min=0, max=377, avg= 1.12, stdev= 3.16
     lat (usec): min=1, max=59702, avg=377.61, stdev=495.32
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    1], 20.00th=[    1],
     | 30.00th=[    1], 40.00th=[    1], 50.00th=[    1], 60.00th=[    1],
     | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    2],
     | 99.00th=[    2], 99.50th=[    2], 99.90th=[   73], 99.95th=[   78],
     | 99.99th=[  115]
    bw (KB  /s): min= 8536, max=11944, per=8.33%, avg=10516.45, stdev=635.32
    lat (usec) : 2=91.74%, 4=7.93%, 10=0.14%, 20=0.09%, 50=0.01%
    lat (usec) : 100=0.07%, 250=0.03%, 500=0.01%
  cpu          : usr=1.32%, sys=3.69%, ctx=329556, majf=0, minf=134
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=350144/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: io=1367.8MB, aggrb=126212KB/s, minb=126212KB/s, maxb=126212KB/s,
mint=11097msec, maxt=11097msec

Disk stats (read/write):
  rbd2: ios=323072/0, merge=0/0, ticks=124268/0, in_queue=124680,
util=99.31%


root@storage04:~# fio --name iops --rw randwrite --bs 4k --filename
/dev/rbd2 --numjobs 12 --ioengine=libaio --group_reporting
iops: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 12 processes
^Cbs: 12 (f=12): [w(12)] [25.0% done] [0KB/713.5MB/0KB /s] [0/183K/0 iops]
[eta 00m:45s]
fio: terminating on signal 2
fio: job 'iops' hasn't exited in 60 seconds, it appears to be stuck. Doing
forceful exit of this job.

iops: (groupid=0, jobs=12): err= 0: pid=30327: Mon Feb 12 00:01:04 2018
  write: io=32020MB, bw=2172.5MB/s, iops=556149, runt= 14739msec
    slat (usec): min=1, max=255011, avg=20.00, stdev=1110.12
    clat (usec): min=0, max=4426, avg= 0.35, stdev= 2.32
     lat (usec): min=1, max=255015, avg=20.43, stdev=1110.18
    clat percentiles (usec):
     |  1.00th=[    0],  5.00th=[    0], 10.00th=[    0], 20.00th=[    0],
     | 30.00th=[    0], 40.00th=[    0], 50.00th=[    0], 60.00th=[    0],
     | 70.00th=[    1], 80.00th=[    1], 90.00th=[    1], 95.00th=[    1],
     | 99.00th=[    1], 99.50th=[    1], 99.90th=[    2], 99.95th=[    3],
     | 99.99th=[   81]
    bw (KB  /s): min=16895, max=641688, per=8.68%, avg=193110.20,
stdev=150217.09
    lat (usec) : 2=99.88%, 4=0.08%, 10=0.01%, 20=0.01%, 50=0.01%
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 10=0.01%
  cpu          : usr=5.51%, sys=23.27%, ctx=8723, majf=0, minf=3528
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=8197085/d=0, short=r=0/w=0/d=0,
drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=32020MB, aggrb=2172.5MB/s, minb=2172.5MB/s, maxb=2172.5MB/s,
mint=14739msec, maxt=14739msec

Disk stats (read/write):
  rbd2: ios=0/30044, merge=0/330950, ticks=0/1601972, in_queue=1603328,
util=87.57%


root@storage04:~# fio --name iops --rw randwrite --bs 4k --filename
/dev/rbd2 --numjobs 12 --ioengine=libaio --group_reporting --direct=1
iops: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1
...
fio-2.2.10
Starting 12 processes
^Cbs: 12 (f=12): [w(12)] [0.5% done] [0KB/14304KB/0KB /s] [0/3576/0 iops]
[eta 03h:42m:23s]]
fio: terminating on signal 2

iops: (groupid=0, jobs=12): err= 0: pid=31552: Mon Feb 12 00:02:39 2018
  write: io=617132KB, bw=9554.8KB/s, iops=2388, runt= 64589msec
    slat (usec): min=3, max=463, avg=11.56, stdev=14.17
    clat (usec): min=930, max=999957, avg=5004.58, stdev=30299.43
     lat (usec): min=938, max=999975, avg=5016.50, stdev=30299.58
    clat percentiles (usec):
     |  1.00th=[ 1192],  5.00th=[ 1304], 10.00th=[ 1400], 20.00th=[ 1512],
     | 30.00th=[ 1608], 40.00th=[ 1720], 50.00th=[ 1832], 60.00th=[ 1960],
     | 70.00th=[ 2160], 80.00th=[ 2480], 90.00th=[ 4256], 95.00th=[12096],
     | 99.00th=[51968], 99.50th=[81408], 99.90th=[733184], 99.95th=[864256],
     | 99.99th=[946176]
    bw (KB  /s): min=    4, max= 2251, per=9.30%, avg=888.88, stdev=453.38
    lat (usec) : 1000=0.02%
    lat (msec) : 2=62.38%, 4=26.86%, 10=5.16%, 20=1.82%, 50=2.66%
    lat (msec) : 100=0.75%, 250=0.22%, 500=0.03%, 750=0.01%, 1000=0.09%
  cpu          : usr=0.24%, sys=0.54%, ctx=154351, majf=0, minf=149
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=154283/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=617132KB, aggrb=9554KB/s, minb=9554KB/s, maxb=9554KB/s,
mint=64589msec, maxt=64589msec

Disk stats (read/write):
  rbd2: ios=0/154277, merge=0/0, ticks=0/767860, in_queue=768740,
util=99.94%


root@storage04:~# fio --name iops --rw randwrite --bs 4k --filename
/dev/rbd2 --numjobs 12 --iodepth=32 --ioengine=libaio --group_reporting
--direct=1 --runtime=100
iops: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
...
fio-2.2.10
Starting 12 processes
Jobs: 12 (f=12): [w(12)] [100.0% done] [0KB/49557KB/0KB /s] [0/12.4K/0
iops] [eta 00m:00s]
iops: (groupid=0, jobs=12): err= 0: pid=32632: Mon Feb 12 00:04:51 2018
  write: io=3093.6MB, bw=31675KB/s, iops=7918, runt=100009msec
    slat (usec): min=2, max=973942, avg=1510.77, stdev=18783.09
    clat (msec): min=1, max=1719, avg=46.97, stdev=111.64
     lat (msec): min=1, max=1719, avg=48.49, stdev=113.62
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[    8], 10.00th=[   10], 20.00th=[   13],
     | 30.00th=[   16], 40.00th=[   18], 50.00th=[   21], 60.00th=[   25],
     | 70.00th=[   31], 80.00th=[   43], 90.00th=[   68], 95.00th=[  131],
     | 99.00th=[  742], 99.50th=[  824], 99.90th=[  963], 99.95th=[  988],
     | 99.99th=[ 1045]
    bw (KB  /s): min=    6, max= 8320, per=9.51%, avg=3011.84, stdev=1956.64
    lat (msec) : 2=0.07%, 4=0.55%, 10=10.28%, 20=37.47%, 50=35.89%
    lat (msec) : 100=9.47%, 250=3.45%, 500=0.61%, 750=1.26%, 1000=0.92%
    lat (msec) : 2000=0.03%
  cpu          : usr=0.28%, sys=0.58%, ctx=265641, majf=0, minf=146
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=791939/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=3093.6MB, aggrb=31674KB/s, minb=31674KB/s, maxb=31674KB/s,
mint=100009msec, maxt=100009msec

Disk stats (read/write):
  rbd2: ios=0/789529, merge=0/0, ticks=0/12345428, in_queue=12383232,
util=100.00%



root@storage04:~# fio --name iops --rw randwrite --bs 21k --filename
/dev/rbd2 --numjobs 64 --iodepth=4 --ioengine=libaio --group_reporting
--direct=1 --runtime=100
iops: (g=0): rw=randwrite, bs=21K-21K/21K-21K/21K-21K, ioengine=libaio,
iodepth=4
...
fio-2.2.10
Starting 64 processes
^Cbs: 64 (f=64): [w(64)] [28.7% done] [0KB/98028KB/0KB /s] [0/4668/0 iops]
[eta 01m:12s]
fio: terminating on signal 2
Jobs: 9 (f=9): 
[E(1),w(1),_(2),w(1),_(3),w(2),_(2),w(1),_(15),w(1),_(7),w(1),_(13),w(1),_(2),w(1),_(10)]
[1.9% done] [0KB/4804KB/0KB /s] [0/228/0 iops] [eta 27m:16s]
iops: (groupid=0, jobs=64): err= 0: pid=5221: Mon Feb 12 00:10:52 2018
  write: io=1967.9MB, bw=67393KB/s, iops=3209, runt= 29900msec
    slat (usec): min=3, max=1211.3K, avg=14387.91, stdev=58236.34
    clat (msec): min=1, max=1711, avg=63.67, stdev=141.68
     lat (msec): min=1, max=1711, avg=78.06, stdev=159.64
    clat percentiles (usec):
     |  1.00th=[ 1944],  5.00th=[ 2864], 10.00th=[ 4192], 20.00th=[ 7136],
     | 30.00th=[11200], 40.00th=[16768], 50.00th=[23680], 60.00th=[33536],
     | 70.00th=[46848], 80.00th=[66048], 90.00th=[110080], 95.00th=[205824],
     | 99.00th=[790528], 99.50th=[872448], 99.90th=[1253376], 99.95th=[
1368064],
     | 99.99th=[1531904]
    bw (KB  /s): min=   23, max= 3906, per=1.77%, avg=1189.80, stdev=837.25
    lat (msec) : 2=1.25%, 4=8.01%, 10=18.08%, 20=17.69%, 50=27.03%
    lat (msec) : 100=16.47%, 250=6.99%, 500=0.94%, 750=2.15%, 1000=1.20%
    lat (msec) : 2000=0.19%
  cpu          : usr=0.06%, sys=0.11%, ctx=173237, majf=0, minf=787
  IO depths    : 1=0.1%, 2=0.1%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=95955/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=1967.9MB, aggrb=67393KB/s, minb=67393KB/s, maxb=67393KB/s,
mint=29900msec, maxt=29900msec

Disk stats (read/write):
  rbd2: ios=0/95945, merge=0/0, ticks=0/3672960, in_queue=3685732,
util=99.66%


root@storage04:~# fio --name iops --rw randwrite --bs 21k --filename
/dev/rbd2 --numjobs 32 --iodepth=4 --ioengine=libaio --group_reporting
--direct=1 --runtime=100
iops: (g=0): rw=randwrite, bs=21K-21K/21K-21K/21K-21K, ioengine=libaio,
iodepth=4
...
fio-2.2.10
Starting 32 processes
^Cbs: 32 (f=32): [w(32)] [11.9% done] [0KB/40866KB/0KB /s] [0/1946/0 iops]
[eta 01m:29s]
fio: terminating on signal 2
Jobs: 32 (f=30): [w(32)] [12.9% done] [0KB/53676KB/0KB /s] [0/2556/0 iops]
[eta 01m:28s]
iops: (groupid=0, jobs=32): err= 0: pid=6344: Mon Feb 12 00:12:01 2018
  write: io=820680KB, bw=66912KB/s, iops=3186, runt= 12265msec
    slat (usec): min=3, max=842, avg=13.14, stdev=15.18
    clat (msec): min=1, max=1270, avg=38.67, stdev=114.69
     lat (msec): min=1, max=1270, avg=38.68, stdev=114.69
    clat percentiles (usec):
     |  1.00th=[ 1976],  5.00th=[ 2416], 10.00th=[ 3024], 20.00th=[ 4384],
     | 30.00th=[ 6048], 40.00th=[ 8032], 50.00th=[10688], 60.00th=[14400],
     | 70.00th=[19584], 80.00th=[30080], 90.00th=[57088], 95.00th=[112128],
     | 99.00th=[765952], 99.50th=[839680], 99.90th=[954368],
99.95th=[987136],
     | 99.99th=[1089536]
    bw (KB  /s): min=   30, max= 6300, per=3.58%, avg=2397.65, stdev=1548.34
    lat (msec) : 2=1.16%, 4=16.01%, 10=30.33%, 20=22.95%, 50=17.40%
    lat (msec) : 100=6.82%, 250=1.96%, 500=1.46%, 750=0.82%, 1000=1.07%
    lat (msec) : 2000=0.03%
  cpu          : usr=0.13%, sys=0.21%, ctx=38649, majf=0, minf=372
  IO depths    : 1=0.1%, 2=0.2%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=39080/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: io=820680KB, aggrb=66912KB/s, minb=66912KB/s, maxb=66912KB/s,
mint=12265msec, maxt=12265msec

Disk stats (read/write):
  rbd2: ios=0/39077, merge=0/0, ticks=0/1504048, in_queue=1507272,
util=99.17%



root@storage04:~# fio --name iops --rw randwrite --bs 21k --filename
/dev/rbd2 --numjobs 8 --iodepth=12 --ioengine=libaio --group_reporting
--direct=1 --runtime=100
iops: (g=0): rw=randwrite, bs=21K-21K/21K-21K/21K-21K, ioengine=libaio,
iodepth=12
...
fio-2.2.10
Starting 8 processes
^Cbs: 8 (f=8): [w(8)] [20.8% done] [0KB/34650KB/0KB /s] [0/1650/0 iops]
[eta 01m:20s]
fio: terminating on signal 2
Jobs: 2 (f=2): [_(5),w(2),_(1)] [10.1% done] [0KB/1554KB/0KB /s] [0/74/0
iops] [eta 03m:24s]
iops: (groupid=0, jobs=8): err= 0: pid=6770: Mon Feb 12 00:12:32 2018
  write: io=979755KB, bw=44563KB/s, iops=2122, runt= 21986msec
    slat (usec): min=3, max=551, avg=12.92, stdev=14.78
    clat (msec): min=1, max=1531, avg=44.14, stdev=136.44
     lat (msec): min=1, max=1531, avg=44.15, stdev=136.44
    clat percentiles (usec):
     |  1.00th=[ 1944],  5.00th=[ 2320], 10.00th=[ 2672], 20.00th=[ 3568],
     | 30.00th=[ 4768], 40.00th=[ 6240], 50.00th=[ 8640], 60.00th=[12224],
     | 70.00th=[18816], 80.00th=[31872], 90.00th=[67072], 95.00th=[154624],
     | 99.00th=[839680], 99.50th=[913408], 99.90th=[1073152], 99.95th=[
1368064],
     | 99.99th=[1466368]
    bw (KB  /s): min=   26, max=18023, per=14.47%, avg=6449.72,
stdev=5032.09
    lat (msec) : 2=1.37%, 4=22.49%, 10=30.69%, 20=16.94%, 50=15.12%
    lat (msec) : 100=6.72%, 250=3.13%, 500=0.73%, 750=1.14%, 1000=1.41%
    lat (msec) : 2000=0.25%
  cpu          : usr=0.32%, sys=0.61%, ctx=44124, majf=0, minf=97
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=99.9%, 16=0.0%, 32=0.0%,
>=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%,
>=64=0.0%
     issued    : total=r=0/w=46655/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=12

Run status group 0 (all jobs):
  WRITE: io=979755KB, aggrb=44562KB/s, minb=44562KB/s, maxb=44562KB/s,
mint=21986msec, maxt=21986msec

Disk stats (read/write):
  rbd2: ios=0/46652, merge=0/0, ticks=0/2037428, in_queue=2043236,
util=99.65%
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Reply via email to