Re: when an osd is started up, IO will be blocked

wangsongbo Sun, 25 Oct 2015 22:28:25 -0700

Hi all,

When an osd is started, I will get a lot of slow requests from thecorresponding osd log, as follows:

2015-10-26 03:42:51.593961 osd.4 [WRN] slow request 3.967808 secondsold, received at 2015-10-26 03:42:47.625968:osd_repop(client.2682003.0:2686048 43.fcfd1ddfcf/rbd_data.196483222ac2db.0000000000000010/head//43 v 9744'347845)currently commit_sent2015-10-26 03:42:51.593964 osd.4 [WRN] slow request 3.964537 secondsold, received at 2015-10-26 03:42:47.629239:osd_repop(client.2682003.0:2686049 43.b4bcbcbbb4b/rbd_data.196483222ac2db.000000000000020b/head//43 v9744'193029) currently commit_sent2015-10-26 03:42:52.594166 osd.4 [WRN] 40 slow requests, 17 includedbelow; oldest blocked for > 53.692556 secs2015-10-26 03:42:52.594172 osd.4 [WRN] slow request 2.272928 secondsold, received at 2015-10-26 03:42:50.321151:osd_repop(client.3684690.0:191908 43.540f1858540/rbd_data.1fc5ca7429fc17.0000000000000280/head//43 v 9744'63645)currently commit_sent2015-10-26 03:42:52.594175 osd.4 [WRN] slow request 2.270618 secondsold, received at 2015-10-26 03:42:50.323461:osd_op(client.3684690.0:191911 rbd_data.1fc5ca7429fc17.0000000000000209[write 2633728~4096] 43.72b9f039 ack+ondisk+write e9744) currentlycommit_sent2015-10-26 03:42:52.594264 osd.4 [WRN] slow request 4.968252 secondsold, received at 2015-10-26 03:42:47.625828:osd_repop(client.2682003.0:2686047 43.b4bcbcbbb4b/rbd_data.196483222ac2db.000000000000020b/head//43 v9744'193028) currently commit_sent2015-10-26 03:42:52.594266 osd.4 [WRN] slow request 4.968111 secondsold, received at 2015-10-26 03:42:47.625968:osd_repop(client.2682003.0:2686048 43.fcfd1ddfcf/rbd_data.196483222ac2db.0000000000000010/head//43 v 9744'347845)currently commit_sent2015-10-26 03:42:52.594318 osd.4 [WRN] slow request 4.964841 secondsold, received at 2015-10-26 03:42:47.629239:osd_repop(client.2682003.0:2686049 43.b4bcbcbbb4b/rbd_data.196483222ac2db.000000000000020b/head//43 v9744'193029) currently commit_sent2015-10-26 03:42:53.594527 osd.4 [WRN] 40 slow requests, 16 includedbelow; oldest blocked for > 54.692945 secs2015-10-26 03:42:53.594533 osd.4 [WRN] slow request 16.004669 secondsold, received at 2015-10-26 03:42:37.589800:osd_repop(client.2682003.0:2686041 43.b4bcbcbbb4b/rbd_data.196483222ac2db.000000000000020b/head//43 v9744'193024) currently commit_sent2015-10-26 03:42:53.594536 osd.4 [WRN] slow request 16.003889 secondsold, received at 2015-10-26 03:42:37.590580:osd_repop(client.2682003.0:2686040 43.fcfd1ddfcf/rbd_data.196483222ac2db.0000000000000010/head//43 v 9744'347842)currently commit_sent2015-10-26 03:42:53.594538 osd.4 [WRN] slow request 16.000954 secondsold, received at 2015-10-26 03:42:37.593515:osd_repop(client.2682003.0:2686042 43.b4bcbcbbb4b/rbd_data.196483222ac2db.000000000000020b/head//43 v9744'193025) currently commit_sent2015-10-26 03:42:53.594541 osd.4 [WRN] slow request 29.138828 secondsold, received at 2015-10-26 03:42:24.455641:osd_repop(client.4764855.0:65121 43.dbe169a9dbe/rbd_data.49a7a4633ac0b1.0000000000000021/head//43 v 9744'12509)currently commit_sent2015-10-26 03:42:53.594543 osd.4 [WRN] slow request 15.998814 secondsold, received at 2015-10-26 03:42:37.595656:osd_repop(client.1800547.0:1205399 43.cc59285ecc5/rbd_data.1b794560c6e2ea.00000000000000d0/head//43 v 9744'36732)currently commit_sent2015-10-26 03:42:54.594892 osd.4 [WRN] 39 slow requests, 17 includedbelow; oldest blocked for > 55.693227 secs2015-10-26 03:42:54.594908 osd.4 [WRN] slow request 4.273600 secondsold, received at 2015-10-26 03:42:50.321151:osd_repop(client.3684690.0:191908 43.540f1858540/rbd_data.1fc5ca7429fc17.0000000000000280/head//43 v 9744'63645)currently commit_sent2015-10-26 03:42:54.594911 osd.4 [WRN] slow request 4.271290 secondsold, received at 2015-10-26 03:42:50.323461:osd_op(client.3684690.0:191911 rbd_data.1fc5ca7429fc17.0000000000000209[write 2633728~4096] 43.72b9f039 ack+ondisk+write e9744) currentlycommit_sent


Meanwhile, I run fio process with the rbd ioengine.

The iops of read and write were too small to response any IO from thefio process,In other words, when an osd is started, the IO of the whole cluster willbe blocked.

Is there some parameter to adjust ?
How to explain this  problem?
The results of running fio process were as fllows:

ebs: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=rbd, iodepth=64
fio-2.2.9-20-g1520
Starting 1 thread
rbd engine: RBD version: 0.1.9

Jobs: 1 (f=1): [m(1)] [0.3% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta05h:10m:14s]

ebs: (groupid=0, jobs=1): err= 0: pid=40323: Mon Oct 26 04:02:00 2015
  read : io=10904KB, bw=175183B/s, *iops=21*, runt= 63737msec
    slat (usec): min=0, max=61, avg= 1.11, stdev= 3.16
    clat (msec): min=1, max=63452, avg=1190.04, stdev=6046.28
     lat (msec): min=1, max=63452, avg=1190.04, stdev=6046.28
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    5], 20.00th=[    6],
     | 30.00th=[    6], 40.00th=[    7], 50.00th=[    8], 60.00th=[    9],
     | 70.00th=[   10], 80.00th=[   12], 90.00th=[   14], 95.00th=[ 709],
     | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]

bw (KB /s): min= 129, max= 2038, per=100.00%, avg=774.00,stdev=1094.73

  write: io=24976KB, bw=401264B/s,*iops=48,* runt= 63737msec
    slat (usec): min=0, max=40, avg= 2.48, stdev= 3.30
    clat (msec): min=2, max=31379, avg=786.91, stdev=4829.02
     lat (msec): min=2, max=31379, avg=786.92, stdev=4829.02
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    6], 10.00th=[    6], 20.00th=[    8],
     | 30.00th=[    9], 40.00th=[    9], 50.00th=[   11], 60.00th=[   12],
     | 70.00th=[   13], 80.00th=[   15], 90.00th=[   19], 95.00th=[   29],
     | 99.00th=[16712], 99.50th=[16712], 99.90th=[16712], 99.95th=[16712],
     | 99.99th=[16712]

bw (KB /s): min= 317, max= 5228, per=100.00%, avg=1957.33,stdev=2832.48

    lat (msec) : 2=0.07%, 4=3.23%, 10=52.29%, 20=35.76%, 50=4.41%
    lat (msec) : 750=1.36%, 1000=0.02%, 2000=0.02%, >=2000=2.83%
  cpu          : usr=0.03%, sys=0.00%, ctx=228, majf=0, minf=19

IO depths : 1=0.1%, 2=0.1%, 4=0.4%, 8=3.9%, 16=18.9%, 32=73.1%,>=64=3.5%submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,>=64=0.0%complete : 0=0.0%, 4=97.5%, 8=0.0%, 16=0.2%, 32=0.9%, 64=1.5%,>=64=0.0%issued : total=r=1363/w=3122/d=0, short=r=0/w=0/d=0,drop=r=0/w=0/d=0

     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):

READ: io=10904KB, aggrb=171KB/s, minb=171KB/s, maxb=171KB/s,mint=63737msec, maxt=63737msecWRITE: io=24976KB, aggrb=391KB/s, minb=391KB/s, maxb=391KB/s,mint=63737msec, maxt=63737msec


Disk stats (read/write):

dm-0: ios=81/34, merge=0/0, ticks=472/26, in_queue=498, util=0.30%,aggrios=143/102, aggrmerge=106/122, aggrticks=1209/134,aggrin_queue=1343, aggrutil=0.60%sdd: ios=143/102, merge=106/122, ticks=1209/134, in_queue=1343,util=0.60%



Thanks and Regards,
WangSongbo

On 15/10/22 下午10:43, wangsongbo wrote:

Hi all,

When an osd is started, relative IO will be blocked.
According to the test result,the larger iops the clients send , thelonger it will take to elapse.Adjustment on all the parameters associate with recovery operationswas also found useless.
How to reduce the impact of this process on the IO ?

Thanks and Regards,
WangSongbo


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: when an osd is started up, IO will be blocked

Reply via email to