On 2017-10-14 05:02, J David wrote:
> Thanks all for input on this.
>
> It's taken a couple of weeks, but based on the feedback from the list,
> we've got our version of a scrub-one-at-a-time cron script running and
> confirmed that it's working properly.
>
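For anyone following along, a minimal sketch of such a
one-deep-scrub-at-a-time cron job (not necessarily the script used
here; it assumes the Luminous "ceph pg dump" column layout, where
field 10 is the PG state and the last two fields are the deep-scrub
timestamp, so the awk fields may need adjusting):

#!/bin/sh
# Do nothing if a deep scrub is already in flight.
ceph pg dump 2>/dev/null | grep -q 'scrubbing+deep' && exit 0
# Otherwise deep-scrub the least recently deep-scrubbed clean PG.
pgid=$(ceph pg dump 2>/dev/null \
    | awk '$10 == "active+clean" {print $(NF-1) "T" $NF, $1}' \
    | sort | head -n 1 | awk '{print $2}')
[ -n "$pgid" ] && ceph pg deep-scrub "$pgid"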
> Unfortunately, this hasn't really solved the real problem. Even with
> just one scrub and one client running, client I/O requests routinely
> take 30-60 seconds to complete (read or write), which is so poor that
> the cluster is unusable for any sort of interactive activity. Nobody
> is going to sit around and wait 30-60 seconds for a file to save or
> load, or for a web server to respond, or a SQL query to finish.
>
> Running "ceph -w" blames this on slow requests blocked for > 32 seconds:
>
> 2017-10-13 21:21:34.445798 mon.ceph1 [INF] overall HEALTH_OK
> 2017-10-13 21:21:51.305661 mon.ceph1 [WRN] Health check failed: 42
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:21:57.311892 mon.ceph1 [WRN] Health check update: 140
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:03.343443 mon.ceph1 [WRN] Health check update: 111
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:01.833605 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.526819 secs
> 2017-10-13 21:22:01.833614 osd.5 [WRN] slow request 30.526819 seconds
> old, received at 2017-10-13 21:21:31.306718:
> osd_op(client.6104975.0:7330926 0.a2
> 0:456218c9:::rbd_data.1a24832ae8944a.000000000009d21d:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 2364416~88064] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently sub_op_commit_rec from 9
> 2017-10-13 21:22:11.238561 mon.ceph1 [WRN] Health check update: 24
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:04.834075 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.291869 secs
> 2017-10-13 21:22:04.834082 osd.5 [WRN] slow request 30.291869 seconds
> old, received at 2017-10-13 21:21:34.542137:
> osd_op(client.6104975.0:7331703 0.a2
> 0:4571f0f6:::rbd_data.1a24832ae8944a.000000000009c8ef:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 2934272~46592] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently op_applied
> 2017-10-13 21:22:07.834445 osd.5 [WRN] 1 slow requests, 1 included
> below; oldest blocked for > 30.421122 secs
> 2017-10-13 21:22:07.834452 osd.5 [WRN] slow request 30.421122 seconds
> old, received at 2017-10-13 21:21:37.413260:
> osd_op(client.6104975.0:7332411 0.a2
> 0:456218c9:::rbd_data.1a24832ae8944a.000000000009d21d:head
> [set-alloc-hint object_size 4194304 write_size 4194304,write
> 4068352~16384] snapc 0=[] ondisk+write+known_if_redirected e18866)
> currently op_applied
> 2017-10-13 21:22:16.238929 mon.ceph1 [WRN] Health check update: 8 slow
> requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:21.239234 mon.ceph1 [WRN] Health check update: 4 slow
> requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 21:22:21.329402 mon.ceph1 [INF] Health check cleared:
> REQUEST_SLOW (was: 4 slow requests are blocked > 32 sec)
> 2017-10-13 21:22:21.329490 mon.ceph1 [INF] Cluster is now healthy
>
> So far, the following steps have been taken to attempt to resolve this:
>
> 1) Updated to Ubuntu 16.04.3 LTS and Ceph 12.2.1.
>
> 2) Changes to ceph.conf:
> osd max scrubs = 1
> osd scrub during recovery = false
> osd deep scrub interval = 2592000
> osd scrub max interval = 2592000
> osd deep scrub randomize ratio = 0.0
> osd disk thread ioprio priority = 7
> osd disk thread ioprio class = idle
> osd scrub sleep = 0.1
>
> 3) Kernel I/O Scheduler set to cfq.
>
> 4) Deep-scrub moved to cron, with a limit of one running at a time.
>
> With these changes, scrubs now take 40-45 minutes to complete, up from
> 20-25, so the amount of time where there are client I/O issues has
> actually gotten substantially worse.
>
> To summarize the ceph cluster, it has five nodes. Each node has
> - Intel Xeon E5-1620 v3 3.5GHz quad-core CPU
> - 64GiB DDR4 1866
> - Intel SSD DC S3700 1GB divided into three partitions, used for the
> Bluestore blocks.db of each OSD
> - Separate 64GB SSD for ceph monitor data & system image.
> - 3 x 7200rpm drives (Seagate Constellation ES.3 4TB or Seagate
> Enterprise Capacity 8TB)
> - Dual Intel 10Gigabit NIC w/LACP
>
> The SATA drives all check out healthy via smartctl and several are
> either new and were tested right before insertion into this cluster,
> or have been pulled for testing. When tested on random operations,
> they are by and large capable of 120-150 IOPS and about 30MB/sec
> throughput at 100% utilization with response times of 5-7ms.
>
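A per-drive random-I/O baseline like the one described can be gathered
with fio, e.g. (randrw writes to the device, so only run this against
pulled or spare drives):

fio --name=randrw --filename=/dev/sdX --direct=1 --rw=randrw \
    --bs=4k --iodepth=1 --runtime=60 --time_based --group_reporting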
> The CPUs are 75-90% idle. The RAM is largely unused (~55GiB free).
> The network is nearly idle (<50Mbps TX & RX, often <10Mbps). The
> blocks.db SSDs report 0% to 0.2% utilization. The system/monitor SSD
> reports 0-0.5% utilization. The SATA drives report between 0 and 100%
> utilization.
>
> With one deep scrub only (client turned off), most of the drives sit
> at negligible utilization, but three run at 60-100% utilization (all
> reads, about 30MB/sec -- so, pretty much flat out); those three busy
> drives match the three OSDs that service the PG being deep-scrubbed.
>
> With the client only (scrub script disabled, no scrub running), the
> SATA disks run 10-30% utilization, doing 5-40 IOPS. "ceph status"
> every ten seconds over two minutes reports:
> client: 2196 kB/s rd, 10291 kB/s wr, 164 op/s rd, 202 op/s wr
> client: 1989 kB/s rd, 14932 kB/s wr, 136 op/s rd, 194 op/s wr
> client: 1569 kB/s rd, 9648 kB/s wr, 130 op/s rd, 158 op/s wr
> client: 1949 kB/s rd, 11310 kB/s wr, 176 op/s rd, 173 op/s wr
> client: 2515 kB/s rd, 6987 kB/s wr, 120 op/s rd, 134 op/s wr
> client: 1730 kB/s rd, 7516 kB/s wr, 114 op/s rd, 114 op/s wr
> client: 1745 kB/s rd, 7575 kB/s wr, 120 op/s rd, 123 op/s wr
> client: 1407 kB/s rd, 4975 kB/s wr, 129 op/s rd, 121 op/s wr
> client: 1161 kB/s rd, 5053 kB/s wr, 60 op/s rd, 152 op/s wr
> client: 2551 kB/s rd, 7975 kB/s wr, 92 op/s rd, 140 op/s wr
> client: 1310 kB/s rd, 10059 kB/s wr, 47 op/s rd, 171 op/s wr
> client: 1257 kB/s rd, 8717 kB/s wr, 60 op/s rd, 164 op/s wr
> AVERAGE: 1782 kB/s rd, 8753 kB/s wr, 112 op/s rd, 154 op/s wr
>
> With the client and one deep-scrub running, the I/O utilizations of 3
> disks are 70-100%, the rest at 0-30%. "ceph status" every ten seconds
> over two minutes reports:
> client: 380 kB/s rd, 770 kB/s wr, 9 op/s rd, 64 op/s wr
> client: 1158 kB/s rd, 739 kB/s wr, 55 op/s rd, 44 op/s wr
> client: 447 kB/s rd, 4486 kB/s wr, 20 op/s rd, 221 op/s wr
> client: 467 kB/s rd, 17906 B/s wr, 35 op/s rd, 3 op/s wr
> client: 507 kB/s rd, 462 kB/s wr, 19 op/s rd, 11 op/s wr
> client: 795 kB/s rd, 1553 kB/s wr, 14 op/s rd, 38 op/s wr
> client: 349 kB/s rd, 994 kB/s wr, 32 op/s rd, 84 op/s wr
> client: 1055 kB/s rd, 83983 B/s wr, 28 op/s rd, 9 op/s wr
> client: 493 kB/s rd, 363 kB/s wr, 20 op/s rd, 30 op/s wr
> client: 616 kB/s rd, 2113 kB/s wr, 28 op/s rd, 137 op/s wr
> client: 393 kB/s rd, 1253 kB/s wr, 42 op/s rd, 35 op/s wr
> client: 495 kB/s rd, 2653 kB/s wr, 53 op/s rd, 206 op/s wr
> AVERAGE: 596 kB/s rd, 1290 kB/s wr, 30 op/s rd, 74 op/s wr
>
> The client workload was the same in both cases.
>
> So the performance impact of a single deep scrub on a single client is
> 67% less read throughput, 85% less write throughput, 73% less read
> IOPS, and 52% less write IOPS.
>
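Working those percentages from the two AVERAGE lines above:

1 - 596/1782  = 0.67  ->  67% less read throughput
1 - 1290/8753 = 0.85  ->  85% less write throughput
1 - 30/112    = 0.73  ->  73% less read IOPS
1 - 74/154    = 0.52  ->  52% less write IOPS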
> (This is still a dramatic improvement from where we started; without
> the ceph.conf tuning and with 2-3 scrubs running, it was fairly
> routine for the ceph -w to show no client read requests completing at
> all for over a minute.)
>
> But it seems like even with these settings, the deep scrub is still
> capable of starving the client.
>
> For example, with this PG being deep-scrubbed:
>
> $ sudo ceph pg dump | fgrep scrub
> dumped all
> 0.4c 10331 0 0 0 0
> 43330306048 1566 1566 active+clean+scrubbing+deep 2017-10-13
> 22:57:48.482152 18866'9181844 18866:6689009 [6,10,5] 6
> [6,10,5] 6 18765'9167507 2017-10-12 17:38:50.990794
> 18666'8968567 2017-09-29 01:28:42.558810
>
> The affected OSDs are 6, 10, and 5.
>
> When things go amiss:
>
> 2017-10-13 23:04:44.217383 mon.ceph1 [INF] Cluster is now healthy
> 2017-10-13 23:05:02.963973 mon.ceph1 [WRN] Health check failed: 3 slow
> requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 23:05:08.445170 mon.ceph1 [WRN] Health check update: 19
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 23:05:14.705356 mon.ceph1 [WRN] Health check update: 34
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-13 23:05:20.884466 mon.ceph1 [INF] Health check cleared:
> REQUEST_SLOW (was: 12 slow requests are blocked > 32 sec)
> 2017-10-13 23:05:20.884522 mon.ceph1 [INF] Cluster is now healthy
>
> Running "ceph health detail" repeatedly confirms that it is (all of)
> the OSDs being scrubbed that are stalling:
>
> HEALTH_OK
>
> HEALTH_WARN 3 slow requests are blocked > 32 sec
> REQUEST_SLOW 3 slow requests are blocked > 32 sec
> 3 ops are blocked > 32.768 sec
> osd.5 has blocked requests > 32.768 sec
>
> HEALTH_WARN 3 slow requests are blocked > 32 sec
> REQUEST_SLOW 3 slow requests are blocked > 32 sec
> 3 ops are blocked > 32.768 sec
> osd.5 has blocked requests > 32.768 sec
>
> HEALTH_WARN 38 slow requests are blocked > 32 sec
> REQUEST_SLOW 38 slow requests are blocked > 32 sec
> 38 ops are blocked > 32.768 sec
> osds 5,6 have blocked requests > 32.768 sec
>
> HEALTH_WARN 38 slow requests are blocked > 32 sec
> REQUEST_SLOW 38 slow requests are blocked > 32 sec
> 38 ops are blocked > 32.768 sec
> osds 5,6 have blocked requests > 32.768 sec
>
> HEALTH_WARN 19 slow requests are blocked > 32 sec
> REQUEST_SLOW 19 slow requests are blocked > 32 sec
> 19 ops are blocked > 32.768 sec
> osds 5,6 have blocked requests > 32.768 sec
>
> HEALTH_WARN 19 slow requests are blocked > 32 sec
> REQUEST_SLOW 19 slow requests are blocked > 32 sec
> 19 ops are blocked > 32.768 sec
> osds 5,6 have blocked requests > 32.768 sec
>
> HEALTH_WARN 35 slow requests are blocked > 32 sec
> REQUEST_SLOW 35 slow requests are blocked > 32 sec
> 35 ops are blocked > 32.768 sec
> osds 5,6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 35 slow requests are blocked > 32 sec
> REQUEST_SLOW 35 slow requests are blocked > 32 sec
> 35 ops are blocked > 32.768 sec
> osds 5,6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 29 slow requests are blocked > 32 sec
> REQUEST_SLOW 29 slow requests are blocked > 32 sec
> 29 ops are blocked > 32.768 sec
> osds 6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 34 slow requests are blocked > 32 sec
> REQUEST_SLOW 34 slow requests are blocked > 32 sec
> 34 ops are blocked > 32.768 sec
> osds 6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 34 slow requests are blocked > 32 sec
> REQUEST_SLOW 34 slow requests are blocked > 32 sec
> 34 ops are blocked > 32.768 sec
> osds 6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 34 slow requests are blocked > 32 sec
> REQUEST_SLOW 34 slow requests are blocked > 32 sec
> 34 ops are blocked > 32.768 sec
> osds 6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 34 slow requests are blocked > 32 sec
> REQUEST_SLOW 34 slow requests are blocked > 32 sec
> 34 ops are blocked > 32.768 sec
> osds 6,10 have blocked requests > 32.768 sec
>
> HEALTH_WARN 12 slow requests are blocked > 32 sec
> REQUEST_SLOW 12 slow requests are blocked > 32 sec
> 12 ops are blocked > 32.768 sec
> osd.10 has blocked requests > 32.768 sec
>
> HEALTH_OK
>
> A different sample taken later where ceph -w happened to spit out the
> details of slow requests shows that the slowness follows the OSDs, not
> the PG being deep-scrubbed:
>
> $ sudo ceph pg dump | fgrep scrub
> dumped all
> 0.1a3 10479 0 0 0 0
> 43951919104 1564 1564 active+clean+scrubbing+deep 2017-10-14
> 02:13:06.046930 18866'9070989 18866:6495502 [14,9,15] 14
> [14,9,15] 14 18866'9060892 2017-10-13 11:22:33.563810
> 18666'8855616 2017-09-29 08:14:06.935978
>
> Although all of the affected requests shown are on an OSD
> participating in a deep-scrub, none of the ones shown are for the PG
> being scrubbed (0.1a3, in this case):
>
> 2017-10-14 02:30:58.876111 osd.15 [WRN] 54 slow requests, 5 included
> below; oldest blocked for > 40.738962 secs
> 2017-10-14 02:30:58.876117 osd.15 [WRN] slow request 30.042803 seconds
> old, received at 2017-10-14 02:30:28.833218:
> osd_op(client.6210820.0:962249 0.167 0.6a309767 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:58.876121 osd.15 [WRN] slow request 30.350780 seconds
> old, received at 2017-10-14 02:30:28.525241:
> osd_op(client.6198346.0:931036 0.4f 0.66fafa4f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:58.876125 osd.15 [WRN] slow request 34.967224 seconds
> old, received at 2017-10-14 02:30:23.908796:
> osd_op(client.6210820.0:961833 0.167 0.f744d67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:58.876129 osd.15 [WRN] slow request 33.858779 seconds
> old, received at 2017-10-14 02:30:25.017242:
> osd_op(client.6210820.0:961949 0.4f 0.fb8a24f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:58.876152 osd.15 [WRN] slow request 32.730421 seconds
> old, received at 2017-10-14 02:30:26.145599:
> osd_op(client.6210820.0:962067 0.4f 0.2ea1fa4f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:59.876364 osd.15 [WRN] 47 slow requests, 5 included
> below; oldest blocked for > 39.876936 secs
> 2017-10-14 02:30:59.876371 osd.15 [WRN] slow request 30.347226 seconds
> old, received at 2017-10-14 02:30:29.529043:
> osd_op(client.6198346.0:931188 0.12b 0.c13f692b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:59.876375 osd.15 [WRN] slow request 30.346784 seconds
> old, received at 2017-10-14 02:30:29.529485:
> osd_op(client.6198346.0:931189 0.12b 0.c13f692b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:59.876379 osd.15 [WRN] slow request 30.899518 seconds
> old, received at 2017-10-14 02:30:28.976750:
> osd_op(client.6198346.0:931114 0.4f 0.b02f084f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:59.876383 osd.15 [WRN] slow request 30.899370 seconds
> old, received at 2017-10-14 02:30:28.976898:
> osd_op(client.6198346.0:931115 0.4f 0.b02f084f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:30:59.876386 osd.15 [WRN] slow request 30.851203 seconds
> old, received at 2017-10-14 02:30:29.025066:
> osd_op(client.6198346.0:931116 0.4f 0.b02f084f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:00.876599 osd.15 [WRN] 52 slow requests, 5 included
> below; oldest blocked for > 40.877172 secs
> 2017-10-14 02:31:00.876605 osd.15 [WRN] slow request 30.883711 seconds
> old, received at 2017-10-14 02:30:29.992794:
> osd_op(client.6198346.0:931211 0.36 0.b9aba836 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:00.876610 osd.15 [WRN] slow request 30.513765 seconds
> old, received at 2017-10-14 02:30:30.362740:
> osd_op(client.6198346.0:931274 0.a9 0.2d8d5aa9 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:00.876613 osd.15 [WRN] slow request 30.017060 seconds
> old, received at 2017-10-14 02:30:30.859445:
> osd_op(client.6210820.0:962360 0.a9 0.7db19aa9 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:00.876617 osd.15 [WRN] slow request 31.501782 seconds
> old, received at 2017-10-14 02:30:29.374722:
> osd_op(client.6198346.0:931183 0.167 0.c0db5367 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:00.876620 osd.15 [WRN] slow request 30.288990 seconds
> old, received at 2017-10-14 02:30:30.587515:
> osd_op(client.6210820.0:962354 0.4f 0.d04a024f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:01.876820 osd.15 [WRN] 54 slow requests, 5 included
> below; oldest blocked for > 41.877395 secs
> 2017-10-14 02:31:01.876826 osd.15 [WRN] slow request 30.051551 seconds
> old, received at 2017-10-14 02:30:31.825177:
> osd_op(client.6198346.0:931372 0.a9 0.ae2afca9 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:01.876830 osd.15 [WRN] slow request 31.180720 seconds
> old, received at 2017-10-14 02:30:30.696008:
> osd_op(client.6198346.0:931303 0.4f 0.d960a04f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:01.876833 osd.15 [WRN] slow request 30.097627 seconds
> old, received at 2017-10-14 02:30:31.779100:
> osd_op(client.6198346.0:931371 0.a9 0.ae2afca9 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:01.876837 osd.15 [WRN] slow request 36.859324 seconds
> old, received at 2017-10-14 02:30:25.017404:
> osd_op(client.6210820.0:961950 0.4f 0.fb8a24f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:01.876840 osd.15 [WRN] slow request 36.106802 seconds
> old, received at 2017-10-14 02:30:25.769926:
> osd_op(client.6198346.0:930836 0.81 0.cdb24281 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:02.877030 osd.15 [WRN] 45 slow requests, 5 included
> below; oldest blocked for > 42.630584 secs
> 2017-10-14 02:31:02.877037 osd.15 [WRN] slow request 30.497084 seconds
> old, received at 2017-10-14 02:30:32.379855:
> osd_op(client.6198346.0:931402 0.db 0.1b5228db (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:02.877041 osd.15 [WRN] slow request 30.045499 seconds
> old, received at 2017-10-14 02:30:32.831441:
> osd_op(client.6198346.0:931439 0.19e 0.bf928d9e (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:02.877045 osd.15 [WRN] slow request 30.045355 seconds
> old, received at 2017-10-14 02:30:32.831584:
> osd_op(client.6198346.0:931440 0.19e 0.bf928d9e (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:02.877048 osd.15 [WRN] slow request 33.937728 seconds
> old, received at 2017-10-14 02:30:28.939211:
> osd_op(client.6198346.0:931109 0.16c 0.ecf9016c (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:02.877051 osd.15 [WRN] slow request 39.956090 seconds
> old, received at 2017-10-14 02:30:22.920850:
> osd_op(client.6210820.0:961696 0.167 0.74cc8167 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:03.877265 osd.15 [WRN] 48 slow requests, 4 included
> below; oldest blocked for > 43.630826 secs
> 2017-10-14 02:31:03.877271 osd.15 [WRN] slow request 30.265866 seconds
> old, received at 2017-10-14 02:30:33.611316:
> osd_op(client.6198346.0:931511 0.12b 0.f9f6932b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:03.877275 osd.15 [WRN] slow request 30.736755 seconds
> old, received at 2017-10-14 02:30:33.140427:
> osd_op(client.6198346.0:931461 0.81 0.67d8881 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:03.877279 osd.15 [WRN] slow request 30.068784 seconds
> old, received at 2017-10-14 02:30:33.808398:
> osd_op(client.6198346.0:931521 0.167 0.c39f1b67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:03.877283 osd.15 [WRN] slow request 34.937801 seconds
> old, received at 2017-10-14 02:30:28.939380:
> osd_op(client.6198346.0:931110 0.16c 0.ecf9016c (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:04.877468 osd.15 [WRN] 50 slow requests, 2 included
> below; oldest blocked for > 44.631044 secs
> 2017-10-14 02:31:04.877474 osd.15 [WRN] slow request 30.100364 seconds
> old, received at 2017-10-14 02:30:34.777036:
> osd_op(client.6198346.0:931601 0.12b 0.824c6d2b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:04.877478 osd.15 [WRN] slow request 30.598084 seconds
> old, received at 2017-10-14 02:30:34.279316:
> osd_op(client.6210820.0:962473 0.1c1 0.d0030dc1 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:05.877693 osd.15 [WRN] 54 slow requests, 4 included
> below; oldest blocked for > 45.631254 secs
> 2017-10-14 02:31:05.877699 osd.15 [WRN] slow request 30.613464 seconds
> old, received at 2017-10-14 02:30:35.264146:
> osd_op(client.6210820.0:962514 0.12b 0.d9b2c92b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:05.877703 osd.15 [WRN] slow request 30.564546 seconds
> old, received at 2017-10-14 02:30:35.313064:
> osd_op(client.6198346.0:931635 0.167 0.b6530d67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:05.877707 osd.15 [WRN] slow request 30.561430 seconds
> old, received at 2017-10-14 02:30:35.316180:
> osd_op(client.6210820.0:962520 0.12b 0.2bc1b12b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:05.877711 osd.15 [WRN] slow request 30.231697 seconds
> old, received at 2017-10-14 02:30:35.645913:
> osd_op(client.6198346.0:931670 0.19e 0.1063099e (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:06.877917 osd.15 [WRN] 45 slow requests, 4 included
> below; oldest blocked for > 45.044786 secs
> 2017-10-14 02:31:06.877925 osd.15 [WRN] slow request 30.860692 seconds
> old, received at 2017-10-14 02:30:36.017137:
> osd_op(client.6198346.0:931696 0.167 0.a0e34b67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:06.877929 osd.15 [WRN] slow request 30.860517 seconds
> old, received at 2017-10-14 02:30:36.017312:
> osd_op(client.6198346.0:931697 0.167 0.a0e34b67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:06.877933 osd.15 [WRN] slow request 30.860432 seconds
> old, received at 2017-10-14 02:30:36.017397:
> osd_op(client.6198346.0:931698 0.167 0.a0e34b67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:06.877936 osd.15 [WRN] slow request 30.695287 seconds
> old, received at 2017-10-14 02:30:36.182542:
> osd_op(client.6210820.0:962573 0.14e 0.23c5354e (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:10.449873 mon.ceph1 [WRN] Health check update: 72
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-14 02:31:15.450202 mon.ceph1 [WRN] Health check update: 59
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2017-10-14 02:31:08.878201 osd.15 [WRN] 48 slow requests, 3 included
> below; oldest blocked for > 47.045085 secs
> 2017-10-14 02:31:08.878207 osd.15 [WRN] slow request 30.788227 seconds
> old, received at 2017-10-14 02:30:38.089900:
> osd_op(client.6210820.0:962712 0.144 0.9a2d1144 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:08.878212 osd.15 [WRN] slow request 30.788130 seconds
> old, received at 2017-10-14 02:30:38.089997:
> osd_op(client.6210820.0:962713 0.144 0.9a2d1144 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:08.878215 osd.15 [WRN] slow request 30.786906 seconds
> old, received at 2017-10-14 02:30:38.091221:
> osd_op(client.6210820.0:962747 0.1c1 0.60f887c1 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:09.878410 osd.15 [WRN] 42 slow requests, 1 included
> below; oldest blocked for > 46.828631 secs
> 2017-10-14 02:31:09.878417 osd.15 [WRN] slow request 30.367803 seconds
> old, received at 2017-10-14 02:30:39.510546:
> osd_op(client.6210820.0:962753 0.16c 0.209b656c (undecoded)
> ondisk+read+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:10.878600 osd.15 [WRN] 44 slow requests, 2 included
> below; oldest blocked for > 47.828813 secs
> 2017-10-14 02:31:10.878625 osd.15 [WRN] slow request 30.023391 seconds
> old, received at 2017-10-14 02:30:40.855140:
> osd_op(client.6198346.0:931975 0.12b 0.96d4852b (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:10.878630 osd.15 [WRN] slow request 30.693944 seconds
> old, received at 2017-10-14 02:30:40.184587:
> osd_op(client.6198346.0:931947 0.15d 0.360d7d5d (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:11.878798 osd.15 [WRN] 45 slow requests, 1 included
> below; oldest blocked for > 48.829020 secs
> 2017-10-14 02:31:11.878805 osd.15 [WRN] slow request 30.753634 seconds
> old, received at 2017-10-14 02:30:41.125105:
> osd_op(client.6198346.0:932010 0.1c1 0.80369dc1 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:12.878984 osd.15 [WRN] 47 slow requests, 2 included
> below; oldest blocked for > 49.829201 secs
> 2017-10-14 02:31:12.878993 osd.15 [WRN] slow request 30.042094 seconds
> old, received at 2017-10-14 02:30:42.836824:
> osd_op(client.6210820.0:962766 0.14e 0.25cb5d4e (undecoded)
> ondisk+read+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:12.879001 osd.15 [WRN] slow request 30.463578 seconds
> old, received at 2017-10-14 02:30:42.415340:
> osd_op(client.6198346.0:932067 0.db 0.895084db (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:13.879166 osd.15 [WRN] 48 slow requests, 1 included
> below; oldest blocked for > 50.829393 secs
> 2017-10-14 02:31:13.879172 osd.15 [WRN] slow request 30.191760 seconds
> old, received at 2017-10-14 02:30:43.687351:
> osd_op(client.6198346.0:932120 0.19e 0.3e2de99e (undecoded)
> ondisk+read+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:14.879389 osd.15 [WRN] 48 slow requests, 4 included
> below; oldest blocked for > 50.970509 secs
> 2017-10-14 02:31:14.879396 osd.15 [WRN] slow request 30.787035 seconds
> old, received at 2017-10-14 02:30:44.092271:
> osd_op(client.6210820.0:962790 0.4f 0.925ece4f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:14.879401 osd.15 [WRN] slow request 30.461304 seconds
> old, received at 2017-10-14 02:30:44.418002:
> osd_op(client.6198346.0:932145 0.4f 0.2882a64f (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:14.879404 osd.15 [WRN] slow request 30.461200 seconds
> old, received at 2017-10-14 02:30:44.418106:
> osd_op(client.6198346.0:932146 0.19e 0.53a7419e (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
> 2017-10-14 02:31:14.879408 osd.15 [WRN] slow request 30.787593 seconds
> old, received at 2017-10-14 02:30:44.091713:
> osd_op(client.6210820.0:962779 0.167 0.79816d67 (undecoded)
> ondisk+write+known_if_redirected e18866) currently queued_for_pg
>
> Even if it were possible to restrict the scrubs to low-usage hours,
> running scrubs for 8 hours a day at 45 minutes each with 512 PGs would
> take about 48 days to cycle through.
>
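That estimate checks out: 512 PGs x 45 minutes = 23040 minutes, or 384
hours of scrubbing, which at 8 scrub-hours per day is 48 days per full
cycle.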
> There's no evidence of any kind supporting any type of hardware issue.
>
> So, what's going on here? There's fairly obviously something
> seriously wrong with this cluster (rather than with Ceph generally),
> but where should I look to run this down further?
>
> Thanks for any advice!
If you are storing small objects, I would recommend lowering the disk
read-ahead if it is currently larger; for example, limit it to 128 KB
(the read_ahead_kb value is in KB):
echo 128 > /sys/block/sdX/queue/read_ahead_kb
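Note the sysfs setting does not survive a reboot. To apply it to all
three OSD data disks on a node (adjust the device names to yours):

for d in sdb sdc sdd; do
    echo 128 > /sys/block/$d/queue/read_ahead_kb
done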
The other thing is to check/set that you are using cfq, so that you can
set I/O priorities:
echo cfq > /sys/block/sdX/queue/scheduler
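You can verify which scheduler is active (the one shown in brackets)
with:

cat /sys/block/sdX/queue/scheduler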
and double-check that the kernel is not using the newer blk-mq path for
SCSI:
cat /proc/config.gz | gunzip | grep CONFIG_SCSI_MQ_DEFAULT
(if /proc/config.gz is absent, as on stock Ubuntu kernels, check
/boot/config-$(uname -r) instead)
If this is set to y, cfq will not be available, so you should disable
blk-mq via the boot option
scsi_mod.use_blk_mq=0
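On Ubuntu that means appending it to GRUB_CMDLINE_LINUX in
/etc/default/grub and then running:

update-grub    # then reboot for it to take effect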
Lastly, I would not like to comment too much on the scrub conf
parameters, since it seems there was earlier discussion on those, but I
would also look into:
osd_scrub_priority = 1
osd_scrub_sleep = 2
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_scrub_load_threshold = 3
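You can try those at runtime without restarting the OSDs, e.g.:

ceph tell osd.* injectargs '--osd_scrub_sleep 2 --osd_scrub_chunk_max 5'

(Depending on the release, some options may warn that a restart is
required, and injectargs changes are not persistent, so mirror whatever
helps back into ceph.conf.)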
Maged
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com