On Wed, 14 Oct 2015, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
> It seems in our situation the cluster is just busy, usually with
> really small RBD I/O. We have gotten things to where it doesn't happen
> as much in a steady state, but when we have an OSD fail (mostly from
> an XFS log bug we hit at least once a week), it is very painful as the
> OSD exits and enters the cluster. We are working to split the PGs a
> couple of fold, but this is a painful process for the reasons
> mentioned in the tracker. Matt Benjamin and Sam Just had a discussion
> on IRC about getting the other primaries to throttle back when such a
> situation occurs so that each primary OSD has some time to service
> client I/O and to push back on the clients to slow down in these
> situations.
> 
> In our case a single OSD can lock up a VM for a very long time while
> others are happily going about their business. Instead of looking like
> the cluster is out of I/O, it looks like there is an error. If
> pressure is pushed back to clients, it would show up as all of the
> clients slowing down a little instead of one or two just hanging for
> even over 1,000 seconds.

This 1000 seconds figure is very troubling.  Do you have logs?  I suspect 
this is a different issue than the prioritization one in the log from the 
other day (which only waited about 30s for higher-priority replica 
requests).

> My thought is that each OSD should have some percentage of time given
> to servicing client I/O whereas now it seems that replica I/O can
> completely starve client I/O. I understand why replica traffic needs a
> higher priority, but I think some balance needs to be attained.

We currently do 'fair' prioritized queueing with a token bucket filter 
only for requests with priorities <= 63.  Simply increasing this threshold 
so that it covers replica requests might be enough.  But... we'll be 
starting client requests locally at the expense of in-progress client 
writes elsewhere.  Given that the amount of (our) client-related work we 
do is always bounded by the msgr throttle, I think this is okay since we 
only make the situation worse by a fixed factor.  (We still don't address 
the possibility that we are a replica for every other osd in the system and 
could be flooded by N*(max client ops per osd).)

It's this line:

        https://github.com/ceph/ceph/blob/master/src/osd/OSD.cc#L8334
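
For readers who don't want to dig through OSD.cc, here is a minimal,
self-contained sketch of the behaviour being described.  It is not the
actual Ceph queue implementation; the class, names, and numbers are
illustrative only.  Ops at or below the cutoff (63, i.e. client ops) go
through the fair, token-bucket side, while higher priorities (osd_repop
at 127, repop replies at 196) go to a strict side that is always drained
first:

    #include <deque>
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <optional>

    struct Op {
      unsigned priority;   // 63 = client op, 127 = osd_repop, 196 = repop reply
      unsigned cost;       // roughly: bytes carried by the op
      const char* desc;
    };

    class SimplePrioQueue {
      static constexpr unsigned kCutoff = 63;       // <= cutoff -> "fair" side
      std::map<unsigned, std::deque<Op>> strict_;   // > cutoff, always drained first
      std::map<unsigned, std::deque<Op>> fair_;     // <= cutoff, limited by tokens
      std::map<unsigned, long> tokens_;             // crude per-priority token bucket

    public:
      void enqueue(const Op& op) {
        (op.priority > kCutoff ? strict_ : fair_)[op.priority].push_back(op);
      }

      void refill(unsigned prio, long t) { tokens_[prio] += t; }

      std::optional<Op> dequeue() {
        // Strict queues always win: a burst of prio-127/196 replica work can
        // keep the prio-63 client queue waiting, which is what the 30s+
        // dequeue_op latencies in the log look like.
        if (!strict_.empty()) {
          auto it = std::prev(strict_.end());       // highest priority first
          Op op = it->second.front();
          it->second.pop_front();
          if (it->second.empty()) strict_.erase(it);
          return op;
        }
        // Fair side: dispatch only while the bucket still has tokens.
        for (auto it = fair_.begin(); it != fair_.end(); ++it) {
          Op op = it->second.front();
          if (tokens_[it->first] >= static_cast<long>(op.cost)) {
            tokens_[it->first] -= op.cost;
            it->second.pop_front();
            if (it->second.empty()) fair_.erase(it);
            return op;
          }
        }
        return std::nullopt;
      }
    };

    int main() {
      SimplePrioQueue q;
      q.refill(63, 4 << 20);                        // 4 MB of client-op budget
      q.enqueue({63, 4u << 20, "client write"});    // arrives first
      q.enqueue({127, 4u << 20, "osd_repop"});      // arrives second, dequeued first
      while (auto op = q.dequeue())
        std::cout << op->desc << "\n";
    }

Raising the cutoff so that osd_repop lands on the fair side is the
one-line change being suggested above.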

sage



> 
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Wed, Oct 14, 2015 at 12:00 AM, Haomai Wang <haomaiw...@gmail.com> wrote:
> > On Wed, Oct 14, 2015 at 1:03 AM, Sage Weil <sw...@redhat.com> wrote:
> >> On Mon, 12 Oct 2015, Robert LeBlanc wrote:
> >>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> Hash: SHA256
> >>>
> >>> After a weekend, I'm ready to hit this from a different direction.
> >>>
> >>> I replicated the issue with Firefly, so it doesn't seem to be an issue that
> >>> has been introduced or resolved in any nearby version. I think overall
> >>> we may be seeing [1] to a great degree. From what I can extract from
> >>> the logs, it looks like in situations where OSDs are going up and
> >>> down, I see I/O blocked at the primary OSD waiting for peering and/or
> >>> the PG to become clean before dispatching the I/O to the replicas.
> >>>
> >>> In an effort to understand the flow of the logs, I've attached a small
> >>> 2 minute segment of a log I've extracted what I believe to be
> >>> important entries in the life cycle of an I/O along with my
> >>> understanding. If someone would be kind enough to help my
> >>> understanding, I would appreciate it.
> >>>
> >>> 2015-10-12 14:12:36.537906 7fb9d2c68700 10 -- 192.168.55.16:6800/11295
> >>> >> 192.168.55.12:0/2013622 pipe(0x26c90000 sd=47 :6800 s=2 pgs=2 cs=1
> >>> l=1 c=0x32c85440).reader got message 19 0x2af81700
> >>> osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5
> >>>
> >>> - ->Messenger has received the message from the client (previous
> >>> entries in the 7fb9d2c68700 thread are the individual segments that
> >>> make up this message).
> >>>
> >>> 2015-10-12 14:12:36.537963 7fb9d2c68700  1 -- 192.168.55.16:6800/11295
> >>> <== client.6709 192.168.55.12:0/2013622 19 ====
> >>> osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5
> >>> ==== 235+0+4194304 (2317308138 0 2001296353) 0x2af81700 con 0x32c85440
> >>>
> >>> - ->OSD process acknowledges that it has received the write.
> >>>
> >>> 2015-10-12 14:12:36.538096 7fb9d2c68700 15 osd.4 44 enqueue_op
> >>> 0x3052b300 prio 63 cost 4194304 latency 0.012371
> >>> osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5
> >>>
> >>> - ->Not sure exactly what is going on here; the op is being enqueued 
> >>> somewhere..
> >>>
> >>> 2015-10-12 14:13:06.542819 7fb9e2d3a700 10 osd.4 44 dequeue_op
> >>> 0x3052b300 prio 63 cost 4194304 latency 30.017094
> >>> osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v
> >>> 5 pg pg[0.29( v 44'703 (0'0,44'703] local-les=40 n=641 ec=1 les/c
> >>> 40/44 32/32/10) [4,5,0] r=0 lpr=32 crt=44'700 lcod 44'702 mlcod 44'702
> >>> active+clean]
> >>>
> >>> - ->The op is dequeued from this mystery queue 30 seconds later in a
> >>> different thread.
> >>
> >> ^^ This is the problem.  Everything after this looks reasonable.  Looking
> >> at the other dequeue_op calls over this period, it looks like we're just
> >> overwhelmed with higher priority requests.  New clients are 63, while
> >> osd_repop (replicated write from another primary) are 127 and replies from
> >> our own replicated ops are 196.  We do process a few other prio 63 items,
> >> but you'll see that their latency is also climbing up to 30s over this
> >> period.
> >>
> >> The question is why we suddenly get a lot of them.. maybe the peering on
> >> other OSDs just completed so we get a bunch of these?  It's also not clear
> >> to me what makes osd.4 or this op special.  We expect a mix of primary and
> >> replica ops on all the OSDs, so why would we suddenly have more of them
> >> here....
> >
> > I guess the bug tracker(http://tracker.ceph.com/issues/13482) is
> > related to this thread.
> >
> > So does this mean there is a livelock between client ops and repops?
> > We allow all clients to issue too many client ops, which makes some OSDs
> > a bottleneck, while other OSDs may still be idle enough to accept more
> > client ops. Eventually all osds end up stuck behind the bottleneck OSD.
> > That seems reasonable, but why does it last so long?
> >
> >>
> >> sage
> >>
> >>
> >>>
> >>> 2015-10-12 14:13:06.542912 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'703 (0'0,44'703] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'700 lcod 44'702 mlcod 44'702 active+clean]
> >>> do_op osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5
> >>> may_write -> write-ordered flags ack+ondisk+write+known_if_redirected
> >>>
> >>> - ->Not sure what this message is. Look up of secondary OSDs?
> >>>
> >>> 2015-10-12 14:13:06.544999 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'703 (0'0,44'703] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'700 lcod 44'702 mlcod 44'702 active+clean]
> >>> new_repop rep_tid 17815 on osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5
> >>>
> >>> - ->Dispatch write to secondary OSDs?
> >>>
> >>> 2015-10-12 14:13:06.545116 7fb9e2d3a700  1 -- 192.168.55.16:6801/11295
> >>> --> 192.168.55.15:6801/32036 -- osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>> -- ?+4195078 0x238fd600 con 0x32bcb5a0
> >>>
> >>> - ->OSD dispatch write to OSD.0.
> >>>
> >>> 2015-10-12 14:13:06.545132 7fb9e2d3a700 20 -- 192.168.55.16:6801/11295
> >>> submit_message osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>> remote, 192.168.55.15:6801/32036, have pipe.
> >>>
> >>> - ->Message sent to OSD.0.
> >>>
> >>> 2015-10-12 14:13:06.545195 7fb9e2d3a700  1 -- 192.168.55.16:6801/11295
> >>> --> 192.168.55.11:6801/13185 -- osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>> -- ?+4195078 0x16edd200 con 0x3a37b20
> >>>
> >>> - ->OSD dispatch write to OSD.5.
> >>>
> >>> 2015-10-12 14:13:06.545210 7fb9e2d3a700 20 -- 192.168.55.16:6801/11295
> >>> submit_message osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>> remote, 192.168.55.11:6801/13185, have pipe.
> >>>
> >>> - ->Message sent to OSD.5.
> >>>
> >>> 2015-10-12 14:13:06.545229 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'703 (0'0,44'703] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'700 lcod 44'702 mlcod 44'702 active+clean]
> >>> append_log log((0'0,44'703], crt=44'700) [44'704 (44'691) modify
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 by
> >>> client.6709.0:67 2015-10-12 14:12:34.340082]
> >>> 2015-10-12 14:13:06.545268 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 luod=44'703 lua=44'703 crt=44'700 lcod 44'702 mlcod
> >>> 44'702 active+clean] add_log_entry 44'704 (44'691) modify
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 by
> >>> client.6709.0:67 2015-10-12 14:12:34.340082
> >>>
> >>> - ->These record the OP in the journal log?
> >>>
> >>> 2015-10-12 14:13:06.563241 7fb9d326e700 20 -- 192.168.55.16:6801/11295
> >>> >> 192.168.55.11:6801/13185 pipe(0x2d355000 sd=98 :6801 s=2 pgs=12
> >>> cs=3 l=0 c=0x3a37b20).writer encoding 17337 features 37154696925806591
> >>> 0x16edd200 osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>>
> >>> - ->Writing the data to OSD.5?
> >>>
> >>> 2015-10-12 14:13:06.573938 7fb9d3874700 10 -- 192.168.55.16:6801/11295
> >>> >> 192.168.55.15:6801/32036 pipe(0x3f96000 sd=176 :6801 s=2 pgs=8 cs=3
> >>> l=0 c=0x32bcb5a0).reader got ack seq 1206 >= 1206 on 0x238fd600
> >>> osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>>
> >>> - ->Messenger gets ACK from OSD.0 that it received that last packet?
> >>>
> >>> 2015-10-12 14:13:06.613425 7fb9d3874700 10 -- 192.168.55.16:6801/11295
> >>> >> 192.168.55.15:6801/32036 pipe(0x3f96000 sd=176 :6801 s=2 pgs=8 cs=3
> >>> l=0 c=0x32bcb5a0).reader got message 1146 0x3ffa480
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1
> >>>
> >>> - ->Messenger receives ack on disk from OSD.0.
> >>>
> >>> 2015-10-12 14:13:06.613447 7fb9d3874700  1 -- 192.168.55.16:6801/11295
> >>> <== osd.0 192.168.55.15:6801/32036 1146 ====
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 ====
> >>> 83+0+0 (2772408781 0 0) 0x3ffa480 con 0x32bcb5a0
> >>>
> >>> - ->OSD process gets on disk ACK from OSD.0.
> >>>
> >>> 2015-10-12 14:13:06.613478 7fb9d3874700 10 osd.4 44 handle_replica_op
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 epoch 44
> >>>
> >>> - ->Primary OSD records the ACK (duplicate message?). Not sure how to
> >>> correlate that to the previous message other than by time.
> >>>
> >>> 2015-10-12 14:13:06.613504 7fb9d3874700 15 osd.4 44 enqueue_op
> >>> 0x120f9b00 prio 196 cost 0 latency 0.000250
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1
> >>>
> >>> - ->The reply is enqueued onto a mystery queue.
> >>>
> >>> 2015-10-12 14:13:06.627793 7fb9d6afd700 10 -- 192.168.55.16:6801/11295
> >>> >> 192.168.55.11:6801/13185 pipe(0x2d355000 sd=98 :6801 s=2 pgs=12
> >>> cs=3 l=0 c=0x3a37b20).reader got ack seq 17337 >= 17337 on 0x16edd200
> >>> osd_repop(client.6709.0:67 0.29
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 v 44'704) v1
> >>>
> >>> - ->Messenger gets ACK from OSD.5 that it received that last packet?
> >>>
> >>> 2015-10-12 14:13:06.628364 7fb9d6afd700 10 -- 192.168.55.16:6801/11295
> >>> >> 192.168.55.11:6801/13185 pipe(0x2d355000 sd=98 :6801 s=2 pgs=12
> >>> cs=3 l=0 c=0x3a37b20).reader got message 16477 0x21cef3c0
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1
> >>>
> >>> - ->Messenger receives ack on disk from OSD.5.
> >>>
> >>> 2015-10-12 14:13:06.628382 7fb9d6afd700  1 -- 192.168.55.16:6801/11295
> >>> <== osd.5 192.168.55.11:6801/13185 16477 ====
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 ====
> >>> 83+0+0 (2104182993 0 0) 0x21cef3c0 con 0x3a37b20
> >>>
> >>> - ->OSD process gets on disk ACK from OSD.5.
> >>>
> >>> 2015-10-12 14:13:06.628406 7fb9d6afd700 10 osd.4 44 handle_replica_op
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 epoch 44
> >>>
> >>> - ->Primary OSD records the ACK (duplicate message?). Not sure how to
> >>> correlate that to the previous message other than by time.
> >>>
> >>> 2015-10-12 14:13:06.628426 7fb9d6afd700 15 osd.4 44 enqueue_op
> >>> 0x3e41600 prio 196 cost 0 latency 0.000180
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1
> >>>
> >>> - ->The reply is enqueued onto a mystery queue.
> >>>
> >>> 2015-10-12 14:13:07.124206 7fb9f4e9f700  0 log_channel(cluster) log
> >>> [WRN] : slow request 30.598371 seconds old, received at 2015-10-12
> >>> 14:12:36.525724: osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) currently waiting for subops
> >>> from 0,5
> >>>
> >>> - ->OP has not been dequeued to the client from the mystery queue yet.
> >>>
> >>> 2015-10-12 14:13:07.278449 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 luod=44'703 lua=44'703 crt=44'702 lcod 44'702 mlcod
> >>> 44'702 active+clean] eval_repop repgather(0x37ea3cc0 44'704
> >>> rep_tid=17815 committed?=0 applied?=0 lock=0
> >>> op=osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5)
> >>> wants=ad
> >>>
> >>> - ->Not sure what this means. The OP has been completed on all replicas?
> >>>
> >>> 2015-10-12 14:13:07.278566 7fb9e0535700 10 osd.4 44 dequeue_op
> >>> 0x120f9b00 prio 196 cost 0 latency 0.665312
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 pg
> >>> pg[0.29( v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44
> >>> 32/32/10) [4,5,0] r=0 lpr=32 luod=44'703 lua=44'703 crt=44'702 lcod
> >>> 44'702 mlcod 44'702 active+clean]
> >>>
> >>> - ->One of the replica OPs is dequeued in a different thread
> >>>
> >>> 2015-10-12 14:13:07.278809 7fb9e0535700 10 osd.4 44 dequeue_op
> >>> 0x3e41600 prio 196 cost 0 latency 0.650563
> >>> osd_repop_reply(client.6709.0:67 0.29 ondisk, result = 0) v1 pg
> >>> pg[0.29( v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44
> >>> 32/32/10) [4,5,0] r=0 lpr=32 luod=44'703 lua=44'703 crt=44'702 lcod
> >>> 44'702 mlcod 44'702 active+clean]
> >>>
> >>> - ->The other replica OP is dequeued in the new thread
> >>>
> >>> 2015-10-12 14:13:07.967469 7fb9efe95700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 lua=44'703 crt=44'702 lcod 44'703 mlcod 44'702
> >>> active+clean] eval_repop repgather(0x37ea3cc0 44'704 rep_tid=17815
> >>> committed?=1 applied?=0 lock=0 op=osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5) wants=ad
> >>>
> >>> - ->Not sure what this does. A thread that joins the replica OPs with
> >>> the primary OP?
> >>>
> >>> 2015-10-12 14:13:07.967515 7fb9efe95700 15 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 lua=44'703 crt=44'702 lcod 44'703 mlcod 44'702
> >>> active+clean] log_op_stats osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5 inb 4194304 outb 0 rlat
> >>> 0.000000 lat 31.441789
> >>>
> >>> - ->Logs that the write has been committed to all replicas in the
> >>> primary journal?
> >>>
> >>> Not sure what the rest of these do, nor do I understand where the
> >>> client gets an ACK that the write is committed.
> >>>
> >>> 2015-10-12 14:13:07.967583 7fb9efe95700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 lua=44'703 crt=44'702 lcod 44'703 mlcod 44'702
> >>> active+clean]  sending commit on repgather(0x37ea3cc0 44'704
> >>> rep_tid=17815 committed?=1 applied?=0 lock=0
> >>> op=osd_op(client.6709.0:67 rbd_data.103c74b0dc51.000000000000003a
> >>> [set-alloc-hint object_size 4194304 write_size 4194304,write
> >>> 0~4194304] 0.474a01a9 ack+ondisk+write+known_if_redirected e44) v5)
> >>> 0x3a2f0840
> >>>
> >>> 2015-10-12 14:13:10.351452 7fb9f0696700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'702 lcod 44'703 mlcod 44'702 active+clean]
> >>> eval_repop repgather(0x37ea3cc0 44'704 rep_tid=17815 committed?=1
> >>> applied?=1 lock=0 op[0/1943]client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5) wants=ad
> >>>
> >>> 2015-10-12 14:13:10.354089 7fb9f0696700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'702 lcod 44'703 mlcod 44'703 active+clean]
> >>> removing repgather(0x37ea3cc0 44'704 rep_tid=17815 committed?=1
> >>> applied?=1 lock=0 op=osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5)
> >>>
> >>> 2015-10-12 14:13:10.354163 7fb9f0696700 20 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'702 lcod 44'703 mlcod 44'703 active+clean]
> >>>  q front is repgather(0x37ea3cc0 44'704 rep_tid=17815 committed?=1
> >>> applied?=1 lock=0 op=osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5)
> >>>
> >>> 2015-10-12 14:13:10.354199 7fb9f0696700 20 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'704 (0'0,44'704] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 crt=44'702 lcod 44'703 mlcod 44'703 active+clean]
> >>> remove_repop repgather(0x37ea3cc0 44'704 rep_tid=17815 committed?=1
> >>> applied?=1 lock=0 op=osd_op(client.6709.0:67
> >>> rbd_data.103c74b0dc51.000000000000003a [set-alloc-hint object_size
> >>> 4194304 write_size 4194304,write 0~4194304] 0.474a01a9
> >>> ack+ondisk+write+known_if_redirected e44) v5)
> >>>
> >>> 2015-10-12 14:13:15.488448 7fb9e2d3a700 10 osd.4 pg_epoch: 44 pg[0.29(
> >>> v 44'707 (0'0,44'707] local-les=40 n=641 ec=1 les/c 40/44 32/32/10)
> >>> [4,5,0] r=0 lpr=32 luod=44'705 lua=44'705 crt=44'704 lcod 44'704 mlcod
> >>> 44'704 active+clean] append_log: trimming to 44'704 entries 44'704
> >>> (44'691) modify
> >>> 474a01a9/rbd_data.103c74b0dc51.000000000000003a/head//0 by
> >>> client.6709.0:67 2015-10-12 14:12:34.340082
> >>>
> >>> Thanks for hanging in there with me on this...
> >>>
> >>> [1] http://www.spinics.net/lists/ceph-devel/msg26633.html
> >>> ----------------
> >>> Robert LeBlanc
> >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>>
> >>>
> >>> On Thu, Oct 8, 2015 at 11:44 PM, Robert LeBlanc <rob...@leblancnet.us> 
> >>> wrote:
> >>> > -----BEGIN PGP SIGNED MESSAGE-----
> >>> > Hash: SHA256
> >>> >
> >>> > Sage,
> >>> >
> >>> > After trying to bisect this issue (all tests moved the bisect towards
> >>> > Infernalis) and eventually testing the Infernalis branch again, it
> >>> > looks like the problem still exists although it is handled a tad
> >>> > better in Infernalis. I'm going to test against Firefly/Giant next
> >>> > week and then try and dive into the code to see if I can expose any
> >>> > thing.
> >>> >
> >>> > If I can do anything to provide you with information, please let me 
> >>> > know.
> >>> >
> >>> > Thanks,
> >>> > ----------------
> >>> > Robert LeBlanc
> >>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >
> >>> >
> >>> > On Wed, Oct 7, 2015 at 1:25 PM, Robert LeBlanc <rob...@leblancnet.us> 
> >>> > wrote:
> >>> >> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >> Hash: SHA256
> >>> >>
> >>> >> We forgot to upload the ceph.log yesterday. It is there now.
> >>> >> - ----------------
> >>> >> Robert LeBlanc
> >>> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >>
> >>> >>
> >>> >> On Tue, Oct 6, 2015 at 5:40 PM, Robert LeBlanc  wrote:
> >>> >>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>> Hash: SHA256
> >>> >>>
> >>> >>> I upped the debug on about everything and ran the test for about 40
> >>> >>> minutes. I took OSD.19 on ceph1 down and then brought it back in.
> >>> >>> There was at least one op on osd.19 that was blocked for over 1,000
> >>> >>> seconds. Hopefully this will have something that will cast a light on
> >>> >>> what is going on.
> >>> >>>
> >>> >>> We are going to upgrade this cluster to Infernalis tomorrow and rerun
> >>> >>> the test to verify the results from the dev cluster. This cluster
> >>> >>> matches the hardware of our production cluster but is not yet in
> >>> >>> production so we can safely wipe it to downgrade back to Hammer.
> >>> >>>
> >>> >>> Logs are located at http://dev.v3trae.net/~jlavoy/ceph/logs/
> >>> >>>
> >>> >>> Let me know what else we can do to help.
> >>> >>>
> >>> >>> Thanks,
> >>> >>> ----------------
> >>> >>> Robert LeBlanc
> >>> >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >>>
> >>> >>>
> >>> >>> On Tue, Oct 6, 2015 at 2:36 PM, Robert LeBlanc  wrote:
> >>> >>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>> Hash: SHA256
> >>> >>>>
> >>> >>>> On my second test (a much longer one), it took nearly an hour, but a
> >>> >>>> few messages have popped up over a 20 window. Still far less than I
> >>> >>>> have been seeing.
> >>> >>>> - ----------------
> >>> >>>> Robert LeBlanc
> >>> >>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >>>>
> >>> >>>>
> >>> >>>> On Tue, Oct 6, 2015 at 2:00 PM, Robert LeBlanc  wrote:
> >>> >>>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>> Hash: SHA256
> >>> >>>>>
> >>> >>>>> I'll capture another set of logs. Is there any other debugging you
> >>> >>>>> want turned up? I've seen the same thing where I see the message
> >>> >>>>> dispatched to the secondary OSD, but the message just doesn't show 
> >>> >>>>> up
> >>> >>>>> for 30+ seconds in the secondary OSD logs.
> >>> >>>>> - ----------------
> >>> >>>>> Robert LeBlanc
> >>> >>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >>>>>
> >>> >>>>>
> >>> >>>>> On Tue, Oct 6, 2015 at 1:34 PM, Sage Weil  wrote:
> >>> >>>>>> On Tue, 6 Oct 2015, Robert LeBlanc wrote:
> >>> >>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> Hash: SHA256
> >>> >>>>>>>
> >>> >>>>>>> I can't think of anything. In my dev cluster the only thing that 
> >>> >>>>>>> has
> >>> >>>>>>> changed is the Ceph versions (no reboot). What I like is even 
> >>> >>>>>>> though
> >>> >>>>>>> the disks are 100% utilized, it is performing as I expect now. 
> >>> >>>>>>> Client
> >>> >>>>>>> I/O is slightly degraded during the recovery, but no blocked I/O 
> >>> >>>>>>> when
> >>> >>>>>>> the OSD boots or during the recovery period. This is with
> >>> >>>>>>> max_backfills set to 20; one backfill max in our production 
> >>> >>>>>>> cluster is
> >>> >>>>>>> painful on OSD boot/recovery. I was able to reproduce this issue 
> >>> >>>>>>> on
> >>> >>>>>>> our dev cluster very easily and very quickly with these settings. 
> >>> >>>>>>> So
> >>> >>>>>>> far two tests and an hour later, only the blocked I/O when the 
> >>> >>>>>>> OSD is
> >>> >>>>>>> marked out. We would love to see that go away too, but this is far
> >>> >>>>>>                                             (me too!)
> >>> >>>>>>> better than what we have now. This dev cluster also has
> >>> >>>>>>> osd_client_message_cap set to default (100).
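
For reference, the two settings being talked about here would look roughly
like this in ceph.conf (the option names are the standard ones; the values
are just the ones mentioned above):

    [osd]
        osd max backfills = 20        # dev cluster: deliberately aggressive
        osd client message cap = 100  # the default, per the note above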
> >>> >>>>>>>
> >>> >>>>>>> We need to stay on the Hammer version of Ceph and I'm willing to 
> >>> >>>>>>> take
> >>> >>>>>>> the time to bisect this. If this is not a problem in 
> >>> >>>>>>> Firefly/Giant,
> >>> >>>>>>> you you prefer a bisect to find the introduction of the problem
> >>> >>>>>>> (Firefly/Giant -> Hammer) or the introduction of the resolution
> >>> >>>>>>> (Hammer -> Infernalis)? Do you have some hints to reduce hitting a
> >>> >>>>>>> commit that prevents a clean build as that is my most limiting 
> >>> >>>>>>> factor?
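
For the mechanics of the bisect itself, the usual git pattern applies, and
`git bisect skip` is the standard way to step over commits that will not
build; the tags below are placeholders, not specific Ceph releases:

    git bisect start
    git bisect bad  <tag-that-shows-the-problem>
    git bisect good <tag-that-behaves>
    # build and test each candidate commit, then mark it:
    git bisect good      # or: git bisect bad
    git bisect skip      # if the candidate won't build cleanly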
> >>> >>>>>>
> >>> >>>>>> Nothing comes to mind.  I think the best way to find this is still 
> >>> >>>>>> to see
> >>> >>>>>> it happen in the logs with hammer.  The frustrating thing with 
> >>> >>>>>> that log
> >>> >>>>>> dump you sent is that although I see plenty of slow request 
> >>> >>>>>> warnings in
> >>> >>>>>> the osd logs, I don't see the requests arriving.  Maybe the logs 
> >>> >>>>>> weren't
> >>> >>>>>> turned up for long enough?
> >>> >>>>>>
> >>> >>>>>> sage
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>
> >>> >>>>>>> Thanks,
> >>> >>>>>>> - ----------------
> >>> >>>>>>> Robert LeBlanc
> >>> >>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>> >>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> On Tue, Oct 6, 2015 at 12:32 PM, Sage Weil  wrote:
> >>> >>>>>>> > On Tue, 6 Oct 2015, Robert LeBlanc wrote:
> >>> >>>>>>> >> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> Hash: SHA256
> >>> >>>>>>> >>
> >>> >>>>>>> >> OK, an interesting point. Running ceph version 
> >>> >>>>>>> >> 9.0.3-2036-g4f54a0d
> >>> >>>>>>> >> (4f54a0dd7c4a5c8bdc788c8b7f58048b2a28b9be) looks a lot better. 
> >>> >>>>>>> >> I got
> >>> >>>>>>> >> messages when the OSD was marked out:
> >>> >>>>>>> >>
> >>> >>>>>>> >> 2015-10-06 11:52:46.961040 osd.13 192.168.55.12:6800/20870 81 :
> >>> >>>>>>> >> cluster [WRN] 17 slow requests, 3 included below; oldest 
> >>> >>>>>>> >> blocked for >
> >>> >>>>>>> >> 34.476006 secs
> >>> >>>>>>> >> 2015-10-06 11:52:46.961056 osd.13 192.168.55.12:6800/20870 82 :
> >>> >>>>>>> >> cluster [WRN] slow request 32.913474 seconds old, received at
> >>> >>>>>>> >> 2015-10-06 11:52:14.047475: osd_op(client.600962.0:474
> >>> >>>>>>> >> rbd_data.338102ae8944a.0000000000005270 [read 3302912~4096] 
> >>> >>>>>>> >> 8.c74a4538
> >>> >>>>>>> >> ack+read+known_if_redirected e58744) currently waiting for 
> >>> >>>>>>> >> peered
> >>> >>>>>>> >> 2015-10-06 11:52:46.961066 osd.13 192.168.55.12:6800/20870 83 :
> >>> >>>>>>> >> cluster [WRN] slow request 32.697545 seconds old, received at
> >>> >>>>>>> >> 2015-10-06 11:52:14.263403: osd_op(client.600960.0:583
> >>> >>>>>>> >> rbd_data.3380f74b0dc51.000000000001ee75 [read 1016832~4096] 
> >>> >>>>>>> >> 8.778d1be3
> >>> >>>>>>> >> ack+read+known_if_redirected e58744) currently waiting for 
> >>> >>>>>>> >> peered
> >>> >>>>>>> >> 2015-10-06 11:52:46.961074 osd.13 192.168.55.12:6800/20870 84 :
> >>> >>>>>>> >> cluster [WRN] slow request 32.668006 seconds old, received at
> >>> >>>>>>> >> 2015-10-06 11:52:14.292942: osd_op(client.600955.0:571
> >>> >>>>>>> >> rbd_data.3380f74b0dc51.0000000000019b09 [read 1034240~4096] 
> >>> >>>>>>> >> 8.e87a6f58
> >>> >>>>>>> >> ack+read+known_if_redirected e58744) currently waiting for 
> >>> >>>>>>> >> peered
> >>> >>>>>>> >>
> >>> >>>>>>> >> But I'm not seeing the blocked messages when the OSD came back 
> >>> >>>>>>> >> in. The
> >>> >>>>>>> >> OSD spindles have been running at 100% during this test. I 
> >>> >>>>>>> >> have seen
> >>> >>>>>>> >> slowed I/O from the clients as expected from the extra load, 
> >>> >>>>>>> >> but so
> >>> >>>>>>> >> far no blocked messages. I'm going to run some more tests.
> >>> >>>>>>> >
> >>> >>>>>>> > Good to hear.
> >>> >>>>>>> >
> >>> >>>>>>> > FWIW I looked through the logs and all of the slow request no 
> >>> >>>>>>> > flag point
> >>> >>>>>>> > messages came from osd.163... and the logs don't show when they 
> >>> >>>>>>> > arrived.
> >>> >>>>>>> > My guess is this OSD has a slower disk than the others, or 
> >>> >>>>>>> > something else
> >>> >>>>>>> > funny is going on?
> >>> >>>>>>> >
> >>> >>>>>>> > I spot checked another OSD at random (60) where I saw a slow 
> >>> >>>>>>> > request.  It
> >>> >>>>>>> > was stuck peering for 10s of seconds... waiting on a pg log 
> >>> >>>>>>> > message from
> >>> >>>>>>> > osd.163.
> >>> >>>>>>> >
> >>> >>>>>>> > sage
> >>> >>>>>>> >
> >>> >>>>>>> >
> >>> >>>>>>> >>
> >>> >>>>>>> >> ----------------
> >>> >>>>>>> >> Robert LeBlanc
> >>> >>>>>>> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 
> >>> >>>>>>> >> B9F1
> >>> >>>>>>> >>
> >>> >>>>>>> >>
> >>> >>>>>>> >> On Tue, Oct 6, 2015 at 6:37 AM, Sage Weil  wrote:
> >>> >>>>>>> >> > On Mon, 5 Oct 2015, Robert LeBlanc wrote:
> >>> >>>>>>> >> >> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> >> Hash: SHA256
> >>> >>>>>>> >> >>
> >>> >>>>>>> >> >> With some off-list help, we have adjusted
> >>> >>>>>>> >> >> osd_client_message_cap=10000. This seems to have helped a 
> >>> >>>>>>> >> >> bit and we
> >>> >>>>>>> >> >> have seen some OSDs have a value up to 4,000 for client 
> >>> >>>>>>> >> >> messages. But
> >>> >>>>>>> >> >> it does not solve the problem with the blocked I/O.
> >>> >>>>>>> >> >>
> >>> >>>>>>> >> >> One thing that I have noticed is that almost exactly 30 
> >>> >>>>>>> >> >> seconds elapse
> >>> >>>>>>> >> >> between when an OSD boots and the first blocked I/O message. I 
> >>> >>>>>>> >> >> don't know
> >>> >>>>>>> >> >> if the OSD doesn't have time to get its brain right about 
> >>> >>>>>>> >> >> a PG before
> >>> >>>>>>> >> >> it starts servicing it or what exactly.
> >>> >>>>>>> >> >
> >>> >>>>>>> >> > I'm downloading the logs from yesterday now; sorry it's 
> >>> >>>>>>> >> > taking so long.
> >>> >>>>>>> >> >
> >>> >>>>>>> >> >> On another note, I tried upgrading our CentOS dev cluster 
> >>> >>>>>>> >> >> from Hammer
> >>> >>>>>>> >> >> to master and things didn't go so well. The OSDs would not 
> >>> >>>>>>> >> >> start
> >>> >>>>>>> >> >> because /var/lib/ceph was not owned by ceph. I chowned the 
> >>> >>>>>>> >> >> directory
> >>> >>>>>>> >> >> and all OSDs and the OSD then started, but never became 
> >>> >>>>>>> >> >> active in the
> >>> >>>>>>> >> >> cluster. It just sat there after reading all the PGs. There 
> >>> >>>>>>> >> >> were
> >>> >>>>>>> >> >> sockets open to the monitor, but no OSD to OSD sockets. I 
> >>> >>>>>>> >> >> tried
> >>> >>>>>>> >> >> downgrading to the Infernalis branch and still no luck 
> >>> >>>>>>> >> >> getting the
> >>> >>>>>>> >> >> OSDs to come up. The OSD processes were idle after the 
> >>> >>>>>>> >> >> initial boot.
> >>> >>>>>>> >> >> All packages were installed from gitbuilder.
> >>> >>>>>>> >> >
> >>> >>>>>>> >> > Did you chown -R ?
> >>> >>>>>>> >> >
> >>> >>>>>>> >> >         
> >>> >>>>>>> >> > https://github.com/ceph/ceph/blob/infernalis/doc/release-notes.rst#upgrading-from-hammer
> >>> >>>>>>> >> >
> >>> >>>>>>> >> > My guess is you only chowned the root dir, and the OSD 
> >>> >>>>>>> >> > didn't throw
> >>> >>>>>>> >> > an error when it encountered the other files?  If you can 
> >>> >>>>>>> >> > generate a debug
> >>> >>>>>>> >> > osd = 20 log, that would be helpful.. thanks!
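
In other words, with the Ceph daemons on the host stopped, the recursive
form per the release notes linked above (path assumes the default layout):

    chown -R ceph:ceph /var/lib/ceph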
> >>> >>>>>>> >> >
> >>> >>>>>>> >> > sage
> >>> >>>>>>> >> >
> >>> >>>>>>> >> >
> >>> >>>>>>> >> >>
> >>> >>>>>>> >> >> Thanks,
> >>> >>>>>>> >> >> ----------------
> >>> >>>>>>> >> >> Robert LeBlanc
> >>> >>>>>>> >> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 
> >>> >>>>>>> >> >> FA62 B9F1
> >>> >>>>>>> >> >>
> >>> >>>>>>> >> >>
> >>> >>>>>>> >> >> On Sun, Oct 4, 2015 at 3:04 PM, Robert LeBlanc  wrote:
> >>> >>>>>>> >> >> > -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> >> > Hash: SHA256
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > I have eight nodes running the fio job rbd_test_real to 
> >>> >>>>>>> >> >> > different RBD
> >>> >>>>>>> >> >> > volumes. I've included the CRUSH map in the tarball.
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > I stopped one OSD process and marked it out. I let it 
> >>> >>>>>>> >> >> > recover for a
> >>> >>>>>>> >> >> > few minutes and then I started the process again and 
> >>> >>>>>>> >> >> > marked it in. I
> >>> >>>>>>> >> >> > started getting block I/O messages during the recovery.
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > The logs are located at 
> >>> >>>>>>> >> >> > http://162.144.87.113/files/ushou1.tar.xz
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > Thanks,
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > ----------------
> >>> >>>>>>> >> >> > Robert LeBlanc
> >>> >>>>>>> >> >> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 
> >>> >>>>>>> >> >> > FA62 B9F1
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> >
> >>> >>>>>>> >> >> > On Sun, Oct 4, 2015 at 7:48 AM, Sage Weil  wrote:
> >>> >>>>>>> >> >> >> On Sat, 3 Oct 2015, Robert LeBlanc wrote:
> >>> >>>>>>> >> >> >>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> >> >>> Hash: SHA256
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> We are still struggling with this and have tried a lot 
> >>> >>>>>>> >> >> >>> of different
> >>> >>>>>>> >> >> >>> things. Unfortunately, Inktank (now Red Hat) no longer 
> >>> >>>>>>> >> >> >>> provides
> >>> >>>>>>> >> >> >>> consulting services for non-Red Hat systems. If there 
> >>> >>>>>>> >> >> >>> are some
> >>> >>>>>>> >> >> >>> certified Ceph consultants in the US with whom we can do 
> >>> >>>>>>> >> >> >>> both remote and
> >>> >>>>>>> >> >> >>> on-site engagements, please let us know.
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> This certainly seems to be network related, but 
> >>> >>>>>>> >> >> >>> somewhere in the
> >>> >>>>>>> >> >> >>> kernel. We have tried increasing the network and TCP 
> >>> >>>>>>> >> >> >>> buffers, number
> >>> >>>>>>> >> >> >>> of TCP sockets, reduced the FIN_WAIT2 state. There is 
> >>> >>>>>>> >> >> >>> about 25% idle
> >>> >>>>>>> >> >> >>> on the boxes, the disks are busy, but not constantly at 
> >>> >>>>>>> >> >> >>> 100% (they
> >>> >>>>>>> >> >> >>> cycle from <10% up to 100%, but not 100% for more than 
> >>> >>>>>>> >> >> >>> a few seconds
> >>> >>>>>>> >> >> >>> at a time). There seems to be no reasonable explanation 
> >>> >>>>>>> >> >> >>> why I/O is
> >>> >>>>>>> >> >> >>> blocked pretty frequently longer than 30 seconds. We 
> >>> >>>>>>> >> >> >>> have verified
> >>> >>>>>>> >> >> >>> Jumbo frames by pinging from/to each node with 9000 
> >>> >>>>>>> >> >> >>> byte packets. The
> >>> >>>>>>> >> >> >>> network admins have verified that packets are not being 
> >>> >>>>>>> >> >> >>> dropped in the
> >>> >>>>>>> >> >> >>> switches for these nodes. We have tried different 
> >>> >>>>>>> >> >> >>> kernels including
> >>> >>>>>>> >> >> >>> the recent Google patch to cubic. This is showing up on 
> >>> >>>>>>> >> >> >>> three cluster
> >>> >>>>>>> >> >> >>> (two Ethernet and one IPoIB). I booted one cluster into 
> >>> >>>>>>> >> >> >>> Debian Jessie
> >>> >>>>>>> >> >> >>> (from CentOS 7.1) with similar results.
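
The kind of knobs being referred to look roughly like the following; the
exact values used on these clusters were not given, so the numbers here are
illustrative only and <peer> is a placeholder:

    # larger socket buffers
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
    sysctl -w net.ipv4.tcp_wmem='4096 65536 16777216'
    # shorten FIN_WAIT2
    sysctl -w net.ipv4.tcp_fin_timeout=10
    # verify 9000-byte jumbo frames end to end (8972 payload + 28 bytes headers)
    ping -M do -s 8972 <peer>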
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> The messages seem slightly different:
> >>> >>>>>>> >> >> >>> 2015-10-03 14:38:23.193082 osd.134 
> >>> >>>>>>> >> >> >>> 10.208.16.25:6800/1425 439 :
> >>> >>>>>>> >> >> >>> cluster [WRN] 14 slow requests, 1 included below; 
> >>> >>>>>>> >> >> >>> oldest blocked for >
> >>> >>>>>>> >> >> >>> 100.087155 secs
> >>> >>>>>>> >> >> >>> 2015-10-03 14:38:23.193090 osd.134 
> >>> >>>>>>> >> >> >>> 10.208.16.25:6800/1425 440 :
> >>> >>>>>>> >> >> >>> cluster [WRN] slow request 30.041999 seconds old, 
> >>> >>>>>>> >> >> >>> received at
> >>> >>>>>>> >> >> >>> 2015-10-03 14:37:53.151014: 
> >>> >>>>>>> >> >> >>> osd_op(client.1328605.0:7082862
> >>> >>>>>>> >> >> >>> rbd_data.13fdcb2ae8944a.000000000001264f [read 
> >>> >>>>>>> >> >> >>> 975360~4096]
> >>> >>>>>>> >> >> >>> 11.6d19c36f ack+read+known_if_redirected e10249) 
> >>> >>>>>>> >> >> >>> currently no flag
> >>> >>>>>>> >> >> >>> points reached
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> I don't know what "no flag points reached" means.
> >>> >>>>>>> >> >> >>
> >>> >>>>>>> >> >> >> Just that the op hasn't been marked as reaching any 
> >>> >>>>>>> >> >> >> interesting points
> >>> >>>>>>> >> >> >> (op->mark_*() calls).
> >>> >>>>>>> >> >> >>
> >>> >>>>>>> >> >> >> Is it possible to gather a log with debug ms = 20 and 
> >>> >>>>>>> >> >> >> debug osd = 20?
> >>> >>>>>>> >> >> >> It's extremely verbose but it'll let us see where the op 
> >>> >>>>>>> >> >> >> is getting
> >>> >>>>>>> >> >> >> blocked.  If you see the "slow request" message it means 
> >>> >>>>>>> >> >> >> the op is
> >>> >>>>>>> >> >> >> received by ceph (that's when the clock starts), so I 
> >>> >>>>>>> >> >> >> suspect it's not
> >>> >>>>>>> >> >> >> something we can blame on the network stack.
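
One way to turn that logging on, either at runtime or in ceph.conf before
restarting the OSD (both forms are sketches of the standard mechanism):

    # at runtime, per OSD:
    ceph tell osd.<id> injectargs '--debug-ms 20 --debug-osd 20'

    # or persistently:
    [osd]
        debug ms = 20
        debug osd = 20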
> >>> >>>>>>> >> >> >>
> >>> >>>>>>> >> >> >> sage
> >>> >>>>>>> >> >> >>
> >>> >>>>>>> >> >> >>
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> The problem is most pronounced when we have to reboot 
> >>> >>>>>>> >> >> >>> an OSD node (1
> >>> >>>>>>> >> >> >>> of 13), we will have hundreds of I/Os blocked, sometimes 
> >>> >>>>>>> >> >> >>> for up to 300
> >>> >>>>>>> >> >> >>> seconds. It takes a good 15 minutes for things to 
> >>> >>>>>>> >> >> >>> settle down. The
> >>> >>>>>>> >> >> >>> production cluster is very busy doing normally 8,000 
> >>> >>>>>>> >> >> >>> I/O and peaking
> >>> >>>>>>> >> >> >>> at 15,000. This is all 4TB spindles with SSD journals 
> >>> >>>>>>> >> >> >>> and the disks
> >>> >>>>>>> >> >> >>> are between 25-50% full. We are currently splitting PGs 
> >>> >>>>>>> >> >> >>> to distribute
> >>> >>>>>>> >> >> >>> the load better across the disks, but we are having to 
> >>> >>>>>>> >> >> >>> do this 10 PGs
> >>> >>>>>>> >> >> >>> at a time as we get blocked I/O. We have max_backfills 
> >>> >>>>>>> >> >> >>> and
> >>> >>>>>>> >> >> >>> max_recovery set to 1, client op priority is set higher 
> >>> >>>>>>> >> >> >>> than recovery
> >>> >>>>>>> >> >> >>> priority. We tried increasing the number of op threads 
> >>> >>>>>>> >> >> >>> but this didn't
> >>> >>>>>>> >> >> >>> seem to help. It seems as soon as PGs are finished 
> >>> >>>>>>> >> >> >>> being checked, they
> >>> >>>>>>> >> >> >>> become active and could be the cause for slow I/O while 
> >>> >>>>>>> >> >> >>> the other PGs
> >>> >>>>>>> >> >> >>> are being checked.
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> What I don't understand is that the messages are 
> >>> >>>>>>> >> >> >>> delayed. As soon as
> >>> >>>>>>> >> >> >>> the message is received by Ceph OSD process, it is very 
> >>> >>>>>>> >> >> >>> quickly
> >>> >>>>>>> >> >> >>> committed to the journal and a response is sent back to 
> >>> >>>>>>> >> >> >>> the primary
> >>> >>>>>>> >> >> >>> OSD, which is received very quickly as well. I've adjusted
> >>> >>>>>>> >> >> >>> min_free_kbytes and it seems to keep the OSDs from 
> >>> >>>>>>> >> >> >>> crashing, but
> >>> >>>>>>> >> >> >>> doesn't solve the main problem. We don't have swap and 
> >>> >>>>>>> >> >> >>> there is 64 GB
> >>> >>>>>>> >> >> >>> of RAM per nodes for 10 OSDs.
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> Is there something that could cause the kernel to get a 
> >>> >>>>>>> >> >> >>> packet but not
> >>> >>>>>>> >> >> >>> be able to dispatch it to Ceph, such that it could 
> >>> >>>>>>> >> >> >>> explain why we
> >>> >>>>>>> >> >> >>> are seeing these blocked I/Os for 30+ seconds? Are there 
> >>> >>>>>>> >> >> >>> any pointers
> >>> >>>>>>> >> >> >>> to tracing Ceph messages from the network buffer 
> >>> >>>>>>> >> >> >>> through the kernel to
> >>> >>>>>>> >> >> >>> the Ceph process?
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> We can really use some pointers no matter how 
> >>> >>>>>>> >> >> >>> outrageous. We've have
> >>> >>>>>>> >> >> >>> over 6 people looking into this for weeks now and just 
> >>> >>>>>>> >> >> >>> can't think of
> >>> >>>>>>> >> >> >>> anything else.
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> Thanks,
> >>> >>>>>>> >> >> >>> ----------------
> >>> >>>>>>> >> >> >>> Robert LeBlanc
> >>> >>>>>>> >> >> >>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 
> >>> >>>>>>> >> >> >>> 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>>
> >>> >>>>>>> >> >> >>> On Fri, Sep 25, 2015 at 2:40 PM, Robert LeBlanc  wrote:
> >>> >>>>>>> >> >> >>> > We dropped the replication on our cluster from 4 to 3 
> >>> >>>>>>> >> >> >>> > and it looks
> >>> >>>>>>> >> >> >>> > like all the blocked I/O has stopped (no entries in 
> >>> >>>>>>> >> >> >>> > the log for the
> >>> >>>>>>> >> >> >>> > last 12 hours). This makes me believe that there is 
> >>> >>>>>>> >> >> >>> > some issue with
> >>> >>>>>>> >> >> >>> > the number of sockets or some other TCP issue. We 
> >>> >>>>>>> >> >> >>> > have not messed with
> >>> >>>>>>> >> >> >>> > Ephemeral ports and TIME_WAIT at this point. There 
> >>> >>>>>>> >> >> >>> > are 130 OSDs, 8 KVM
> >>> >>>>>>> >> >> >>> > hosts hosting about 150 VMs. Open files is set at 32K 
> >>> >>>>>>> >> >> >>> > for the OSD
> >>> >>>>>>> >> >> >>> > processes and 16K system wide.
> >>> >>>>>>> >> >> >>> >
> >>> >>>>>>> >> >> >>> > Does this seem like the right spot to be looking? 
> >>> >>>>>>> >> >> >>> > What are some
> >>> >>>>>>> >> >> >>> > configuration items we should be looking at?
> >>> >>>>>>> >> >> >>> >
> >>> >>>>>>> >> >> >>> > Thanks,
> >>> >>>>>>> >> >> >>> > ----------------
> >>> >>>>>>> >> >> >>> > Robert LeBlanc
> >>> >>>>>>> >> >> >>> > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 
> >>> >>>>>>> >> >> >>> > 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>> >
> >>> >>>>>>> >> >> >>> >
> >>> >>>>>>> >> >> >>> > On Wed, Sep 23, 2015 at 1:30 PM, Robert LeBlanc  
> >>> >>>>>>> >> >> >>> > wrote:
> >>> >>>>>>> >> >> >>> >> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> >> >>> >> Hash: SHA256
> >>> >>>>>>> >> >> >>> >>
> >>> >>>>>>> >> >> >>> >> We were able to only get ~17Gb out of the XL710 
> >>> >>>>>>> >> >> >>> >> (heavily tweaked)
> >>> >>>>>>> >> >> >>> >> until we went to the 4.x kernel where we got ~36Gb 
> >>> >>>>>>> >> >> >>> >> (no tweaking). It
> >>> >>>>>>> >> >> >>> >> seems that there were some major reworks in the 
> >>> >>>>>>> >> >> >>> >> network handling in
> >>> >>>>>>> >> >> >>> >> the kernel to efficiently handle that network rate. 
> >>> >>>>>>> >> >> >>> >> If I remember
> >>> >>>>>>> >> >> >>> >> right we also saw a drop in CPU utilization. I'm 
> >>> >>>>>>> >> >> >>> >> starting to think
> >>> >>>>>>> >> >> >>> >> that we did see packet loss while congesting our 
> >>> >>>>>>> >> >> >>> >> ISLs in our initial
> >>> >>>>>>> >> >> >>> >> testing, but we could not tell where the dropping 
> >>> >>>>>>> >> >> >>> >> was happening. We
> >>> >>>>>>> >> >> >>> >> saw some on the switches, but it didn't seem to be 
> >>> >>>>>>> >> >> >>> >> bad if we weren't
> >>> >>>>>>> >> >> >>> >> trying to congest things. We probably already saw 
> >>> >>>>>>> >> >> >>> >> this issue, just
> >>> >>>>>>> >> >> >>> >> didn't know it.
> >>> >>>>>>> >> >> >>> >> - ----------------
> >>> >>>>>>> >> >> >>> >> Robert LeBlanc
> >>> >>>>>>> >> >> >>> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 
> >>> >>>>>>> >> >> >>> >> 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>> >>
> >>> >>>>>>> >> >> >>> >>
> >>> >>>>>>> >> >> >>> >> On Wed, Sep 23, 2015 at 1:10 PM, Mark Nelson  wrote:
> >>> >>>>>>> >> >> >>> >>> FWIW, we've got some 40GbE Intel cards in the 
> >>> >>>>>>> >> >> >>> >>> community performance cluster
> >>> >>>>>>> >> >> >>> >>> on a Mellanox 40GbE switch that appear (knock on 
> >>> >>>>>>> >> >> >>> >>> wood) to be running fine
> >>> >>>>>>> >> >> >>> >>> with 3.10.0-229.7.2.el7.x86_64.  We did get 
> >>> >>>>>>> >> >> >>> >>> feedback from Intel that older
> >>> >>>>>>> >> >> >>> >>> drivers might cause problems though.
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>> Here's ifconfig from one of the nodes:
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>> ens513f1: flags=4163  mtu 1500
> >>> >>>>>>> >> >> >>> >>>         inet 10.0.10.101  netmask 255.255.255.0  
> >>> >>>>>>> >> >> >>> >>> broadcast 10.0.10.255
> >>> >>>>>>> >> >> >>> >>>         inet6 fe80::6a05:caff:fe2b:7ea1  prefixlen 
> >>> >>>>>>> >> >> >>> >>> 64  scopeid 0x20
> >>> >>>>>>> >> >> >>> >>>         ether 68:05:ca:2b:7e:a1  txqueuelen 1000  
> >>> >>>>>>> >> >> >>> >>> (Ethernet)
> >>> >>>>>>> >> >> >>> >>>         RX packets 169232242875  bytes 
> >>> >>>>>>> >> >> >>> >>> 229346261232279 (208.5 TiB)
> >>> >>>>>>> >> >> >>> >>>         RX errors 0  dropped 0  overruns 0  frame 0
> >>> >>>>>>> >> >> >>> >>>         TX packets 153491686361  bytes 
> >>> >>>>>>> >> >> >>> >>> 203976410836881 (185.5 TiB)
> >>> >>>>>>> >> >> >>> >>>         TX errors 0  dropped 0 overruns 0  carrier 
> >>> >>>>>>> >> >> >>> >>> 0  collisions 0
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>> Mark
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>> On 09/23/2015 01:48 PM, Robert LeBlanc wrote:
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> -----BEGIN PGP SIGNED MESSAGE-----
> >>> >>>>>>> >> >> >>> >>>> Hash: SHA256
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> OK, here is the update on the saga...
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> I traced some more of the blocked I/Os and it seems 
> >>> >>>>>>> >> >> >>> >>>> that communication
> >>> >>>>>>> >> >> >>> >>>> between two hosts seemed worse than others. I did 
> >>> >>>>>>> >> >> >>> >>>> a two way ping flood
> >>> >>>>>>> >> >> >>> >>>> between the two hosts using max packet sizes 
> >>> >>>>>>> >> >> >>> >>>> (1500). After 1.5M
> >>> >>>>>>> >> >> >>> >>>> packets, no lost pings. I then had the ping 
> >>> >>>>>>> >> >> >>> >>>> flood running while I
> >>> >>>>>>> >> >> >>> >>>> put Ceph load on the cluster and the dropped pings 
> >>> >>>>>>> >> >> >>> >>>> started increasing;
> >>> >>>>>>> >> >> >>> >>>> after stopping the Ceph workload, the pings stopped 
> >>> >>>>>>> >> >> >>> >>>> dropping.
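
For anyone wanting to reproduce that test, the flood ping described above
can be done with something like the following (needs root; 1472 bytes of
payload plus 28 bytes of headers gives the full 1500-byte frame; <peer> is
a placeholder):

    ping -f -s 1472 -c 1500000 <peer>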
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> I then ran iperf between all the nodes with the same results, so that
> >>> >>>>>>> >> >> >>> >>>> ruled out Ceph to a large degree. I then booted into the
> >>> >>>>>>> >> >> >>> >>>> 3.10.0-229.14.1.el7.x86_64 kernel, and after an hour of testing so far
> >>> >>>>>>> >> >> >>> >>>> there haven't been any dropped pings or blocked I/O. Our 40 Gb NICs
> >>> >>>>>>> >> >> >>> >>>> really need the network enhancements in the 4.x series to work well.
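
An all-pairs iperf check like the one mentioned can be done along these lines
(node name, stream count, and duration are arbitrary example choices):

    # on the receiving node
    iperf -s

    # on the sending node: 4 parallel streams for 60 seconds
    iperf -c nodex -P 4 -t 60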
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> Does this sound familiar to anyone? I'll probably start bisecting the
> >>> >>>>>>> >> >> >>> >>>> kernel to see where this issue is introduced. Both of the clusters
> >>> >>>>>>> >> >> >>> >>>> with this issue are running 4.x; other than that, they have quite
> >>> >>>>>>> >> >> >>> >>>> different hardware and network configs.
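
If kernel bisection does turn out to be necessary, the rough shape of it is
sketched below; the repository URL and the good/bad versions are assumptions,
and the real endpoints would be whichever kernels pass and fail the
ping-flood-under-load test above:

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
    cd linux-stable
    git bisect start
    git bisect bad  v4.2     # a kernel that shows the dropped pings
    git bisect good v3.10    # a kernel that does not
    # build and boot the suggested commit, rerun the test, then mark it:
    git bisect good          # or: git bisect bad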
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> Thanks,
> >>> >>>>>>> >> >> >>> >>>> ----------------
> >>> >>>>>>> >> >> >>> >>>> Robert LeBlanc
> >>> >>>>>>> >> >> >>> >>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E 
> >>> >>>>>>> >> >> >>> >>>> E654 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>> On Tue, Sep 22, 2015 at 4:15 PM, Robert LeBlanc
> >>> >>>>>>> >> >> >>> >>>> wrote:
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>> This is IPoIB and we have the MTU set to 64K. There were some issues
> >>> >>>>>>> >> >> >>> >>>>> pinging hosts with "No buffer space available" (hosts are currently
> >>> >>>>>>> >> >> >>> >>>>> configured for 4GB to test SSD caching rather than page cache). I
> >>> >>>>>>> >> >> >>> >>>>> found that an MTU under 32K worked reliably for ping, but we still
> >>> >>>>>>> >> >> >>> >>>>> had the blocked I/O.
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>> I reduced the MTU to 1500 and checked pings (OK), but I'm still seeing
> >>> >>>>>>> >> >> >>> >>>>> the blocked I/O.
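
For anyone following along, checking and temporarily lowering the IPoIB MTU
is typically just the following (the interface name is a placeholder; a 64K
MTU implies connected-mode IPoIB, which the mode file should confirm on a
standard setup):

    ip link show ib0                # current MTU
    ip link set ib0 mtu 1500        # temporary change for the test
    cat /sys/class/net/ib0/mode     # "connected" vs "datagram"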
> >>> >>>>>>> >> >> >>> >>>>> - ----------------
> >>> >>>>>>> >> >> >>> >>>>> Robert LeBlanc
> >>> >>>>>>> >> >> >>> >>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E 
> >>> >>>>>>> >> >> >>> >>>>> E654 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>> On Tue, Sep 22, 2015 at 3:52 PM, Sage Weil  wrote:
> >>> >>>>>>> >> >> >>> >>>>>>
> >>> >>>>>>> >> >> >>> >>>>>> On Tue, 22 Sep 2015, Samuel Just wrote:
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>> I looked at the logs, it looks like there was a 
> >>> >>>>>>> >> >> >>> >>>>>>> 53 second delay
> >>> >>>>>>> >> >> >>> >>>>>>> between when osd.17 started sending the 
> >>> >>>>>>> >> >> >>> >>>>>>> osd_repop message and when
> >>> >>>>>>> >> >> >>> >>>>>>> osd.13 started reading it, which is pretty 
> >>> >>>>>>> >> >> >>> >>>>>>> weird.  Sage, didn't we
> >>> >>>>>>> >> >> >>> >>>>>>> once see a kernel issue which caused some 
> >>> >>>>>>> >> >> >>> >>>>>>> messages to be mysteriously
> >>> >>>>>>> >> >> >>> >>>>>>> delayed for many 10s of seconds?
> >>> >>>>>>> >> >> >>> >>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>
> >>> >>>>>>> >> >> >>> >>>>>> Every time we have seen this behavior and 
> >>> >>>>>>> >> >> >>> >>>>>> diagnosed it in the wild it
> >>> >>>>>>> >> >> >>> >>>>>> has
> >>> >>>>>>> >> >> >>> >>>>>> been a network misconfiguration.  Usually 
> >>> >>>>>>> >> >> >>> >>>>>> related to jumbo frames.
> >>> >>>>>>> >> >> >>> >>>>>>
> >>> >>>>>>> >> >> >>> >>>>>> sage
> >>> >>>>>>> >> >> >>> >>>>>>
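
A quick way to test for that kind of jumbo-frame misconfiguration is to send
non-fragmentable pings at the sizes in question; a sketch (the address is one
of the OSD-node addresses from the table further down, used only as an
example):

    # 8972 = 9000-byte MTU minus 20 (IP header) and 8 (ICMP header); this fails
    # immediately if any hop in the path is not actually passing jumbo frames
    ping -M do -s 8972 192.168.55.12

    # same check at the standard MTU for comparison
    ping -M do -s 1472 192.168.55.12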
> >>> >>>>>>> >> >> >>> >>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>> What kernel are you running?
> >>> >>>>>>> >> >> >>> >>>>>>> -Sam
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>> On Tue, Sep 22, 2015 at 2:22 PM, Robert LeBlanc 
> >>> >>>>>>> >> >> >>> >>>>>>>  wrote:
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> OK, looping in ceph-devel to see if I can get 
> >>> >>>>>>> >> >> >>> >>>>>>>> some more eyes. I've
> >>> >>>>>>> >> >> >>> >>>>>>>> extracted what I think are important entries 
> >>> >>>>>>> >> >> >>> >>>>>>>> from the logs for the
> >>> >>>>>>> >> >> >>> >>>>>>>> first blocked request. NTP is running on all the
> >>> >>>>>>> >> >> >>> >>>>>>>> servers, so the logs
> >>> >>>>>>> >> >> >>> >>>>>>>> should be close in terms of time. Logs for 
> >>> >>>>>>> >> >> >>> >>>>>>>> 12:50 to 13:00 are
> >>> >>>>>>> >> >> >>> >>>>>>>> available at 
> >>> >>>>>>> >> >> >>> >>>>>>>> http://162.144.87.113/files/ceph_block_io.logs.tar.xz
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.500374 - osd.17 gets I/O from client
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.557160 - osd.17 submits I/O to osd.13
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.557305 - osd.17 submits I/O to osd.16
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.573711 - osd.16 gets I/O from osd.17
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.595716 - osd.17 gets ondisk result=0 from osd.16
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.640631 - osd.16 reports to osd.17 ondisk result=0
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.926691 - osd.17 reports slow I/O > 30.439150 sec
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:59.790591 - osd.13 gets I/O from osd.17
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:59.812405 - osd.17 gets ondisk result=0 from osd.13
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:56:02.941602 - osd.13 reports to osd.17 ondisk result=0
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> In the logs I can see that osd.17 dispatches 
> >>> >>>>>>> >> >> >>> >>>>>>>> the I/O to osd.13 and
> >>> >>>>>>> >> >> >>> >>>>>>>> osd.16 almost simultaneously. osd.16 seems to
> >>> >>>>>>> >> >> >>> >>>>>>>> get the I/O right away,
> >>> >>>>>>> >> >> >>> >>>>>>>> but for some reason osd.13 doesn't get the 
> >>> >>>>>>> >> >> >>> >>>>>>>> message until 53 seconds
> >>> >>>>>>> >> >> >>> >>>>>>>> later. osd.17 seems happy to just wait and 
> >>> >>>>>>> >> >> >>> >>>>>>>> doesn't resend the data
> >>> >>>>>>> >> >> >>> >>>>>>>> (well, I'm not 100% sure how to tell which 
> >>> >>>>>>> >> >> >>> >>>>>>>> entries are the actual data
> >>> >>>>>>> >> >> >>> >>>>>>>> transfer).
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> It looks like osd.17 is receiving responses to 
> >>> >>>>>>> >> >> >>> >>>>>>>> start the communication
> >>> >>>>>>> >> >> >>> >>>>>>>> with osd.13, but the op is not acknowledged 
> >>> >>>>>>> >> >> >>> >>>>>>>> until almost a minute
> >>> >>>>>>> >> >> >>> >>>>>>>> later. To me it seems that the message is 
> >>> >>>>>>> >> >> >>> >>>>>>>> getting received but not
> >>> >>>>>>> >> >> >>> >>>>>>>> passed to another thread right away or 
> >>> >>>>>>> >> >> >>> >>>>>>>> something. This test was done
> >>> >>>>>>> >> >> >>> >>>>>>>> with an idle cluster, a single fio client (rbd 
> >>> >>>>>>> >> >> >>> >>>>>>>> engine) with a single
> >>> >>>>>>> >> >> >>> >>>>>>>> thread.
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> The OSD servers are almost 100% idle during 
> >>> >>>>>>> >> >> >>> >>>>>>>> these blocked I/O
> >>> >>>>>>> >> >> >>> >>>>>>>> requests. I think I'm at the end of my 
> >>> >>>>>>> >> >> >>> >>>>>>>> troubleshooting, so I can use
> >>> >>>>>>> >> >> >>> >>>>>>>> some help.
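
One more thing that can help when ops are stuck like this is the OSD admin
socket, which lists blocked ops together with their per-event timestamps;
roughly as follows, assuming the default socket path and the osd ids from
this thread:

    ceph daemon osd.17 dump_ops_in_flight     # ops currently blocked, with event history
    ceph daemon osd.17 dump_historic_ops      # recently completed slow ops
    # equivalent form using the socket path directly:
    ceph --admin-daemon /var/run/ceph/ceph-osd.17.asok dump_ops_in_flight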
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> Single Test started about
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:52:36
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.926680 osd.17 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.14:6800/16726 56 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] 1 slow requests, 1 included 
> >>> >>>>>>> >> >> >>> >>>>>>>> below; oldest blocked for >
> >>> >>>>>>> >> >> >>> >>>>>>>> 30.439150 secs
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.926699 osd.17 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.14:6800/16726 57 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] slow request 30.439150 seconds 
> >>> >>>>>>> >> >> >>> >>>>>>>> old, received at
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:06.487451:
> >>> >>>>>>> >> >> >>> >>>>>>>>   osd_op(client.250874.0:1388 
> >>> >>>>>>> >> >> >>> >>>>>>>> rbd_data.3380e2ae8944a.0000000000000545
> >>> >>>>>>> >> >> >>> >>>>>>>> [set-alloc-hint object_size 4194304 write_size 
> >>> >>>>>>> >> >> >>> >>>>>>>> 4194304,write
> >>> >>>>>>> >> >> >>> >>>>>>>> 0~4194304] 8.bbf3e8ff 
> >>> >>>>>>> >> >> >>> >>>>>>>> ack+ondisk+write+known_if_redirected e56785)
> >>> >>>>>>> >> >> >>> >>>>>>>>   currently waiting for subops from 13,16
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.697904 osd.16 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.13:6800/29410 7 : cluster
> >>> >>>>>>> >> >> >>> >>>>>>>> [WRN] 2 slow requests, 2 included below; 
> >>> >>>>>>> >> >> >>> >>>>>>>> oldest blocked for >
> >>> >>>>>>> >> >> >>> >>>>>>>> 30.379680 secs
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.697918 osd.16 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.13:6800/29410 8 : cluster
> >>> >>>>>>> >> >> >>> >>>>>>>> [WRN] slow request 30.291520 seconds old, 
> >>> >>>>>>> >> >> >>> >>>>>>>> received at 2015-09-22
> >>> >>>>>>> >> >> >>> >>>>>>>> 12:55:06.406303:
> >>> >>>>>>> >> >> >>> >>>>>>>>   osd_op(client.250874.0:1384 
> >>> >>>>>>> >> >> >>> >>>>>>>> rbd_data.3380e2ae8944a.0000000000000541
> >>> >>>>>>> >> >> >>> >>>>>>>> [set-alloc-hint object_size 4194304 write_size 
> >>> >>>>>>> >> >> >>> >>>>>>>> 4194304,write
> >>> >>>>>>> >> >> >>> >>>>>>>> 0~4194304] 8.5fb2123f 
> >>> >>>>>>> >> >> >>> >>>>>>>> ack+ondisk+write+known_if_redirected e56785)
> >>> >>>>>>> >> >> >>> >>>>>>>>   currently waiting for subops from 13,17
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:55:36.697927 osd.16 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.13:6800/29410 9 : cluster
> >>> >>>>>>> >> >> >>> >>>>>>>> [WRN] slow request 30.379680 seconds old, 
> >>> >>>>>>> >> >> >>> >>>>>>>> received at 2015-09-22
> >>> >>>>>>> >> >> >>> >>>>>>>> 12:55:06.318144:
> >>> >>>>>>> >> >> >>> >>>>>>>>   osd_op(client.250874.0:1382 
> >>> >>>>>>> >> >> >>> >>>>>>>> rbd_data.3380e2ae8944a.000000000000053f
> >>> >>>>>>> >> >> >>> >>>>>>>> [set-alloc-hint object_size 4194304 write_size 
> >>> >>>>>>> >> >> >>> >>>>>>>> 4194304,write
> >>> >>>>>>> >> >> >>> >>>>>>>> 0~4194304] 8.312e69ca 
> >>> >>>>>>> >> >> >>> >>>>>>>> ack+ondisk+write+known_if_redirected e56785)
> >>> >>>>>>> >> >> >>> >>>>>>>>   currently waiting for subops from 13,14
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:58:03.998275 osd.13 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.12:6804/4574 130 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] 1 slow requests, 1 included 
> >>> >>>>>>> >> >> >>> >>>>>>>> below; oldest blocked for >
> >>> >>>>>>> >> >> >>> >>>>>>>> 30.954212 secs
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:58:03.998286 osd.13 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.12:6804/4574 131 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] slow request 30.954212 seconds 
> >>> >>>>>>> >> >> >>> >>>>>>>> old, received at
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:57:33.044003:
> >>> >>>>>>> >> >> >>> >>>>>>>>   osd_op(client.250874.0:1873 
> >>> >>>>>>> >> >> >>> >>>>>>>> rbd_data.3380e2ae8944a.000000000000070d
> >>> >>>>>>> >> >> >>> >>>>>>>> [set-alloc-hint object_size 4194304 write_size 
> >>> >>>>>>> >> >> >>> >>>>>>>> 4194304,write
> >>> >>>>>>> >> >> >>> >>>>>>>> 0~4194304] 8.e69870d4 
> >>> >>>>>>> >> >> >>> >>>>>>>> ack+ondisk+write+known_if_redirected e56785)
> >>> >>>>>>> >> >> >>> >>>>>>>>   currently waiting for subops from 16,17
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:58:03.759826 osd.16 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.13:6800/29410 10 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] 1 slow requests, 1 included 
> >>> >>>>>>> >> >> >>> >>>>>>>> below; oldest blocked for >
> >>> >>>>>>> >> >> >>> >>>>>>>> 30.704367 secs
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:58:03.759840 osd.16 
> >>> >>>>>>> >> >> >>> >>>>>>>> 192.168.55.13:6800/29410 11 :
> >>> >>>>>>> >> >> >>> >>>>>>>> cluster [WRN] slow request 30.704367 seconds 
> >>> >>>>>>> >> >> >>> >>>>>>>> old, received at
> >>> >>>>>>> >> >> >>> >>>>>>>> 2015-09-22 12:57:33.055404:
> >>> >>>>>>> >> >> >>> >>>>>>>>   osd_op(client.250874.0:1874 
> >>> >>>>>>> >> >> >>> >>>>>>>> rbd_data.3380e2ae8944a.000000000000070e
> >>> >>>>>>> >> >> >>> >>>>>>>> [set-alloc-hint object_size 4194304 write_size 
> >>> >>>>>>> >> >> >>> >>>>>>>> 4194304,write
> >>> >>>>>>> >> >> >>> >>>>>>>> 0~4194304] 8.f7635819 
> >>> >>>>>>> >> >> >>> >>>>>>>> ack+ondisk+write+known_if_redirected e56785)
> >>> >>>>>>> >> >> >>> >>>>>>>>   currently waiting for subops from 13,17
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> Server   IP addr              OSD
> >>> >>>>>>> >> >> >>> >>>>>>>> nodev  - 192.168.55.11 - 12
> >>> >>>>>>> >> >> >>> >>>>>>>> nodew  - 192.168.55.12 - 13
> >>> >>>>>>> >> >> >>> >>>>>>>> nodex  - 192.168.55.13 - 16
> >>> >>>>>>> >> >> >>> >>>>>>>> nodey  - 192.168.55.14 - 17
> >>> >>>>>>> >> >> >>> >>>>>>>> nodez  - 192.168.55.15 - 14
> >>> >>>>>>> >> >> >>> >>>>>>>> nodezz - 192.168.55.16 - 15
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> fio job:
> >>> >>>>>>> >> >> >>> >>>>>>>> [rbd-test]
> >>> >>>>>>> >> >> >>> >>>>>>>> readwrite=write
> >>> >>>>>>> >> >> >>> >>>>>>>> blocksize=4M
> >>> >>>>>>> >> >> >>> >>>>>>>> #runtime=60
> >>> >>>>>>> >> >> >>> >>>>>>>> name=rbd-test
> >>> >>>>>>> >> >> >>> >>>>>>>> #readwrite=randwrite
> >>> >>>>>>> >> >> >>> >>>>>>>> #bssplit=4k/85:32k/11:512/3:1m/1,4k/89:32k/10:512k/1
> >>> >>>>>>> >> >> >>> >>>>>>>> #rwmixread=72
> >>> >>>>>>> >> >> >>> >>>>>>>> #norandommap
> >>> >>>>>>> >> >> >>> >>>>>>>> #size=1T
> >>> >>>>>>> >> >> >>> >>>>>>>> #blocksize=4k
> >>> >>>>>>> >> >> >>> >>>>>>>> ioengine=rbd
> >>> >>>>>>> >> >> >>> >>>>>>>> rbdname=test2
> >>> >>>>>>> >> >> >>> >>>>>>>> pool=rbd
> >>> >>>>>>> >> >> >>> >>>>>>>> clientname=admin
> >>> >>>>>>> >> >> >>> >>>>>>>> iodepth=8
> >>> >>>>>>> >> >> >>> >>>>>>>> #numjobs=4
> >>> >>>>>>> >> >> >>> >>>>>>>> #thread
> >>> >>>>>>> >> >> >>> >>>>>>>> #group_reporting
> >>> >>>>>>> >> >> >>> >>>>>>>> #time_based
> >>> >>>>>>> >> >> >>> >>>>>>>> #direct=1
> >>> >>>>>>> >> >> >>> >>>>>>>> #ramp_time=60
> >>> >>>>>>> >> >> >>> >>>>>>>>
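
For completeness, a job file like the one above would normally just be run as
shown here (the file name is arbitrary; the rbd engine requires fio built with
librbd support):

    fio rbd-test.fio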
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> Thanks,
> >>> >>>>>>> >> >> >>> >>>>>>>> ----------------
> >>> >>>>>>> >> >> >>> >>>>>>>> Robert LeBlanc
> >>> >>>>>>> >> >> >>> >>>>>>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E 
> >>> >>>>>>> >> >> >>> >>>>>>>> E654 3BB2 FA62 B9F1
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>> On Tue, Sep 22, 2015 at 8:31 AM, Gregory 
> >>> >>>>>>> >> >> >>> >>>>>>>> Farnum  wrote:
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>> On Tue, Sep 22, 2015 at 7:24 AM, Robert 
> >>> >>>>>>> >> >> >>> >>>>>>>>> LeBlanc  wrote:
> >>> >>>>>>> >> >> >>> >>>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>> Is there some way to tell in the logs that 
> >>> >>>>>>> >> >> >>> >>>>>>>>>> this is happening?
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>> You can search for the (mangled) name 
> >>> >>>>>>> >> >> >>> >>>>>>>>> _split_collection
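
That search could look something like this (the default OSD log location is
assumed):

    grep -n "_split_collection" /var/log/ceph/ceph-osd.*.log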
> >>> >>>>>>> >> >> >>> >>>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>> I'm not
> >>> >>>>>>> >> >> >>> >>>>>>>>>> seeing much I/O or CPU usage during these times. Is there some way to
> >>> >>>>>>> >> >> >>> >>>>>>>>>> prevent the splitting? Is there a negative side effect to doing so?
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>> Bump up the split and merge thresholds. You 
> >>> >>>>>>> >> >> >>> >>>>>>>>> can search the list for
> >>> >>>>>>> >> >> >>> >>>>>>>>> this; it was discussed not too long ago.
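
The thresholds in question are the filestore split/merge settings; bumping
them looks roughly like this in ceph.conf (the values shown are only
illustrative, not recommendations):

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 8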
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>> We've had I/O block for over 900 seconds and 
> >>> >>>>>>> >> >> >>> >>>>>>>>>> as soon as the sessions
> >>> >>>>>>> >> >> >>> >>>>>>>>>> are aborted, they are reestablished and 
> >>> >>>>>>> >> >> >>> >>>>>>>>>> complete immediately.
> >>> >>>>>>> >> >> >>> >>>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>> The fio test is just a seq write; starting it over (rewriting from
> >>> >>>>>>> >> >> >>> >>>>>>>>>> the beginning) is still causing the issue. I suspected that it is not
> >>> >>>>>>> >> >> >>> >>>>>>>>>> having to create new files and therefore not splitting collections.
> >>> >>>>>>> >> >> >>> >>>>>>>>>> This is on my test cluster with no other load.
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>> Hmm, that does make it seem less likely if 
> >>> >>>>>>> >> >> >>> >>>>>>>>> you're really not creating
> >>> >>>>>>> >> >> >>> >>>>>>>>> new objects, if you're actually running fio 
> >>> >>>>>>> >> >> >>> >>>>>>>>> in such a way that it's
> >>> >>>>>>> >> >> >>> >>>>>>>>> not allocating new FS blocks (this is 
> >>> >>>>>>> >> >> >>> >>>>>>>>> probably hard to set up?).
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>> I'll be doing a lot of testing today. Which 
> >>> >>>>>>> >> >> >>> >>>>>>>>>> log options and depths
> >>> >>>>>>> >> >> >>> >>>>>>>>>> would be the most helpful for tracking this 
> >>> >>>>>>> >> >> >>> >>>>>>>>>> issue down?
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>>> If you want to go log diving "debug osd = 
> >>> >>>>>>> >> >> >>> >>>>>>>>> 20", "debug filestore =
> >>> >>>>>>> >> >> >>> >>>>>>>>> 20",
> >>> >>>>>>> >> >> >>> >>>>>>>>> "debug ms = 1" are what the OSD guys like to 
> >>> >>>>>>> >> >> >>> >>>>>>>>> see. That should spit
> >>> >>>>>>> >> >> >>> >>>>>>>>> out
> >>> >>>>>>> >> >> >>> >>>>>>>>> everything you need to track exactly what 
> >>> >>>>>>> >> >> >>> >>>>>>>>> each Op is doing.
> >>> >>>>>>> >> >> >>> >>>>>>>>> -Greg
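
Those debug levels can either go into ceph.conf or be injected into the
running daemons; a sketch of the runtime form (osd.17 is just one of the OSDs
from this thread):

    ceph tell osd.* injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'
    # or for a single daemon via its admin socket:
    ceph daemon osd.17 config set debug_osd 20/20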
> >>> >>>>>>> >> >> >>> >>>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>>>
> >>> >>>>>>> >> >> >>> >>>>>
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>>
> >>> >>>>>>> >> >> >>> >>>
> >>> >>>>>>> >> >> >>> >>
> >>> >>>>>>>
> >>> >>>>>
> >>> >>>>
> >>> >>
> >>>
> >
> >
> >
> > --
> > Best Regards,
> >
> > Wheat