On Tue, Feb 12, 2013 at 06:28:15PM +1100, Chris Dunlop wrote:
> Hi,
>
> What are likely causes for "slow requests" and "monclient: hunting for new
> mon" messages? E.g.:
>
> 2013-02-12 16:27:07.318943 7f9c0bc16700 0 monclient: hunting for new mon
> ...
> 2013-02-12 16:27:45.892314 7f9c13c26700 0 log [WRN] : 6 slow requests, 6
> included below; oldest blocked for > 30.383883 secs
> 2013-02-12 16:27:45.892323 7f9c13c26700 0 log [WRN] : slow request 30.383883
> seconds old, received at 2013-02-12 16:27:15.508374:
> osd_op(client.9821.0:122242 rb.0.209f.74b0dc51.000000000120 [write
> 921600~4096] 2.981cf6bc) v4 currently no flag points reached
> 2013-02-12 16:27:45.892328 7f9c13c26700 0 log [WRN] : slow request 30.383782
> seconds old, received at 2013-02-12 16:27:15.508475:
> osd_op(client.9821.0:122243 rb.0.209f.74b0dc51.000000000120 [write
> 987136~4096] 2.981cf6bc) v4 currently no flag points reached
> 2013-02-12 16:27:45.892334 7f9c13c26700 0 log [WRN] : slow request 30.383720
> seconds old, received at 2013-02-12 16:27:15.508537:
> osd_op(client.9821.0:122244 rb.0.209f.74b0dc51.000000000120 [write
> 1036288~8192] 2.981cf6bc) v4 currently no flag points reached
> 2013-02-12 16:27:45.892338 7f9c13c26700 0 log [WRN] : slow request 30.383684
> seconds old, received at 2013-02-12 16:27:15.508573:
> osd_op(client.9821.0:122245 rb.0.209f.74b0dc51.000000000122 [write
> 1454080~4096] 2.fff29a9a) v4 currently no flag points reached
> 2013-02-12 16:27:45.892341 7f9c13c26700 0 log [WRN] : slow request 30.328986
> seconds old, received at 2013-02-12 16:27:15.563271:
> osd_op(client.9821.0:122246 rb.0.209f.74b0dc51.000000000122 [write
> 1482752~4096] 2.fff29a9a) v4 currently no flag points reached
Sorry, I forgot to add what lead me to look into this in the first place...
The b2 machine is running a number of libvirt / kvm virtuals on rbd via
librados:
<disk type='network' device='disk'>
<driver name='qemu' type='raw'/>
<auth username='xxxxxx'>
<secret type='ceph' uuid='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'/>
</auth>
<source protocol='rbd' name='rbd/foo:rbd_cache=1'/>
<target dev='vdb' bus='virtio'/>
<alias name='virtio-disk1'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06'
function='0x0'/>
</disk>
Around the time the "slow requests" messages pop up, the virtual machines
experience
a spike in disk latency, e.g. from inside one of the virtuals:
load %user %nice %sys %iow %stl %idle dev rrqm/s wrqm/s
r/s w/s rkB/s wkB/s arq-sz aqu-sz await rwait wwait %util
2013-02-12-18:55:23 1.7 0.3 0.0 0.9 42.2 0.0 18.5 vdb 0.0 25.8
0.0 9.6 0.00 136.53 28.44 1.34 139.56 0.00 139.6 70.7
2013-02-12-18:55:38 1.5 0.9 0.0 0.4 35.8 0.0 42.8 vdb 0.0 26.6
0.0 10.8 0.00 143.20 26.52 0.46 41.98 0.00 42.0 40.0
2013-02-12-18:55:53 1.4 0.7 0.0 0.4 38.0 0.0 51.7 vdb 0.0 7.5
0.0 6.8 0.00 53.33 15.69 0.46 68.08 0.00 68.1 38.8
2013-02-12-18:56:10 2.2 0.0 0.0 0.1 8.0 0.0 15.8 vdb 0.0 2.1
0.0 0.2 0.00 9.18 78.00 0.98 4164.00 0.00 4164.0 70.6
2013-02-12-18:56:32 3.7 0.0 0.0 0.1 0.0 0.0 0.0 vdb 0.0 0.1
0.0 0.8 0.00 3.27 8.00 24.12 14519.78 0.00 14519.8 100.0
2013-02-12-18:56:47 5.1 0.5 0.0 0.6 0.0 0.0 0.0 vdb 0.0 11.5
0.0 5.4 0.00 65.87 24.40 1.60 3620.15 0.00 3620.1 88.4
2013-02-12-18:57:03 5.2 0.6 0.0 1.1 3.5 0.0 0.0 vdb 0.0 19.6
0.0 6.9 0.00 101.25 29.19 1.13 162.41 0.00 162.4 87.4
2013-02-12-18:57:20 5.2 0.6 0.0 0.9 9.7 0.0 0.0 vdb 0.0 44.6
0.0 9.7 0.00 214.82 44.27 2.41 248.22 0.00 248.2 85.3
2013-02-12-18:57:36 4.4 0.4 0.0 0.5 28.7 0.0 46.5 vdb 0.0 17.6
0.0 5.2 0.00 87.50 33.33 0.56 107.14 0.00 107.1 48.5
...corresponding to this in the b5 / osd.1 log:
2013-02-12 18:52:08.812880 7f9c0bc16700 0 monclient: hunting for new mon
2013-02-12 18:55:18.851791 7f9c0bc16700 0 monclient: hunting for new mon
2013-02-12 18:56:25.414948 7f9c13c26700 0 log [WRN] : 6 slow requests, 6
included below; oldest blocked for > 30.372124 secs
2013-02-12 18:56:25.414958 7f9c13c26700 0 log [WRN] : slow request 30.372124
seconds old, received at 2013-02-12 18:55:55.042767:
osd_op(client.9821.0:144779 rb.0.209f.74b0dc51.000000000023 [write
1593344~4096] 2.1882ddb7) v4 currently no flag points reached
...
2013-02-12 18:57:13.427008 7f9c13c26700 0 log [WRN] : slow request 40.721769
seconds old, received at 2013-02-12 18:56:32.705190:
osd_op(client.9821.0:146756 rb.0.209f.74b0dc51.000000000128 [write 819200~8192]
2.b4390173) v4 currently commit sent
2013-02-12 18:59:43.886517 7f9c0bc16700 0 monclient: hunting for new mon
2013-02-12 19:02:53.911641 7f9c0bc16700 0 monclient: hunting for new mon
Cheers,
Chris.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html