Jason,

Thanks for the suggestion. It seems to show that it's not the OSD that's stuck:
ceph7:~$ sudo rbd -c debug/ceph.conf info app/image1
…
2017-04-24 13:13:49.761076 7f739aefc700 1 -- 192.168.206.17:0/1250293899 -->
192.168.206.13:6804/22934 -- osd_op(client.4384.0:3 1.af6f1e38
rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f737c0077f0 con
0x7f737c0064e0
…
2017-04-24 13:14:04.756328 7f73a2880700 1 -- 192.168.206.17:0/1250293899 -->
192.168.206.13:6804/22934 -- ping magic: 0 v1 -- ?+0 0x7f7374000fc0 con
0x7f737c0064e0
ceph0:~$ sudo ceph pg map 1.af6f1e38
osdmap e27 pg 1.af6f1e38 (1.38) -> up [11,16,2] acting [11,16,2]
ceph3:~$ sudo ceph daemon osd.11 ops
{
    "ops": [],
    "num_ops": 0
}
I repeated this a few times and it's always the same command and the same
placement group that hang, but osd.11 has no ops (and neither do osd.16 and
osd.2, though I think that's expected for the non-primary OSDs).
Is there other tracing I should do on the OSD or something more to look at on
the client?
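In case it helps, here's what I was planning to try next (a sketch only; the
acting set [11,16,2] is taken from the pg map above, and dump_ops_in_flight,
dump_historic_ops, and "tell ... injectargs" are the standard admin-socket and
debug commands -- let me know if there's something better):

```shell
# Check in-flight and recently completed ops on every OSD in the acting set.
# (Each "ceph daemon" command has to run on the host where that OSD lives.)
for osd in 11 16 2; do
    echo "=== osd.$osd ==="
    sudo ceph daemon osd.$osd dump_ops_in_flight   # same info as "ops"
    sudo ceph daemon osd.$osd dump_historic_ops    # recently completed slow ops
done

# Raise OSD-side logging on the primary while reproducing the hang:
sudo ceph tell osd.11 injectargs '--debug-osd 20 --debug-ms 1'
```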
Thanks,
Phil
> On Apr 24, 2017, at 12:39 PM, Jason Dillaman <[email protected]> wrote:
>
> On Mon, Apr 24, 2017 at 2:53 PM, Phil Lacroute
> <[email protected]> wrote:
>> 2017-04-24 11:30:57.058233 7f5512ffd700 1 -- 192.168.206.17:0/3282647735
>> --> 192.168.206.13:6804/22934 -- osd_op(client.4375.0:3 1.af6f1e38
>> rbd_header.1058238e1f29 [call rbd.get_size,call rbd.get_object_prefix] snapc
>> 0=[] ack+read+known_if_redirected e27) v7 -- ?+0 0x7f54f40077f0 con
>> 0x7f54f40064e0
>
>
> You can attempt to run "ceph daemon osd.XYZ ops" against the
> potentially stuck OSD to figure out what it's stuck doing.
>
> --
> Jason
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
