No, I did not. I don't want to restart it, because then I might not be able to reproduce the problem any longer.

Stefan
Excuse my typo sent from my mobile phone.

> On 16.05.2017 at 22:54, Jason Dillaman <[email protected]> wrote:
>
> It looks like it's just a ping message in that capture.
>
> Are you saying that you restarted OSD 46 and the problem persisted?
>
> On Tue, May 16, 2017 at 4:02 PM, Stefan Priebe - Profihost AG
> <[email protected]> wrote:
>> Hello,
>>
>> while reproducing the problem, objecter_requests looks like this:
>>
>> {
>>     "ops": [
>>         {
>>             "tid": 42029,
>>             "pg": "5.bd9616ad",
>>             "osd": 46,
>>             "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>             "object_locator": "@5",
>>             "target_object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>             "target_object_locator": "@5",
>>             "paused": 0,
>>             "used_replica": 0,
>>             "precalc_pgid": 0,
>>             "last_sent": "2.28854e+06s",
>>             "attempts": 1,
>>             "snapid": "head",
>>             "snap_context": "a07c2=[]",
>>             "mtime": "2017-05-16 21:53:22.0.069541s",
>>             "osd_ops": [
>>                 "delete"
>>             ]
>>         }
>>     ],
>>     "linger_ops": [
>>         {
>>             "linger_id": 1,
>>             "pg": "5.5f3bd635",
>>             "osd": 17,
>>             "object_id": "rbd_header.e10ca56b8b4567",
>>             "object_locator": "@5",
>>             "target_object_id": "rbd_header.e10ca56b8b4567",
>>             "target_object_locator": "@5",
>>             "paused": 0,
>>             "used_replica": 0,
>>             "precalc_pgid": 0,
>>             "snapid": "head",
>>             "registered": "1"
>>         }
>>     ],
>>     "pool_ops": [],
>>     "pool_stat_ops": [],
>>     "statfs_ops": [],
>>     "command_ops": []
>> }
>>
>> Yes, they have an established TCP connection: Qemu <=> osd.46. Attached
>> is a pcap file of the traffic between them when it got stuck.
>>
>> Greets,
>> Stefan
>>
>>> On 16.05.2017 at 21:45, Jason Dillaman wrote:
>>> On Tue, May 16, 2017 at 3:37 PM, Stefan Priebe - Profihost AG
>>> <[email protected]> wrote:
>>>> We've enabled the op tracker for performance reasons while using SSD
>>>> only storage ;-(
>>>
>>> Disabled you mean?
>>>
>>>> Can I enable the op tracker using ceph osd tell, then reproduce the
>>>> problem and check what is stuck again? Or should I generate an rbd
>>>> log from the client?
>>>
>>> From a super-quick glance at the code, it looks like that isn't a
>>> dynamic setting. Of course, it's possible that if you restart OSD 46
>>> to enable the op tracker, the stuck op will clear itself and the VM
>>> will resume. You could attempt to generate a gcore of OSD 46 to see if
>>> information on that op could be extracted via the debugger, but no
>>> guarantees.
>>>
>>> You might want to verify that the stuck client and OSD 46 have an
>>> actual established TCP connection as well before doing any further
>>> actions.
>>>
>
> --
> Jason
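
For anyone reading the objecter_requests dump quoted above, a few lines of Python can group the in-flight ops by the OSD they are waiting on. This is only a rough sketch: it assumes the dump has already been saved as JSON text, the function name summarize_objecter_requests is made up for illustration, and the sample below is a trimmed copy of the dump from this thread that keeps only the fields the function reads.

```python
import json
from collections import defaultdict


def summarize_objecter_requests(dump_text):
    """Group in-flight ops from an objecter_requests dump by target OSD.

    Hypothetical helper, not part of Ceph. It only reads the "ops" list
    and the fields that appear in the dump quoted in this thread.
    """
    dump = json.loads(dump_text)
    by_osd = defaultdict(list)
    for op in dump.get("ops", []):
        by_osd[op["osd"]].append({
            "tid": op["tid"],
            "object_id": op["object_id"],
            "attempts": op["attempts"],
            "osd_ops": op["osd_ops"],
        })
    return dict(by_osd)


# Trimmed copy of the dump from the thread (unused fields left out).
SAMPLE = """
{
    "ops": [
        {
            "tid": 42029,
            "pg": "5.bd9616ad",
            "osd": 46,
            "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
            "attempts": 1,
            "osd_ops": ["delete"]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "5.5f3bd635",
            "osd": 17,
            "object_id": "rbd_header.e10ca56b8b4567"
        }
    ]
}
"""

summary = summarize_objecter_requests(SAMPLE)
for osd, ops in sorted(summary.items()):
    for op in ops:
        # One line per outstanding op, keyed by the OSD it was sent to.
        print(f"osd.{osd} tid={op['tid']} {op['object_id']} "
              f"{op['osd_ops']} attempts={op['attempts']}")
```

With the dump from this thread, the only outstanding op is the delete sent to osd.46. The linger op on osd.17 is the watch on the rbd_header object and is left out on purpose, since a registered linger op is expected to stay there.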
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
