Re: [ceph-users] corrupted rbd filesystems since jewel

Stefan Priebe - Profihost AG Tue, 16 May 2017 21:25:39 -0700

No, I did not. I don't want to, because then I might not be able to reproduce it any longer.

Stefan



> On 16.05.2017 at 22:54, Jason Dillaman <[email protected]> wrote:
> 
> It looks like it's just a ping message in that capture.
> 
> Are you saying that you restarted OSD 46 and the problem persisted?
> 
> On Tue, May 16, 2017 at 4:02 PM, Stefan Priebe - Profihost AG
> <[email protected]> wrote:
>> Hello,
>> 
>> While reproducing the problem, objecter_requests looks like this:
>> 
>> {
>>    "ops": [
>>        {
>>            "tid": 42029,
>>            "pg": "5.bd9616ad",
>>            "osd": 46,
>>            "object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>            "object_locator": "@5",
>>            "target_object_id": "rbd_data.e10ca56b8b4567.000000000000311c",
>>            "target_object_locator": "@5",
>>            "paused": 0,
>>            "used_replica": 0,
>>            "precalc_pgid": 0,
>>            "last_sent": "2.28854e+06s",
>>            "attempts": 1,
>>            "snapid": "head",
>>            "snap_context": "a07c2=[]",
>>            "mtime": "2017-05-16 21:53:22.0.069541s",
>>            "osd_ops": [
>>                "delete"
>>            ]
>>        }
>>    ],
>>    "linger_ops": [
>>        {
>>            "linger_id": 1,
>>            "pg": "5.5f3bd635",
>>            "osd": 17,
>>            "object_id": "rbd_header.e10ca56b8b4567",
>>            "object_locator": "@5",
>>            "target_object_id": "rbd_header.e10ca56b8b4567",
>>            "target_object_locator": "@5",
>>            "paused": 0,
>>            "used_replica": 0,
>>            "precalc_pgid": 0,
>>            "snapid": "head",
>>            "registered": "1"
>>        }
>>    ],
>>    "pool_ops": [],
>>    "pool_stat_ops": [],
>>    "statfs_ops": [],
>>    "command_ops": []
>> }
>> 
>> Yes, they have an established TCP connection (QEMU <=> osd.46). Attached
>> is a pcap file of the traffic between them when it got stuck.
>> 
>> Greets,
>> Stefan
>> 
>>> On 16.05.2017 at 21:45, Jason Dillaman wrote:
>>> On Tue, May 16, 2017 at 3:37 PM, Stefan Priebe - Profihost AG
>>> <[email protected]> wrote:
>>>> We've enabled the op tracker for performance reasons while using SSD-only
>>>> storage ;-(
>>> 
>>> Disabled you mean?
>>> 
>>>> Can I enable the op tracker using ceph osd tell, then reproduce the
>>>> problem and check what got stuck again? Or should I generate an rbd log
>>>> from the client?
>>> 
>>> From a super-quick glance at the code, it looks like that isn't a
>>> dynamic setting. Of course, it's possible that if you restart OSD 46
>>> to enable the op tracker, the stuck op will clear itself and the VM
>>> will resume. You could attempt to generate a gcore of OSD 46 to see if
>>> information on that op could be extracted via the debugger, but no
>>> guarantees.
>>> 
>>> You might want to verify that the stuck client and OSD 46 have an
>>> actual established TCP connection as well before doing any further
>>> actions.
>>> 
> 
> 
> 
> -- 
> Jason
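
The objecter_requests dump quoted above can also be checked from a script. Below is a minimal Python sketch, assuming each dump has been saved as a JSON file (for example via the client's admin socket; the socket path differs per setup and is not given in this thread). It groups in-flight ops by target OSD and reports tids that are still present in a later dump, which is one rough way to tell a stuck op (like tid 42029 on osd.46 above) from one that merely happened to be in flight. The helper names `summarize_objecter_requests` and `persistent_tids` are hypothetical and not part of any Ceph tool.

```python
import json
import sys
from collections import defaultdict


def summarize_objecter_requests(dump):
    """Group in-flight ops from a parsed objecter_requests dump by target OSD.

    Returns {osd_id: [(tid, object_id, osd_ops), ...]}.
    Linger ops (watches) are skipped: they are long-lived by design.
    """
    by_osd = defaultdict(list)
    for op in dump.get("ops", []):
        by_osd[op["osd"]].append(
            (op["tid"], op["object_id"], list(op.get("osd_ops", [])))
        )
    return dict(by_osd)


def persistent_tids(earlier, later):
    """Return tids present in both dumps, i.e. ops that did not complete
    between the two snapshots (a hint, not proof, of a stuck op)."""
    before = {op["tid"] for op in earlier.get("ops", [])}
    after = {op["tid"] for op in later.get("ops", [])}
    return sorted(before & after)


def load(path):
    with open(path) as f:
        return json.load(f)


# Hypothetical usage: python3 stuck_ops.py first.json second.json
if __name__ == "__main__" and len(sys.argv) == 3:
    first, second = load(sys.argv[1]), load(sys.argv[2])
    for osd, ops in sorted(summarize_objecter_requests(second).items()):
        for tid, obj, osd_ops in ops:
            print(f"osd.{osd} tid={tid} {obj} {','.join(osd_ops)}")
    print("still pending:", persistent_tids(first, second))
```

Taking the two dumps some seconds apart avoids flagging ops that were simply slow.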

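Jason's suggestion to confirm an established TCP connection between the client and OSD 46 can also be scripted on a Linux client host. The sketch below parses /proc/net/tcp (IPv4 only; IPv6 sockets are listed in /proc/net/tcp6) and returns the local endpoints in state 01 (ESTABLISHED) towards a given OSD address. The function name `established_to` is hypothetical, and the OSD IP and port must come from your own cluster, since none are given in this thread. Note that /proc/net/tcp covers the whole network namespace, not just the QEMU process; `ss -tnp` shows the same information interactively, together with the owning process.

```python
import socket
import struct
import sys

TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def _decode(addr):
    """Decode 'HHHHHHHH:PPPP' from /proc/net/tcp into (ip, port)."""
    hex_ip, hex_port = addr.split(":")
    # The kernel prints the raw 32-bit address as a host-order integer, so
    # pack it back in native order ("=I") to recover the network-order bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_to(remote_ip, remote_port, lines):
    """Return local (ip, port) pairs with an ESTABLISHED IPv4 connection
    to remote_ip:remote_port, given the lines of /proc/net/tcp."""
    matches = []
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        if state == TCP_ESTABLISHED and _decode(remote) == (remote_ip, remote_port):
            matches.append(_decode(local))
    return matches


# Hypothetical usage: python3 check_tcp.py <osd-ip> <osd-port>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open("/proc/net/tcp") as f:
        print(established_to(sys.argv[1], int(sys.argv[2]), f.read().splitlines()))
```

An empty result means no established IPv4 connection to that OSD endpoint exists from this host.
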
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
