Re: [ceph-users] corrupted rbd filesystems since jewel

Stefan Priebe - Profihost AG Wed, 17 May 2017 08:50:33 -0700

Ah no, wrong thread. Will test your suggestion.

Stefan


Excuse my typos; sent from my mobile phone.

> On 17.05.2017 at 17:05, Jason Dillaman <[email protected]> wrote:
> 
> OSD 23 notes that object rbd_data.21aafa6b8b4567.0000000000000aaa is
> waiting for a scrub. What happens if you run "rados -p <rbd pool> rm
> rbd_data.21aafa6b8b4567.0000000000000aaa" (capturing the OSD 23 logs
> during this)? If that succeeds while your VM remains blocked on that
> remove op, it looks like there is some problem in the OSD where ops
> queued on a scrub are not properly awoken when the scrub completes.
> 
> On Wed, May 17, 2017 at 10:57 AM, Stefan Priebe - Profihost AG
> <[email protected]> wrote:
>> Hello Jason,
>> 
>> After enabling the log and generating a gcore dump, the request was
>> successful ;-(
>> 
>> So I was only able to catch the successful request, and the log
>> contains nothing else. I can send you the log on request.
>> 
>> Luckily I had another VM on another cluster behaving the same way.
>> 
>> This time osd.23:
>> # ceph --admin-daemon \
>>     /var/run/ceph/ceph-client.admin.22969.140085040783360.asok \
>>     objecter_requests
>> {
>>    "ops": [
>>        {
>>            "tid": 18777,
>>            "pg": "2.cebed0aa",
>>            "osd": 23,
>>            "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>            "object_locator": "@2",
>>            "target_object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
>>            "target_object_locator": "@2",
>>            "paused": 0,
>>            "used_replica": 0,
>>            "precalc_pgid": 0,
>>            "last_sent": "1.83513e+06s",
>>            "attempts": 1,
>>            "snapid": "head",
>>            "snap_context": "28a43=[]",
>>            "mtime": "2017-05-17 16:51:06.0.455475s",
>>            "osd_ops": [
>>                "delete"
>>            ]
>>        }
>>    ],
>>    "linger_ops": [
>>        {
>>            "linger_id": 1,
>>            "pg": "2.f0709c34",
>>            "osd": 23,
>>            "object_id": "rbd_header.21aafa6b8b4567",
>>            "object_locator": "@2",
>>            "target_object_id": "rbd_header.21aafa6b8b4567",
>>            "target_object_locator": "@2",
>>            "paused": 0,
>>            "used_replica": 0,
>>            "precalc_pgid": 0,
>>            "snapid": "head",
>>            "registered": "1"
>>        }
>>    ],
>>    "pool_ops": [],
>>    "pool_stat_ops": [],
>>    "statfs_ops": [],
>>    "command_ops": []
>> }
>> 
>> OSD Logfile of OSD 23 attached.
>> 
>> Greets,
>> Stefan
>> 
>>> On 17.05.2017 at 16:26, Jason Dillaman wrote:
>>> On Wed, May 17, 2017 at 10:21 AM, Stefan Priebe - Profihost AG
>>> <[email protected]> wrote:
>>>> You mean the request, regardless of whether it is successful or not?
>>>> Which log level should be set to 20?
>>> 
>>> 
>>> I'm hoping you can re-create the hung remove op when OSD logging is
>>> increased -- "debug osd = 20" would be nice if you can turn it up that
>>> high while attempting to capture the blocked op.
>>> 
> 
> 
> 
> -- 
> Jason
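
For anyone following along: the objecter_requests dump quoted above can be checked mechanically for ops that are still in flight against one OSD. The following is only a minimal sketch, not part of any Ceph tooling. The function names pending_ops and ops_on_osd are made up for illustration, and SAMPLE is a trimmed copy of the JSON from this thread. In practice the text would come from the "ceph --admin-daemon <asok> objecter_requests" command shown above. The linger_ops section is skipped on purpose, since it lists the long-lived watch on the image header rather than a blocked request.

```python
import json

# Trimmed copy of the objecter_requests output quoted in this thread.
SAMPLE = """
{
    "ops": [
        {
            "tid": 18777,
            "pg": "2.cebed0aa",
            "osd": 23,
            "object_id": "rbd_data.21aafa6b8b4567.0000000000000aaa",
            "attempts": 1,
            "osd_ops": ["delete"]
        }
    ],
    "linger_ops": [
        {
            "linger_id": 1,
            "pg": "2.f0709c34",
            "osd": 23,
            "object_id": "rbd_header.21aafa6b8b4567"
        }
    ]
}
"""


def pending_ops(dump_text):
    """Return one summary dict per in-flight op in an objecter_requests dump.

    Hypothetical helper: only the "ops" list is inspected.
    """
    dump = json.loads(dump_text)
    return [
        {
            "tid": op["tid"],
            "osd": op["osd"],
            "pg": op["pg"],
            "object_id": op["object_id"],
            "osd_ops": op.get("osd_ops", []),
        }
        for op in dump.get("ops", [])
    ]


def ops_on_osd(dump_text, osd_id):
    """Filter the in-flight ops down to those targeting a single OSD."""
    return [op for op in pending_ops(dump_text) if op["osd"] == osd_id]


if __name__ == "__main__":
    for op in ops_on_osd(SAMPLE, 23):
        print("tid={tid} osd.{osd} pg={pg} {object_id} {osd_ops}".format(**op))
```

Running the dump through this a few times, some seconds apart, and seeing the same tid on the same OSD each time would be one way to confirm that the delete is really stuck and not merely slow.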

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
