On Thu, Nov 30, 2017 at 4:00 AM, Wido den Hollander <[email protected]> wrote:
>
>> On 29 November 2017 at 14:56, Jason Dillaman <[email protected]> wrote:
>>
>>
>> We experienced this problem in the past on older (pre-Jewel) releases
>> where a PG split that affected the RBD header object would result in
>> the watch getting lost by librados. Any chance you know if the
>> affected RBD header objects were involved in a PG split? Can you
>> generate a gcore dump of one of the affected VMs and ceph-post-file it
>> for analysis?
>>
>
> There has been no PG splitting on this cluster in recent months, so that
> cannot be what happened here.

Possible alternative explanation: are you using cache tiering?

> I've asked the OpenStack team for a gcore dump, but they have to get that 
> cleared before they can send it to me.
>
> This might take a bit of time!
>
> Wido
>
>> As for the VM going R/O, that is the expected behavior when a client
>> breaks the exclusive lock held by a (dead) client.
>>
>> On Wed, Nov 29, 2017 at 8:48 AM, Wido den Hollander <[email protected]> wrote:
>> > Hi,
>> >
>> > In an OpenStack environment I encountered a VM which went into R/O mode
>> > after an RBD snapshot was created.
>> >
>> > Digging into this I found tens (out of thousands) of RBD images which DO
>> > have a running VM, but do NOT have a watcher on the RBD image.
>> >
>> > For example:
>> >
>> > $ rbd status volumes/volume-79773f2e-1f40-4eca-b9f0-953fa8d83086
>> >
>> > 'Watchers: none'
>> >
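
A minimal sketch of how such a check could be scripted across a whole pool,
assuming the `rbd` CLI is on the PATH and that `rbd status` prints
"Watchers: none" as in the example above. The function names and the
injectable `run` parameter are hypothetical, introduced only for
illustration; this is not an official tool.

```python
import subprocess


def has_watchers(status_output: str) -> bool:
    """Return False when `rbd status` output reports 'Watchers: none'."""
    for line in status_output.splitlines():
        stripped = line.strip().strip("'")
        if stripped.startswith("Watchers:"):
            # "Watchers: none" means no client is watching the header object;
            # anything else (normally an empty rest, followed by watcher=...
            # lines) is treated as "has at least one watcher".
            return stripped.split(":", 1)[1].strip() != "none"
    # No "Watchers:" line found at all: assume there is no watcher.
    return False


def images_without_watchers(pool: str, run=subprocess.check_output):
    """List images in `pool` for which `rbd status` reports no watcher.

    `run` defaults to subprocess.check_output and can be replaced with a
    stub so the logic can be exercised without a cluster.
    """
    images = run(["rbd", "ls", pool], text=True).split()
    missing = []
    for image in images:
        status = run(["rbd", "status", f"{pool}/{image}"], text=True)
        if not has_watchers(status):
            missing.append(image)
    return missing
```

The resulting list would still have to be compared against the set of
running VMs (for example from the OpenStack side), since an image that is
simply not attached anywhere also has no watcher.
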
>> > The VM has, however, been running since September 5th 2017 with Jewel
>> > 10.2.7 on the client.
>> >
>> > In the meantime the cluster has already been upgraded to 10.2.10.
>> >
>> > Looking further I also found a Compute node with 10.2.10 installed which 
>> > also has RBD images without watchers.
>> >
>> > Restarting or live migrating the VM to a different host resolves this 
>> > issue.
>> >
>> > The internet is full of posts where RBD images still have Watchers when 
>> > people don't expect them, but in this case I'm expecting a watcher which 
>> > isn't there.
>> >
>> > The main problem right now is that creating a snapshot potentially puts a 
>> > VM in Read-Only state because of the lack of notification.
>> >
>> > Has anybody seen this as well?
>> >
>> > Thanks,
>> >
>> > Wido
>> > _______________________________________________
>> > ceph-users mailing list
>> > [email protected]
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>> --
>> Jason



-- 
Jason