> On 29 November 2017 at 14:56, Jason Dillaman <[email protected]> wrote:
> 
> 
> We experienced this problem in the past on older (pre-Jewel) releases
> where a PG split that affected the RBD header object would result in
> the watch getting lost by librados. Any chance you know if the
> affected RBD header objects were involved in a PG split? Can you
> generate a gcore dump of one of the affected VMs and ceph-post-file it
> for analysis?
> 

I asked again for the gcore, but they can't release it as it contains 
confidential information about the Instance and the Ceph cluster. I understand 
their reasoning and they also understand that it makes it difficult to debug 
this.

I am allowed to look at the gcore dump when on location (next week), but I'm 
not allowed to share it.

> As for the VM going R/O, that is the expected behavior when a client
> breaks the exclusive lock held by a (dead) client.
> 

We noticed another VM going R/O when a snapshot was created. When we checked 
last week this instance had a watcher, but after the snapshot and the switch 
to R/O we found that it no longer has a watcher registered.
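
For anyone who wants to look for the same condition, below is a minimal 
sketch of how the check could be scripted. It assumes the plain-text output 
of "rbd status" is either "Watchers: none" or a "Watchers:" line followed by 
one "watcher=..." line per watcher. The function names and the injectable 
"run" parameter are made up for illustration and are not part of any Ceph 
tooling.

```python
import subprocess


def has_watcher(status_output: str) -> bool:
    """Return True if `rbd status` plain-text output lists a watcher.

    Assumption: each registered watcher appears on its own line
    starting with "watcher=" (after optional whitespace).
    """
    for line in status_output.splitlines():
        if line.strip().startswith("watcher="):
            return True
    return False


def images_without_watcher(pool, images, run=subprocess.check_output):
    """Return the images in `pool` for which `rbd status` shows no watcher.

    `run` is a hypothetical hook so the command runner can be replaced;
    by default it executes the real `rbd status <pool>/<image>` command.
    """
    missing = []
    for image in images:
        out = run(["rbd", "status", f"{pool}/{image}"]).decode()
        if not has_watcher(out):
            missing.append(image)
    return missing
```

The result still has to be compared against the list of volumes that are 
attached to a running VM, since an image that is not in use legitimately has 
no watcher.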

Any suggestions or ideas?

Wido

> On Wed, Nov 29, 2017 at 8:48 AM, Wido den Hollander <[email protected]> wrote:
> > Hi,
> >
> > In an OpenStack environment I encountered a VM which went into R/O mode 
> > after an RBD snapshot was created.
> >
> > Digging into this I found tens (out of thousands) of RBD images which DO 
> > have a running VM, but do NOT have a watcher on the RBD image.
> >
> > For example:
> >
> > $ rbd status volumes/volume-79773f2e-1f40-4eca-b9f0-953fa8d83086
> >
> > 'Watchers: none'
> >
> > The VM is however running since September 5th 2017 with Jewel 10.2.7 on the 
> > client.
> >
> > In the meantime the cluster was already upgraded to 10.2.10
> >
> > Looking further I also found a Compute node with 10.2.10 installed which 
> > also has RBD images without watchers.
> >
> > Restarting or live migrating the VM to a different host resolves this issue.
> >
> > The internet is full of posts where RBD images still have Watchers when 
> > people don't expect them, but in this case I'm expecting a watcher which 
> > isn't there.
> >
> > The main problem right now is that creating a snapshot potentially puts a 
> > VM in Read-Only state because of the lack of notification.
> >
> > Has anybody seen this as well?
> >
> > Thanks,
> >
> > Wido
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> -- 
> Jason