Definitely would like to see the "debug rbd = 20" logs from 192.168.254.17 when 
this occurs.  If you are co-locating your OSDs, MONs, and qemu-kvm processes, 
make sure your ceph.conf has "log file = </path/to/client.log>" defined in the 
[global] or [client] section.
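
For example, a minimal sketch (the exact path and log level are
illustrative; adjust for your environment):

[client]
    log file = /var/log/ceph/ceph-client.$name.$pid.log
    debug rbd = 20

The $name and $pid metavariables expand per process, so co-located
qemu-kvm processes won't contend for a single log file.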

-- 

Jason Dillaman 


----- Original Message -----
> From: "Василий Ангапов" <anga...@gmail.com>
> To: "Jason Dillaman" <dilla...@redhat.com>, "ceph-users" 
> <ceph-users@lists.ceph.com>
> Sent: Wednesday, January 13, 2016 4:22:02 AM
> Subject: Re: [ceph-users] How to do quiesced rbd snapshot in libvirt?
> 
> Hello again!
> 
> Unfortunately I have to raise the problem again: I constantly have
> hanging snapshots on several images.
> My Ceph version is now 0.94.5.
> The RBD CLI always gives me this:
> root@slpeah001:[~]:# rbd snap create
> volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd --snap test
> 2016-01-13 12:04:39.107166 7fb70e4c2880 -1 librbd::ImageWatcher:
> 0x427a710 no lock owners detected
> 2016-01-13 12:04:44.108783 7fb70e4c2880 -1 librbd::ImageWatcher:
> 0x427a710 no lock owners detected
> 2016-01-13 12:04:49.110321 7fb70e4c2880 -1 librbd::ImageWatcher:
> 0x427a710 no lock owners detected
> 2016-01-13 12:04:54.112373 7fb70e4c2880 -1 librbd::ImageWatcher:
> 0x427a710 no lock owners detected
> 
> I turned on "debug rbd = 20" and found these records only on one of the
> OSDs (on the same host as the RBD client):
> 2016-01-13 11:44:46.076780 7fb5f05d8700  0 --
> 192.168.252.11:6804/407141 >> 192.168.252.11:6800/407122
> pipe(0x392d2000 sd=257 :6804 s=2 pgs=17 cs=1 l=0 c=0x383b4160).fault
> with nothing to send, going to standby
> 2016-01-13 11:58:26.261460 7fb5efbce700  0 --
> 192.168.252.11:6804/407141 >> 192.168.252.11:6802/407124
> pipe(0x39e45000 sd=156 :6804 s=2 pgs=17 cs=1 l=0 c=0x386fbb20).fault
> with nothing to send, going to standby
> 2016-01-13 12:04:23.948931 7fb5fede2700  0 --
> 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete
> (2) cookie 44850800 notify 99720550678667 ret -110) v3 remote,
> 192.168.254.11:0/1468572, failed lossy con, dropping message
> 0x3ab76fc0
> 2016-01-13 12:09:04.254329 7fb5fede2700  0 --
> 192.168.254.11:6804/407141 submit_message watch-notify(notify_complete
> (2) cookie 69846112 notify 99720550678721 ret -110) v3 remote,
> 192.168.254.11:0/1509673, failed lossy con, dropping message
> 0x3830cb40
> 
> Here are the image properties:
> root@slpeah001:[~]:# rbd info
> volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> rbd image 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd':
>         size 200 GB in 51200 objects
>         order 22 (4096 kB objects)
>         block_name_prefix: rbd_data.2f2a81562fea59
>         format: 2
>         features: layering, striping, exclusive, object map
>         flags:
>         stripe unit: 4096 kB
>         stripe count: 1
> root@slpeah001:[~]:# rbd status
> volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> Watchers:
>         watcher=192.168.254.17:0/2088291 client.3424561 cookie=93888518795008
> root@slpeah001:[~]:# rbd lock list
> volumes/volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd
> There is 1 exclusive lock on this image.
> Locker         ID                  Address
> client.3424561 auto 93888518795008 192.168.254.17:0/2088291
> 
> Taking RBD snapshots from the Python API also hangs...
> This image is being used by libvirt.
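> 
> For reference, the Python call that hangs is essentially this (a sketch
> using the standard rados/rbd bindings; pool and image name as above):
> 
> import rados
> import rbd
> 
> # Connect to the cluster and open the pool holding the image.
> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
> cluster.connect()
> ioctx = cluster.open_ioctx('volumes')
> image = rbd.Image(ioctx, 'volume-26c89a0a-be4d-45d4-85a6-e0dc134941fd')
> image.create_snap('test')   # hangs here, just like the CLI
> image.close()
> ioctx.close()
> cluster.shutdown()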
> 
> Any suggestions?
> Thanks!
> 
> Regards, Vasily.
> 
> 
> 2016-01-06 1:11 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
> > Well, I believe the problem is no longer present.
> > My code before was:
> > virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> > rbd snap create $RBD_ID --snap `date +%F-%T`
> >
> > and then snapshot creation would hang forever. I inserted a 2-second
> > sleep.
> >
> > My code after:
> > virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-freeze"}'
> > sleep 2
> > rbd snap create $RBD_ID --snap `date +%F-%T`
> >
> > And now it works perfectly. Again, I have no idea how that solved the
> > problem.
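> >
> > For anyone reusing this: a slightly more defensive sketch of the same
> > sequence (same commands as above, plus a guaranteed thaw and a timeout
> > so a hung snapshot can't leave the guest frozen; the 60s limit is
> > arbitrary):
> >
> > #!/bin/bash
> > # Thaw the guest no matter how this script exits.
> > thaw() { virsh qemu-agent-command "$INSTANCE" '{"execute":"guest-fsfreeze-thaw"}'; }
> > trap thaw EXIT
> > virsh qemu-agent-command "$INSTANCE" '{"execute":"guest-fsfreeze-freeze"}'
> > sleep 2   # give the freeze time to settle
> > timeout 60 rbd snap create "$RBD_ID" --snap "$(date +%F-%T)"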
> > Thanks :)
> >
> > 2016-01-06 0:49 GMT+08:00 Мистер Сёма <anga...@gmail.com>:
> >> I am very sorry, but I am not able to increase log verbosity because
> >> it's a production cluster with very limited space for logs. Sounds
> >> crazy, but that's it.
> >> I have found out that the RBD snapshot process hangs forever only when
> >> a QEMU fsfreeze was issued just before the snapshot. If the guest is
> >> not frozen, the snapshot is taken with no problem... I have absolutely
> >> no idea how these two things could be related to each other... And
> >> again, this issue occurs only when the exclusive lock feature is
> >> enabled on the image and an exclusive lock is actually held on it.
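> >>
> >> One way to confirm that correlation (a sketch; as far as I know the
> >> guest agent still answers status queries while frozen) is to check the
> >> freeze state right before taking the snapshot:
> >>
> >> virsh qemu-agent-command $INSTANCE '{"execute":"guest-fsfreeze-status"}'
> >> # prints {"return":"frozen"} or {"return":"thawed"}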
> >>
> >> Does anybody else have this problem?
> >>
> >> 2016-01-05 2:55 GMT+08:00 Jason Dillaman <dilla...@redhat.com>:
> >>> I am surprised by the error you are seeing with exclusive lock enabled.
> >>> The rbd CLI should be able to send the 'snap create' request to QEMU
> >>> without an error.  Are you able to provide "debug rbd = 20" logs from
> >>> shortly before and after your snapshot attempt?
> >>>
> >>> --
> >>>
> >>> Jason Dillaman
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Мистер Сёма" <anga...@gmail.com>
> >>>> To: "ceph-users" <ceph-users@lists.ceph.com>
> >>>> Sent: Monday, January 4, 2016 12:37:07 PM
> >>>> Subject: [ceph-users] How to do quiesced rbd snapshot in libvirt?
> >>>>
> >>>> Hello,
> >>>>
> >>>> Can anyone please tell me what is the right way to do quiesced RBD
> >>>> snapshots in libvirt (OpenStack)?
> >>>> My Ceph version is 0.94.3.
> >>>>
> >>>> I found two possible ways; neither of them works for me. I wonder if
> >>>> I'm doing something wrong:
> >>>> 1) Do a VM fsFreeze through the QEMU guest agent, perform the RBD
> >>>> snapshot, do an fsThaw. Looks good, but the bad thing here is that
> >>>> libvirt takes an exclusive lock on the image, which results in errors
> >>>> like this when taking a snapshot: "7f359d304880 -1
> >>>> librbd::ImageWatcher: no lock owners detected". It seems the rbd
> >>>> client is trying to take the snapshot on behalf of the exclusive lock
> >>>> owner but is unable to find that owner. Without an exclusive lock
> >>>> everything works fine.
> >>>>
> >>>> 2) Perform QEMU external snapshots, with a local QCOW2 file overlaid
> >>>> on top of the RBD image. This seems really interesting, but the bad
> >>>> thing is that there is currently no way to remove this kind of
> >>>> snapshot, because active blockcommit does not currently work for RBD
> >>>> images (https://bugzilla.redhat.com/show_bug.cgi?id=1189998); see the
> >>>> sketch after this list.
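> >>>>
> >>>> For completeness, approach 2 boils down to something like this (a
> >>>> sketch; the disk target "vda" and snapshot name "snap1" are made up),
> >>>> and it is the last step that fails for RBD per the bug above:
> >>>>
> >>>> # quiesced external snapshot: overlays a local QCOW2 on the disk
> >>>> virsh snapshot-create-as $INSTANCE snap1 --disk-only --quiesce
> >>>> # merging the overlay back is the part that fails for RBD images
> >>>> virsh blockcommit $INSTANCE vda --active --pivot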
> >>>>
> >>>> So again my question is: how do you guys take quiesced RBD snapshots in
> >>>> libvirt?
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
