On Tue, Sep 26, 2017 at 04:31:22PM +0200, Nicolas Ecarnot wrote:
> On 21/09/2017 at 16:31, Stefan Hajnoczi wrote:
> > On Tue, Sep 19, 2017 at 12:09:06PM +0200, Nicolas Ecarnot wrote:
> > > Hello,
> > >
> > > First post here, so maybe I should introduce myself :
> > > - I'm a sysadmin for decades and currently managing 4 oVirt clusters, made
> > >   out of tens of hypervisors, all are CentOS 7.2+ based.
> > > - I'm very happy with this solution we choose especially because it is
> > >   based on qemu-kvm (open source, reliable, documented).
> > >
> > > On one VM, we experienced the following :
> > > - oVirt/vdsm is detecting an issue on the image
> > > - following this hints https://access.redhat.com/solutions/1173623, I
> > >   managed to detect one error and fix it
> > > - the VM is now running perfectly
> > >
> > > On two other VMs, we experienced a similar situation, except the check
> > > stage is showing something like 14000+ errors, and the relevant logs are :
> > >
> > > Repairing refcount block 14 is outside image
> > > ERROR could not resize image: Invalid argument
> > > ERROR cluster 425984 refcount=0 reference=1
> > > ERROR cluster 425985 refcount=0 reference=1
> > > [... repeating the previous line 7000+ times...]
> > > ERROR cluster 457166 refcount=0 reference=1
> > > Rebuilding refcount structure
> > > ERROR writing refblock: No space left on device
> > > qemu-img: Check failed: No space left on device
> >
> > Please run strace qemu-img info /the/relevant/logical/volume/path. It

Sorry, "qemu-img info" should be your "qemu-img check" command.

> > will print all the syscalls that qemu-img makes. That way we'll be able
> > to verify that the ENOSPC error is coming from a pwritev syscall.
>
> I did but I'm not skilled enough to ensure where the ENOSPC error is coming
> from.
>
> Is your question meaning the reads and/or the writes may come from or go to
> places outside the expected boundaries?

I was interested in the syscall (probably pwritev or similar) related to
the following output from qemu-img check:

  ERROR writing refblock: No space left on device

Feel free to post your strace log so we can analyze it.

> > > You surely know that oVirt/RHEV is storing its qcow2 images in dedicated
> > > logical volumes.
> > >
> > > pvs/vgs/lvs are all showing there is plenty of space available, so I
> > > understand that I don't understand what "No space left on device" means.
> >
> > After you have the strace data you can look at the file offset from the
> > failing pwritev syscall and check that it's really within the LV.
> >
> > I think there is no fancy thin provisioning going on at the LVM level
> > with oVirt, but if there is then perhaps a write within the LV could
> > still result in an ENOSPC error. It would be worth confirming that
> > these are class "thick" LVs.
>
> I think there is no such thin prov. at the LVM level, but I wouldn't swear.
> Don't you mind if I forward your question to the oVirt mailing-list?

Sure, feel free to CC other mailing lists. I have added oVirt devel.

Stefan
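
For reference, here is a rough sketch in Python of the offset check described above: scan the strace log for positional writes that failed with ENOSPC and compare their file offsets against the size of the LV. It assumes strace's default output format, where the offset is the last argument of pwrite64/pwritev and is printed in decimal. The function names are hypothetical helpers, not part of any tool, and the device size has to be supplied by the caller.

```python
import re

# Matches a failed positional write in default strace output, for example:
#   pwrite64(10, "..."..., 65536, 27917287424) = -1 ENOSPC (No space left on device)
# Assumption: the file offset is the last argument and is printed in decimal.
FAILED_WRITE = re.compile(
    r"\b(?P<call>pwrite64|pwritev)\(.*,\s*(?P<offset>\d+)\)\s*=\s*-1\s+ENOSPC\b"
)


def find_enospc_writes(strace_lines):
    """Return (syscall, offset) for every pwrite64/pwritev that failed with ENOSPC."""
    hits = []
    for line in strace_lines:
        match = FAILED_WRITE.search(line)
        if match:
            hits.append((match.group("call"), int(match.group("offset"))))
    return hits


def outside_device(hits, device_size):
    """Keep only the failed writes whose starting offset is at or past the device end."""
    return [(call, offset) for call, offset in hits if offset >= device_size]
```

The lines would come from something like `strace -o check.log qemu-img check ...`, and the device size in bytes from `blockdev --getsize64` on the logical volume. If a failing write starts beyond the end of the LV, the ENOSPC is explained by the offset itself and not by a lack of free space in the volume group.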