Hi,
I just started testing VMs stored in Ceph this week, running ceph-hammer 0.94-5.
I built several pools, using cache tiering (rough setup commands below):
- a small replicated SSD pool (only 5 SSDs, but I expected it to be better for
IOPS; I intend to compare against a disk-only setup)
- overlaying a larger erasure-coded (EC) pool
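For reference, this is roughly how I created the tier (from memory, so the
exact options may not be word for word what I ran; pool names are the ones
used below):

ceph osd tier add irfu-virt ssd-hot-irfu-virt          # irfu-virt = EC base pool (name as used below)
ceph osd tier cache-mode ssd-hot-irfu-virt writeback
ceph osd tier set-overlay irfu-virt ssd-hot-irfu-virt
ceph osd pool set ssd-hot-irfu-virt hit_set_type bloom # hit_set settings from memory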
I only have 2 VMs in Ceph so far... and one of them is already broken.
The VM that is not affected was migrated with qemu-img: the Ceph volume was
created first, then the data was copied over. Its RBD image is format 1:
rbd image 'xxx-disk1':
size 20480 MB in 5120 objects
order 22 (4096 kB objects)
block_name_prefix: rb.0.83a49.3d1b58ba
format: 1
The VM that is failing uses an RBD format 2 image. This is what I had before
things started breaking:
rbd image 'yyy-disk1':
size 10240 MB in 2560 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.8ae1f47398c89
format: 2
features: layering, striping
flags:
stripe unit: 4096 kB
stripe count: 1
The VM started behaving strangely during its installation, with a huge I/O
wait percentage (which is to say it did not take long to go wrong ;)).
Now, this is all I can get out of rbd:
[root@ceph0 ~]# rbd -p irfu-virt info yyy-disk1
2016-02-24 18:30:33.213590 7f00e6f6d7c0 -1 librbd::ImageCtx: error reading
image id: (95) Operation not supported
rbd: error opening image yyy-disk1: (95) Operation not supported
One thing to note: the VM *IS STILL* running, and disk operations apparently
still work.
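In case it helps with the diagnosis, this is how I intend to check whether the
image's metadata objects are still readable directly via rados (assuming the
standard format-2 object names, rbd_id.<image> and rbd_header.<id>):

rados -p irfu-virt stat rbd_id.yyy-disk1                  # object holding the image id that "rbd info" fails to read
rados -p irfu-virt stat rbd_header.8ae1f47398c89          # id taken from the block_name_prefix above
rados -p irfu-virt listomapvals rbd_header.8ae1f47398c89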
During the VM installation I realized I had mistakenly set the SSD cache
tier's target size to 100 MBytes instead of 100 GBytes, and Ceph complained it
was almost full:
health HEALTH_WARN
'ssd-hot-irfu-virt' at/near target max
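If I understand the cache tier settings correctly, the sizing itself should be
fixable with something like this (100 GB expressed in bytes):

ceph osd pool set ssd-hot-irfu-virt target_max_bytes 107374182400  # 100 * 1024^3, instead of the 100 MB I had set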
My question is: am I hitting the bug reported in the list thread titled
"Possible Cache Tier Bug - Can someone confirm", or did I do something wrong?
The libvirt and qemu-kvm versions writing to Ceph are the following:
libvirt-1.2.17-13.el7_2.3.x86_64
qemu-kvm-1.5.3-105.el7_2.3.x86_64
Any idea how I could recover the VM image, if that is even possible?
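The only recovery idea I have so far is to flush/evict everything from the
cache tier back to the base pool and see whether the image becomes readable
again, though I do not know if that is safe (or even possible) while the
header cannot be read:

rados -p ssd-hot-irfu-virt cache-flush-evict-all  # flush and evict all objects from the cache pool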
Please note I have no problem deleting the VM and rebuilding it; I only
spawned it as a test.
As a matter of fact, I just "virsh destroy"ed the VM to see whether I could
start it again... and I can't:
# virsh start yyy
error: Failed to start domain yyy
error: internal error: process exited while connecting to monitor:
2016-02-24T17:49:59.262170Z qemu-kvm: -drive
file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***==:auth_supported=cephx\;none:mon_host=_____\:6789,if=none,id=drive-virtio-disk0,format=raw:
error reading header from yyy-disk1
2016-02-24T17:49:59.263743Z qemu-kvm: -drive
file=rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=A***==:auth_supported=cephx\;none:mon_host=___\:6789,if=none,id=drive-virtio-disk0,format=raw:
could not open disk image
rbd:irfu-virt/___-disk1:id=irfu-***==:auth_supported=cephx\;none:mon_host=___\:6789:
Could not open 'rbd:irfu-virt/yyy-disk1:id=irfu-virt:key=***
Any ideas?
Thanks
Frederic