Hello Dan,

it is good to know that there are actually people using ceph + qemu in
production!
Regarding replicas: I thought about using size = 2, but as I see it
that resembles RAID 5 in terms of how much you can lose, while
size = 3 is more or less the equivalent of RAID 6. (The pool settings
are sketched below.)

Regarding the kernel panics: I am still trying to find out why they
happen. They can easily be reproduced by generating a high amount of
I/O inside a VM (see the fio sketch below). We are mostly running
Debian (stable, testing, stable+backports), which shows the kernel
panics; Ubuntu has not shown this behaviour so far, AFAIR.

So if anyone has experienced kernel panics in qemu VMs running on RBD
(and fixed them), please let me know!

Cheers,

Nico

p.s.: We are *not* using rbdmap / kernel mounts - it's just qemu,
running as:

    qemu-system-x86_64 -enable-kvm -name one-204 -S \
        -machine pc-i440fx-trusty,accel=kvm,usb=off \
        -m 512 -realtime mlock=off \
        -smp 2,sockets=2,cores=1,threads=1 \
        -uuid d7c3374e-349e-4db6-8f54-f3c607f93101 \
        -no-user-config -nodefaults \
        -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-204.monitor,server,nowait \
        -mon chardev=charmonitor,id=monitor,mode=control \
        -rtc base=utc -no-shutdown -boot strict=on \
        -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
        -device lsi,id=scsi0,bus=pci.0,addr=0x4 \
        -drive file=rbd:one/one-53-204-0:id=libvirt:key=...:auth_supported=cephx\;none:mon_host=kaffee.private.ungleich.ch\;wein.private.ungleich.ch\;tee.private.ungleich.ch,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
        -device scsi-hd,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0,bootindex=1 \
        -drive file=/var/lib/one//datastores/0/204/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw \
        -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
        -netdev tap,fd=24,id=hostnet0 \
        -device rtl8139,netdev=hostnet0,id=net0,mac=02:00:4d:6d:96:ae,bus=pci.0,addr=0x3 \
        -vnc 0.0.0.0:204 \
        -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
        -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
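Since pool sizes came up: for reference, the replica settings boil
down to the two commands below - the pool name "one" is the one from
the qemu command line above; the min_size value is an illustration on
my part, not a recommendation from this thread:

    # keep three copies of every object; continue serving I/O with two
    ceph osd pool set one size 3
    ceph osd pool set one min_size 2

    # verify
    ceph osd pool get one size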
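In case anyone wants to try to reproduce the panics: generating a lot
of random write I/O inside the guest is enough for us. Something along
the lines of the fio run below should do it - all numbers are
illustrative, this is not a tuned reproducer:

    # run inside the VM, on the RBD-backed filesystem
    fio --name=highio --directory=/var/tmp --size=1G \
        --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
        --iodepth=32 --numjobs=4 --runtime=300 --time_based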
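For context on the rescue attempt mentioned further down: it was
roughly the following - image and client name as in the command line
above; the mountpoint and rsync target are placeholders:

    # on a rescue host with the rbd kernel module loaded
    rbd map one/one-53-204-0 --id libvirt   # appears as e.g. /dev/rbd0
    mount -o ro /dev/rbd0 /mnt/rescue
    rsync -aHAX /mnt/rescue/ backuphost:/srv/rescue/one-204/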
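And for anyone else sitting in front of incomplete pgs: the usual
starting points for diagnosis are sketched below. The pg id 2.3f is
made up, and note that force_create_pg, which Christian mentions
below, recreates the pg *empty* as far as I understand - it is not a
recovery tool:

    ceph health detail              # which pgs are incomplete / stuck
    ceph pg dump_stuck inactive     # also: unclean, stale
    ceph pg 2.3f query              # per-pg detail: why is it incomplete?
    ceph pg force_create_pg 2.3f    # recreates the pg, losing its contents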
Dan Van Der Ster [Wed, Jan 07, 2015 at 08:12:29PM +0000]:
> Hi Nico,
> Yes, Ceph is production ready. Yes, people are using it in production
> for qemu. Last time I heard, Ceph was surveyed as the most popular
> backend for OpenStack Cinder in production.
>
> When using RBD in production, it really is critically important to
> (a) use 3 replicas and (b) pay attention to PG distribution early on,
> so that you don't end up with unbalanced OSDs.
>
> Replication is especially important for RBD because you
> _must_not_ever_lose_an_entire_pg_. Parts of every single rbd device
> are stored on every single PG... so losing a PG means you lose random
> parts of every single block device. If this happens, the only safe
> course of action is to restore from backups. But the whole point of
> Ceph is that it enables you to configure adequate replication across
> failure domains, which makes this scenario very very very unlikely
> to occur.
>
> I don't know why you were getting kernel panics. It's probably
> advisable to stick to the most recent mainline kernel when using kRBD.
>
> Cheers, Dan
>
> On 7 Jan 2015 20:45, Nico Schottelius <[email protected]> wrote:
> Good evening,
>
> we also tried to rescue data *from* our old / broken pool by mapping
> the rbd devices, mounting them on a host, and rsyncing away as much
> as possible.
>
> However, after some time rsync got completely stuck, and eventually
> the host which had mounted the mapped rbd devices decided to kernel
> panic, at which point we decided to drop the pool and restore from
> backup.
>
> This story and Christian's make me wonder:
>
> Is anyone using ceph as a backend for qemu VM images in production?
>
> And: has anyone on the list been able to recover from a pg
> incomplete / stuck situation like ours?
>
> Reading about the issues on the list here gives me the impression
> that ceph as a software is stuck/incomplete and has not yet become
> "clean" enough for production (sorry for the word joke).
>
> Cheers,
>
> Nico
>
> Christian Eichelmann [Tue, Dec 30, 2014 at 12:17:23PM +0100]:
> > Hi Nico and all others who answered,
> >
> > After some more attempts to somehow get the pgs into a working
> > state (I tried force_create_pg, which put them into the creating
> > state - but that was obviously not real, since after rebooting one
> > of the containing OSDs they went back to incomplete), I decided to
> > save what could be saved.
> >
> > I created a new pool, created a new image there, and mapped the old
> > image from the old pool and the new image from the new pool to one
> > machine, to copy the data over at the POSIX level.
> >
> > Unfortunately, formatting the image from the new pool hangs after
> > some time. So it seems that the new pool is suffering from the same
> > problem as the old pool, which is totally incomprehensible to me.
> >
> > Right now it seems like Ceph is giving me no option to either save
> > some of the still-intact rbd volumes, or to create a new pool
> > alongside the old one to at least enable our clients to send data
> > to ceph again.
> >
> > To tell the truth, I guess this will mean the end of our ceph
> > project (which has already been running for 9 months).
> >
> > Regards,
> > Christian
> >
> > On 29.12.2014 15:59, Nico Schottelius wrote:
> > > Hey Christian,
> > >
> > > Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
> > >> [incomplete PG / RBD hanging, osd lost also not helping]
> > >
> > > that is very interesting to hear, because we had a similar
> > > situation with ceph 0.80.7 and had to re-create a pool, after I
> > > deleted 3 pg directories to allow the OSDs to start once a disk
> > > had filled up completely.
> > >
> > > So I am sorry not to be able to give you a good hint, but I am
> > > very interested in seeing your problem solved, as it is a show
> > > stopper for us, too. (*)
> > >
> > > Cheers,
> > >
> > > Nico
> > >
> > > (*) We migrated from sheepdog to gluster to ceph, and so far
> > > sheepdog seems to run much more smoothly. The former, however, is
> > > not supported by opennebula directly, and the latter is not
> > > flexible enough to host our heterogeneous infrastructure (mixed
> > > disk sizes/amounts) - so we are using ceph at the moment.
> >
> > --
> > Christian Eichelmann
> > System Administrator
> >
> > 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> > Brauerstraße 48 · DE-76135 Karlsruhe
> > Phone: +49 721 91374-8026
> > [email protected]
> >
> > Amtsgericht Montabaur / HRB 6484
> > Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias
> > Ehrlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler,
> > Dr. Oliver Mauss, Jan Oetjen
> > Chairman of the Supervisory Board: Michael Scheeren

--
New PGP key: 659B 0D91 E86E 7E24 FD15 69D0 C729 21A1 293F 2D24

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
