Hello Dan,

It is good to know that there are actually people using ceph + qemu in
production!

Regarding replicas: I thought about using size = 2, but as far as I can
see that tolerates only a single failure, much like RAID5, while
size = 3 is more or less equivalent to RAID6 in terms of loss (two
copies may fail).
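
For illustration - pool name taken from the qemu command in the p.s.
below, min_size being an assumption of mine - switching a pool to 3
replicas would look roughly like this:

    # keep 3 copies of each object; keep serving I/O while 2 are available
    ceph osd pool set one size 3
    ceph osd pool set one min_size 2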

Regarding the kernel panics: I am still trying to find out why they
happen. They can easily be reproduced by generating a high amount of
I/O in a VM.
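
Anything that generates sustained writes inside the guest does it; for
illustration, something along these lines (the exact fio parameters are
arbitrary, not a magic trigger):

    # inside the VM: sustained 4k random writes with direct I/O
    fio --name=panic-repro --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --size=2G --numjobs=4 \
        --runtime=300 --time_based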

We are mostly running Debian (stable, testing, stable+backports), and
those are the guests that show the kernel panics. Ubuntu has not shown
this behaviour so far, as far as I remember.

So if anyone has experienced kernel panics in qemu VMs running on RBD
(and fixed them), please let me know!

Cheers,

Nico

p.s.: We are *not* using rbdmap / kernel mounts - it's just qemu
running with:

    qemu-system-x86_64 -enable-kvm -name one-204 -S \
        -machine pc-i440fx-trusty,accel=kvm,usb=off \
        -m 512 -realtime mlock=off \
        -smp 2,sockets=2,cores=1,threads=1 \
        -uuid d7c3374e-349e-4db6-8f54-f3c607f93101 \
        -no-user-config -nodefaults \
        -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/one-204.monitor,server,nowait \
        -mon chardev=charmonitor,id=monitor,mode=control \
        -rtc base=utc -no-shutdown -boot strict=on \
        -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
        -device lsi,id=scsi0,bus=pci.0,addr=0x4 \
        -drive file=rbd:one/one-53-204-0:id=libvirt:key=...:auth_supported=cephx\;none:mon_host=kaffee.private.ungleich.ch\;wein.private.ungleich.ch\;tee.private.ungleich.ch,if=none,id=drive-scsi0-0-0,format=raw,cache=none \
        -device scsi-hd,bus=scsi0.0,scsi-id=0,drive=drive-scsi0-0-0,id=scsi0-0-0,bootindex=1 \
        -drive file=/var/lib/one//datastores/0/204/disk.1,if=none,id=drive-ide0-0-0,readonly=on,format=raw \
        -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
        -netdev tap,fd=24,id=hostnet0 \
        -device rtl8139,netdev=hostnet0,id=net0,mac=02:00:4d:6d:96:ae,bus=pci.0,addr=0x3 \
        -vnc 0.0.0.0:204 \
        -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
        -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


Dan Van Der Ster [Wed, Jan 07, 2015 at 08:12:29PM +0000]:
> Hi Nico,
> Yes, Ceph is production-ready, and yes, people are using it in
> production with qemu. Last time I heard, Ceph was surveyed as the
> most popular backend for OpenStack Cinder in production.
> 
> When using RBD in production, it really is critically important to (a) use 3 
> replicas and (b) pay attention to pg distribution early on so that you don't 
> end up with unbalanced OSDs.
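> 
> As a rough check, per-OSD PG counts can be pulled out of "ceph pg
> dump" with something like this (treat the awk as a sketch; the column
> positions depend on the ceph version):
> 
>     # count how many PGs each OSD is acting primary for
>     ceph pg dump pgs_brief | awk 'NR > 1 {print $6}' | sort -n | uniq -c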
> 
> Replication is especially important for RBD because you 
> _must_not_ever_lose_an_entire_pg_. Parts of every single rbd device are 
> stored on every single PG... So losing a PG means you lost random parts of 
> every single block device. If this happens, the only safe course of action is 
> to restore from backups. But the whole point of Ceph is that it enables you 
> to configure adequate replication across failure domains, which makes this 
> scenario very very very unlikely to occur.
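> 
> To make that concrete, Ceph can show where any single rbd object
> lives (the object name below is made up; "rbd info" prints the real
> block_name_prefix to look for):
> 
>     rbd info one/one-53-204-0               # note the block_name_prefix
>     ceph osd map one rb.0.123.000000000000  # one object -> its PG and OSDs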
> 
> I don't know why you were getting kernel panics. It's probably advisable to 
> stick to the most recent mainline kernel when using kRBD.
> 
> Cheers, Dan
> 
> On 7 Jan 2015 20:45, Nico Schottelius <[email protected]> wrote:
> Good evening,
> 
> we also tried to rescue data *from* our old / broken pool by mapping
> the rbd devices, mounting them on a host and rsyncing away as much as
> possible.
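> 
> (Per image, that amounted to something like the following - the image
> name, device and paths here are only illustrative:
> 
>     rbd map one/one-53-204-0          # shows up as e.g. /dev/rbd0
>     mount -o ro /dev/rbd0 /mnt/rescue
>     rsync -aHAX /mnt/rescue/ /srv/backup/one-53-204-0/
> )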
> 
> However, after some time rsync got completely stuck, and eventually
> the host which had mounted the mapped rbd devices decided to kernel
> panic, at which point we dropped the pool and went with a backup.
> 
> This story and Christian's make me wonder:
> 
>     Is anyone using ceph as a backend for qemu VM images in production?
> 
> And:
> 
>     Has anyone on the list been able to recover from a pg incomplete /
>     stuck situation like ours?
> 
> Reading about the issues on the list here gives me the impression
> that ceph as software is stuck/incomplete and has not yet become
> "clean" enough for production (sorry for the pun).
> 
> Cheers,
> 
> Nico
> 
> Christian Eichelmann [Tue, Dec 30, 2014 at 12:17:23PM +0100]:
> > Hi Nico and all others who answered,
> >
> > After some more attempts to somehow get the PGs into a working
> > state (I tried force_create_pg, which put them into the creating
> > state - but that was obviously not true, since after rebooting one
> > of the containing OSDs they went back to incomplete), I decided to
> > save what could be saved.
> >
> > I created a new pool, created a new image there, and mapped the old
> > image from the old pool and the new image from the new pool to one
> > machine, in order to copy the data at the POSIX level.
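> >
> > (In commands, that was roughly the following - pool/image names, PG
> > count and size are illustrative:
> >
> >     ceph osd pool create rescue 2048        # new pool with 2048 PGs
> >     rbd create rescue/vol1 --size 102400    # new 100 GB image (size in MB)
> >     rbd map oldpool/vol1
> >     rbd map rescue/vol1
> >
> > then mkfs, mount both, and copy between the two mapped devices.)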
> >
> > Unfortunately, formatting the image from the new pool hangs after
> > some time, so it seems that the new pool suffers from the same
> > problem as the old one - which is completely incomprehensible to me.
> >
> > Right now, it seems like Ceph gives me no way to either save some
> > of the still intact rbd volumes, or to create a new pool alongside
> > the old one to at least let our clients send data to ceph again.
> >
> > To tell the truth, I guess this will mean the end of our ceph
> > project (which has already been running for 9 months).
> >
> > Regards,
> > Christian
> >
> > On 29.12.2014 15:59, Nico Schottelius wrote:
> > > Hey Christian,
> > >
> > > Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]:
> > >> [incomplete PG / RBD hanging, osd lost also not helping]
> > >
> > > that is very interesting to hear, because we had a similar
> > > situation with ceph 0.80.7 and had to re-create a pool after I
> > > deleted 3 pg directories to allow the OSDs to start once the disk
> > > had filled up completely.
> > >
> > > So I am sorry not to be able to give you a good hint, but I am
> > > very interested in seeing your problem solved, as it is a show
> > > stopper for us, too. (*)
> > >
> > > Cheers,
> > >
> > > Nico
> > >
> > > (*) We migrated from sheepdog to gluster to ceph, and so far
> > >     sheepdog seems to run much more smoothly. The first, however,
> > >     is not supported by opennebula directly, and the second is
> > >     not flexible enough to host our heterogeneous infrastructure
> > >     (mixed disk sizes/amounts) - so we are using ceph at the
> > >     moment.
> > >
> >
> >
> > --
> > Christian Eichelmann
> > System Administrator
> >
> > 1&1 Internet AG - IT Operations Mail & Media Advertising & Targeting
> > Brauerstraße 48 · DE-76135 Karlsruhe
> > Phone: +49 721 91374-8026
> > [email protected]
> >
> > Amtsgericht Montabaur / HRB 6484
> > Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias
> > Ehrlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler,
> > Dr. Oliver Mauss, Jan Oetjen
> > Chairman of the Supervisory Board: Michael Scheeren
> 


-- 
New PGP key: 659B 0D91 E86E 7E24 FD15  69D0 C729 21A1 293F 2D24
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
