Christian Schnidrig writes:
> Well that’s strange. I wonder why our systems behave so differently.

One point about our cluster (I work with Christian, who's still on
vacation, and with Jens-Christian) is that it has 124 OSDs and 2048 PGs
(I think) in the pool used for these RBD volumes.  As a result, each
attached RBD volume can lead to up to 124 (or slightly fewer) TCP
connections from the RBD client inside Qemu/KVM, one to each OSD that
stores data from that volume.

I don't know the details of librbd's connection management.  I assume
that these librbd-to-OSD connections are only created once the client
actually tries to access data on the OSD in question.  But when a VM
accesses a lot of data spread across its RBD volumes (which ours do),
most of those connections do get created.  And apparently librbd doesn't
handle it very gracefully when its process runs into the open file
descriptor limit.
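
Just to illustrate which limit that is: here's a minimal Python sketch
(purely illustrative; in practice you'd raise the limit in the
libvirt/qemu or systemd configuration, not from inside the process) that
prints the soft and hard RLIMIT_NOFILE values a process such as the
Qemu/KVM one is subject to, and bumps the soft limit up to the hard one.

  # Show the per-process open file descriptor limits and raise the soft
  # limit to the hard limit.  librbd's OSD connections count against this.
  import resource

  soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
  print("soft nofile limit: %s, hard limit: %s" % (soft, hard))

  # Raise the soft limit as far as the hard limit allows.
  resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))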

George only has 20 OSDs, so I guess that's an upper bound on the number
of TCP connections that librbd will open per RBD volume.  That should
keep him safe up to about 50 volumes per VM, assuming the default nofile
(open file descriptor) limit of 1024.
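
To put numbers on that reasoning (a back-of-the-envelope sketch only,
assuming the worst case where every volume ends up talking to every OSD
in the pool, as described above, and ignoring the other descriptors the
Qemu process needs anyway):

  # Rough estimate of the file descriptors librbd may need for OSD
  # connections alone, per VM: one connection per OSD, per attached volume.
  def estimated_osd_connections(osds, volumes):
      return osds * volumes

  print(estimated_osd_connections(osds=20, volumes=50))   # 1000, just under the default 1024
  print(estimated_osd_connections(osds=124, volumes=10))  # 1240, already over it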

The nasty thing is when everything has been running fine for ages, and
then you add a bunch of OSDs, run a few benchmarks, see that everything
should run much BETTER (as promised :-), but then suddenly some VMs with
lots of mounted volumes mysteriously start hanging.

> Maybe the number of placement groups plays a major role as
> well. Jens-Christian may be able to give you the specifics of our ceph
> cluster.

I can, too; see the numbers above.

> I’m about to leave on vacation and don’t have time to look that up
> anymore.

Enjoy your well-earned vacation!!
-- 
Simon.