Thanks for the pointer to the patched kernel. I'll give that a shot.

On Thu, Apr 9, 2015, 5:56 AM Ilya Dryomov <[email protected]> wrote:
> On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards <[email protected]> wrote:
> > We've been working on a storage repository for xenserver 6.5, which uses the
> > 3.10 kernel (ug). I got the xenserver guys to include the rbd and libceph
> > kernel modules into the 6.5 release, so that's at least available.
> >
> > Where things go bad is when we have many (>10 or so) VMs on one host, all
> > using RBD clones for the storage mapped using the rbd kernel module. The
> > Xenserver crashes so badly that it doesn't even get a chance to kernel
> > panic. The whole box just hangs.
>
> I'm not very familiar with Xen and ways to debug it, but if the problem
> lies in the libceph or rbd kernel modules we'd like to fix it. Perhaps try
> grabbing a vmcore? If it just hangs and doesn't panic, you can normally
> induce a crash with a sysrq.
>
> > Has anyone else seen this sort of behavior?
> >
> > We have a lot of ways to try to work around this, but none of them are very
> > pretty:
> >
> > * move the code to user space, ditch the kernel driver: The build tools for
> >   Xenserver are all CentOS5-based, and it is painful to get all of the deps
> >   built to get the ceph user-space libs built.
> >
> > * backport the ceph and rbd kernel modules to 3.10. Has proven painful, as
> >   the block device code changed somewhere in the 3.14-3.16 timeframe.
>
> The https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3 branch
> would be a good start - it has libceph.ko and rbd.ko as of 3.18-rc5
> backported to rhel7 (which is based on 3.10) and may be updated in the
> future as well, although no promises on that.
>
> Thanks,
>
>                 Ilya
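For anyone following along: the sysrq approach Ilya suggests above for forcing a crash dump out of a hung box would look roughly like this (a sketch; it assumes kdump/kexec is already configured on the host so the forced panic actually lands a vmcore, typically under /var/crash):

```shell
# Make sure the crash trigger is enabled (1 enables all sysrq functions).
echo 1 > /proc/sys/kernel/sysrq

# Force an immediate kernel crash. With kdump configured, the capture
# kernel boots and writes out a vmcore for post-mortem analysis.
# WARNING: this intentionally crashes the machine - do not run it casually.
echo c > /proc/sysrq-trigger
```

If the hang is bad enough that you can't get a shell, the same trigger can usually be sent from a physical console with Alt+SysRq+c, or over a serial console with a break sequence.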
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
