Thanks for the pointer to the patched kernel.  I'll give that a shot.

On Thu, Apr 9, 2015, 5:56 AM Ilya Dryomov <[email protected]> wrote:

> On Wed, Apr 8, 2015 at 5:25 PM, Shawn Edwards <[email protected]>
> wrote:
> > We've been working on a storage repository for XenServer 6.5, which uses
> > the 3.10 kernel (ugh).  I got the XenServer guys to include the rbd and
> > libceph kernel modules in the 6.5 release, so that's at least available.
> >
> > Where things go bad is when we have many (>10 or so) VMs on one host, all
> > using RBD clones for storage, mapped using the rbd kernel module.  The
> > XenServer host crashes so badly that it doesn't even get a chance to
> > kernel panic.  The whole box just hangs.
>
> I'm not very familiar with Xen and ways to debug it, but if the problem
> lies in the libceph or rbd kernel modules we'd like to fix it.  Perhaps
> try grabbing a vmcore?  If it just hangs and doesn't panic, you can
> normally induce a crash with a sysrq.
>
> >
> > Has anyone else seen this sort of behavior?
> >
> > We have a lot of ways to try to work around this, but none of them are
> > very pretty:
> >
> > * move the code to user space, ditch the kernel driver: the build tools
> > for XenServer are all CentOS 5 based, and it is painful to get all of
> > the deps built so we can build the ceph user-space libs.
> >
> > * backport the ceph and rbd kernel modules to 3.10.  This has proven
> > painful, as the block device code changed somewhere in the 3.14-3.16
> > timeframe.
>
> https://github.com/ceph/ceph-client/commits/rhel7-3.10.0-123.9.3 branch
> would be a good start - it has libceph.ko and rbd.ko as of 3.18-rc5
> backported to rhel7 (which is based on 3.10) and may be updated in the
> future as well, although no promises on that.
>
> Thanks,
>
>                 Ilya
>
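In case it helps anyone following the thread, forcing a crash dump via sysrq as
Ilya suggests usually looks something like the sketch below.  This is only an
illustration: `force_crash_dump` is a made-up wrapper name, it must run as root
on the affected host, and it assumes kdump (with a crashkernel reservation) is
already configured so the resulting vmcore actually gets captured.

```shell
#!/bin/sh
# Hedged sketch: trigger a kernel crash via magic sysrq so kdump can
# capture a vmcore.  Requires root and a configured kdump setup; paths
# and packaging details vary by distro.

force_crash_dump() {
    # Enable all sysrq functions (some distros restrict them by default).
    echo 1 > /proc/sys/kernel/sysrq

    # 'c' triggers an immediate crash; with kdump configured, the capture
    # kernel boots and writes a vmcore (commonly under /var/crash).
    echo c > /proc/sysrq-trigger
}

# Deliberately not invoked here -- call force_crash_dump on the hung
# host's console, or use Alt+SysRq+C on a physical keyboard.
```

On a box that hangs hard, a serial or IPMI console is often the only way to get
the keystroke or command through, so it's worth setting one up before trying to
reproduce the hang.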
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com