Hi James,
That doesn't sound like a fun one to debug. I'll try your messaging
stack size tweak once my current (super ugly) hack experiment is done;
I'll describe that experiment next.
Thanks-
John
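
For anyone else trying the tweak quoted below, here is a minimal sketch of a sanity check that the option is actually present in a config before testing. It is only an illustration: the helper name rwthread_stack_bytes is made up, the sample text is copied from James's message, and treating spaces and underscores in option names as equivalent is my assumption about how Ceph reads them. A real ceph.conf may contain syntax that Python's configparser does not accept.

```python
import configparser

# Sample config text copied from James's message. A real check would read
# the actual ceph.conf; its location depends on the install.
SAMPLE_CONF = """
[client]
ms rwthread stack bytes = 8388608
"""


def rwthread_stack_bytes(conf_text, section="client"):
    """Return the configured stack size in bytes, or None if it is not set.

    Hypothetical helper for illustration; not part of Ceph.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        return None
    wanted = ["ms", "rwthread", "stack", "bytes"]
    for key, value in parser.items(section):
        # Assumption: spaces and underscores in option names are equivalent.
        if key.replace("_", " ").split() == wanted:
            return int(value)
    return None


if __name__ == "__main__":
    size = rwthread_stack_bytes(SAMPLE_CONF)
    print(size, "bytes =", size // (1024 * 1024), "MiB")
```

With the sample text this reports 8388608 bytes, i.e. 8 MiB. It only confirms that the setting is in the file, not that librbd picked it up.
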
On 10/28/2013 11:11 PM, James Harper wrote:
> Maybe nothing to do with your issue, but I was having problems using librbd
> with blktap, and ended up adding:
>
> [client]
> ms rwthread stack bytes = 8388608
>
> to my config. This is a workaround, not a fix though (IMHO) as there is
> nothing to indicate that librbd is running out of stack space, rather that
> stack is being clobbered and this works around it. I spent a fair bit of time
> trying to debug it but could never pin it down.
>
> James
>
>> -----Original Message-----
>> From: [email protected] [mailto:ceph-users-
>> [email protected]] On Behalf Of John Morris
>> Sent: Tuesday, 29 October 2013 6:01 AM
>> To: [email protected]
>> Subject: [ceph-users] Ceph + Xen - RBD io hang
>>
>> I'm encountering a problem with RBD-backed Xen. During a VM boot,
>> pygrub attaches the VM's root VDI to dom0. This hangs with these
>> messages in the debug log:
>>
>> Oct 27 21:19:59 xen27 kernel:
>> vbd vbd-51728: 16 Device in use; refusing to close
>> Oct 27 21:19:59 xen27 xenopsd-xenlight:
>> [xenops] waiting for backend to close
>> Oct 27 21:19:59 xen27 kernel:
>> qemu-system-i38[2899]: segfault at 7fac042e4000 ip 00007fac0447b129
>> sp 00007fffe7028630 error 4 in qemu-system-i386[7fac042ed000+309000]
>>
>> More details here:
>>
>> http://pastebin.ca/2472234
>>
>> - Scientific Linux 6
>> - 64-bit, Phenom CPU
>> - Ceph from RPM ceph-0.67.4-0.el6.x86_64
>> - XenAPI from Dave Scott's technology preview
>> - two btrfs-backed OSDs with journals on separate drives
>> - various kernels, incl. 3.4.6 from Dave Scott's repo and 3.11.6
>> from elrepo.org.
>>
>> This thread (whose Subject: I borrowed) describes what I'm seeing quite
>> well, but no resolution was posted:
>>
>> http://comments.gmane.org/gmane.comp.file-systems.ceph.user/3636
>>
>> In my case, udevd starts a 'blkid' process that holds /dev/xvdb open.
>> As in James's case, any interaction with the device hangs, and the
>> hung process usually can't be killed. The same problem prevents the
>> machine from completing shutdown.
>>
>> In that thread, Sylvain Munaut says the OSD and kernel driver shouldn't
>> be run on the same host. I believe my setup does not violate that,
>> since the rbd kernel module is not loaded; the device is attached
>> through the xen_blkfront module instead.
>>
>> Thanks-
>>
>> John
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com