Hi Ilya,

We just tried the 3.10.83 kernel with more rbd fixes back-ported from higher
kernel versions. This time we again ran the rbd client and 3 OSD daemons on
the same node, but rbd I/O still hangs, and the OSD filestore thread still
times out and commits suicide when memory runs very low under high load.

When this happens, enabling the rbd log can even make the system
unresponsive, so we haven't collected any logs yet. The only hint is the OSD
log, which shows the filestore thread was in FileStore::_write before the
timeout; we will look into it further.
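When we dig further, the plan is to grep the OSD logs for the internal
heartbeat timeout messages to confirm which thread was stuck. A sketch of the
pattern (the sample line below is made up for illustration; real logs live
under /var/log/ceph/ceph-osd.<id>.log on a default install):

```shell
# Hypothetical sample of an OSD heartbeat timeout line, used only to
# demonstrate the grep pattern.
sample='heartbeat_map is_healthy FileStore::op_tp thread had suicide timed out after 180'
# Match both plain timeouts and suicide timeouts of internal OSD threads.
echo "$sample" | grep -E 'had (suicide )?timed out'
```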

I know this is not an appropriate use of Ceph and rbd, but I still want to
ask whether there is a way or workaround to make this setup work. Has anyone
in the community run it successfully?
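One general Linux mitigation we are considering for this loopback-style setup
(client and server on the same node) is to keep a larger reserve of free
pages, so kernel allocations needed to complete rbd writeback are less likely
to stall under memory pressure. This is just a standard kernel tuning knob,
not something suggested in this thread, and the file name and value below are
only assumptions to start tuning from:

```
# /etc/sysctl.d/90-rbd-loopback.conf  (hypothetical file name)
# Larger free-page reserve; the value is an assumption, tune for your
# RAM size and workload.
vm.min_free_kbytes = 262144
```

Whether this is enough likely depends on the workload; it would complement,
not replace, the writeback fixes discussed below.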

Thanks.
David Zhang
From: [email protected]
To: [email protected]
Date: Fri, 31 Jul 2015 09:21:40 +0800
CC: [email protected]; [email protected]
Subject: Re: [ceph-users] which kernel version can help avoid kernel client 
deadlock

> Date: Thu, 30 Jul 2015 13:11:11 +0300
> Subject: Re: [ceph-users] which kernel version can help avoid kernel client 
> deadlock
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
> 
> On Thu, Jul 30, 2015 at 12:46 PM, Z Zhang <[email protected]> wrote:
> >
> >> Date: Thu, 30 Jul 2015 11:37:37 +0300
> >> Subject: Re: [ceph-users] which kernel version can help avoid kernel
> >> client deadlock
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]; [email protected]
> >>
> >> On Thu, Jul 30, 2015 at 10:29 AM, Z Zhang <[email protected]>
> >> wrote:
> >> >
> >> > ________________________________
> >> > Subject: Re: [ceph-users] which kernel version can help avoid kernel
> >> > client
> >> > deadlock
> >> > From: [email protected]
> >> > Date: Thu, 30 Jul 2015 13:16:16 +0800
> >> > CC: [email protected]; [email protected]
> >> > To: [email protected]
> >> >
> >> >
> >> > On Jul 30, 2015, at 12:48 PM, Z Zhang <[email protected]> wrote:
> >> >
> >> > We also hit a similar issue from time to time on CentOS with 3.10.x
> >> > kernels. In iostat, the kernel rbd client's util shows 100% with no
> >> > r/w io, and we can't umount/unmap the rbd device. After restarting the
> >> > OSDs, it becomes normal again.
> >>
> >> 3.10.x is rather vague, what is the exact version you saw this on? Can you
> >> provide syslog logs (I'm interested in dmesg)?
> >
> > The kernel version should be 3.10.0.
> >
> > I don't have syslogs at hand. It is not easily reproduced, and it happens
> > only in very low memory situations. We are running DB instances over rbd
> > as storage. The DB instances use a lot of memory under highly concurrent
> > reads and writes, and after running for a long time rbd might hit this
> > problem, but not always. Enabling the rbd log makes our system behave
> > strangely during our tests.
> >
> > I back-ported one of your fixes:
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/block/rbd.c?id=5a60e87603c4c533492c515b7f62578189b03c9c
> >
> > So far the test has looked fine for a few days, but it is still under
> > observation, so I want to know whether there are any other fixes.
> 
> I'd suggest following 3.10 stable series (currently at 3.10.84).  The
> fix you backported is crucial in low memory situations, so I wouldn't
> be surprised if it alone fixed your problem.  (It is not in 3.10.84,
> I assume it'll show up in 3.10.85 - for now just apply your backport.)
> 
Cool, looking forward to 3.10.85 to see what else it brings in.
Thanks.
> Thanks,
> 
>                 Ilya
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com