Hi Patrick,

We continue to hit this bug. Just a few questions:

1. I see that http://tracker.ceph.com/issues/16983 has been updated and that you believe it is related to http://tracker.ceph.com/issues/16013. It looks like the fix is scheduled to be backported to Jewel at some point. Is there any sense of when that might happen and when a point release will be made?

2. Looking at the pull request, https://github.com/ceph/ceph/pull/8778, I ran through the testing steps that were posted and was unable to replicate the crash.

3. When we do hit this condition, what is the best way to recover? I can continue to restart the MDS services and reboot the hosts, but the condition remains for some period of time. Even after blacklisting all clients, the condition persists. It's actually unclear to me how or why this recovers at all. If it will be some time before the fix is released, is there any workaround or temporary solution?

Thanks in advance,
Randy

On Wed, Aug 10, 2016 at 4:38 PM, Randy Orr <[email protected]> wrote:

> Patrick,
>
> We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on
> the client side with plans to move away from the 3.19 kernel where/when we
> can.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly <[email protected]> wrote:
>
>> Randy, are you using ceph-fuse or the kernel client (or something else)?
>>
>> On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr <[email protected]> wrote:
>> > Great, thank you. Please let me know if I can be of any assistance in
>> > testing or validating a fix.
>> >
>> > -Randy
>> >
>> > On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly <[email protected]> wrote:
>> >>
>> >> Hello Randy,
>> >>
>> >> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr <[email protected]> wrote:
>> >> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*, bool, bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time 2016-08-09 18:51:50.626630
>> >> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>> >> >
>> >> > ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> >> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x563d1e0a2d3b]
>> >> > 2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool, unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>> >> > 3: (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061) [0x563d1dd386a1]
>> >> > 4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b) [0x563d1dd5709b]
>> >> > 5: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d1dd5768f]
>> >> > 6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>> >> > 7: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d1dce1f8c]
>> >> > 8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>> >> > 9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>> >> > 10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>> >> > 11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>> >> > 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>> >> > 13: (()+0x8184) [0x7fc30bd7c184]
>> >> > 14: (clone()+0x6d) [0x7fc30a2d337d]
>> >> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >>
>> >> I have a bug report filed for this issue:
>> >> http://tracker.ceph.com/issues/16983
>> >>
>> >> I believe it should be straightforward to solve and we'll have a fix
>> >> for it soon.
>> >>
>> >> Thanks for the report!
>> >>
>> >> --
>> >> Patrick Donnelly
>> >
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > [email protected]
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Patrick Donnelly
>>
>
>
