Hi Patrick,

We continue to hit this bug. A few questions and observations:

1. I see that http://tracker.ceph.com/issues/16983 has been updated and you
believe it is related to http://tracker.ceph.com/issues/16013. It looks
like the fix is scheduled to be backported to Jewel at some point. Is
there any sense of when that might happen, and when a point release
containing it would be made?

2. I ran through the testing steps posted in the pull request
(https://github.com/ceph/ceph/pull/8778) and was unable to replicate
the crash.

3. When we do hit this condition, what is the best way to recover? I can
continue to restart the MDS services and reboot the hosts, but the
condition remains for some period of time. Even after blacklisting all
clients, the condition persists. It is actually unclear to me how or why
it recovers at all. If it will be some time before the fix is released,
is there any workaround or temporary solution?

Thanks in advance,
Randy

On Wed, Aug 10, 2016 at 4:38 PM, Randy Orr <[email protected]> wrote:

> Patrick,
>
> We are using the kernel client. We have a mix of 4.4 and 3.19 kernels on
> the client side with plans to move away from the 3.19 kernel where/when we
> can.
>
> -Randy
>
> On Wed, Aug 10, 2016 at 4:24 PM, Patrick Donnelly <[email protected]>
> wrote:
>
>> Randy, are you using ceph-fuse or the kernel client (or something else)?
>>
>> On Wed, Aug 10, 2016 at 2:33 PM, Randy Orr <[email protected]> wrote:
>> > Great, thank you. Please let me know if I can be of any assistance in
>> > testing or validating a fix.
>> >
>> > -Randy
>> >
>> > On Wed, Aug 10, 2016 at 1:21 PM, Patrick Donnelly <[email protected]>
>> > wrote:
>> >>
>> >> Hello Randy,
>> >>
>> >> On Wed, Aug 10, 2016 at 12:20 PM, Randy Orr <[email protected]> wrote:
>> >> > mds/Locker.cc: In function 'bool Locker::check_inode_max_size(CInode*, bool, bool, uint64_t, bool, uint64_t, utime_t)' thread 7fc305b83700 time 2016-08-09 18:51:50.626630
>> >> > mds/Locker.cc: 2190: FAILED assert(in->is_file())
>> >> >
>> >> >  ceph version 10.2.1 (3a66dd4f30852819c1bdaa8ec23c795d4ad77269)
>> >> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x563d1e0a2d3b]
>> >> >  2: (Locker::check_inode_max_size(CInode*, bool, bool, unsigned long, bool, unsigned long, utime_t)+0x15e3) [0x563d1de506a3]
>> >> >  3: (Server::handle_client_open(std::shared_ptr<MDRequestImpl>&)+0x1061) [0x563d1dd386a1]
>> >> >  4: (Server::dispatch_client_request(std::shared_ptr<MDRequestImpl>&)+0xa0b) [0x563d1dd5709b]
>> >> >  5: (Server::handle_client_request(MClientRequest*)+0x47f) [0x563d1dd5768f]
>> >> >  6: (Server::dispatch(Message*)+0x3bb) [0x563d1dd5b8db]
>> >> >  7: (MDSRank::handle_deferrable_message(Message*)+0x80c) [0x563d1dce1f8c]
>> >> >  8: (MDSRank::_dispatch(Message*, bool)+0x1e1) [0x563d1dceb081]
>> >> >  9: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x563d1dcec1d5]
>> >> >  10: (MDSDaemon::ms_dispatch(Message*)+0xc3) [0x563d1dcd3f83]
>> >> >  11: (DispatchQueue::entry()+0x78b) [0x563d1e1996cb]
>> >> >  12: (DispatchQueue::DispatchThread::entry()+0xd) [0x563d1e08862d]
>> >> >  13: (()+0x8184) [0x7fc30bd7c184]
>> >> >  14: (clone()+0x6d) [0x7fc30a2d337d]
>> >> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >>
>> >> I have a bug report filed for this issue:
>> >> http://tracker.ceph.com/issues/16983
>> >>
>> >> I believe it should be straightforward to solve and we'll have a fix
>> >> for it soon.
>> >>
>> >> Thanks for the report!
>> >>
>> >> --
>> >> Patrick Donnelly
>> >
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > [email protected]
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Patrick Donnelly
>>
>
>
