Hi all,

I just want to confirm that the patch (https://github.com/ceph/ceph/pull/10027)
works in our environment.

Thanks!
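For anyone who finds this thread with the same symptom: once a debug log has been captured with 'ceph-fuse -d' as Goncalo describes in the quoted thread below, a quick check for the frame that shows up in the backtrace quoted there could look like the sketch below. The function name has_root_ino_crash is made up for illustration, and a matching frame only suggests (it does not prove) that you are hitting the same issue.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ceph: succeeds (exit status 0) if the
# given ceph-fuse debug log contains the Client::get_root_ino frame seen
# in the backtrace quoted in this thread.
has_root_ino_crash() {
    # $1: path to the log written by `ceph-fuse -d ... > log 2>&1`
    # -q: no output, only an exit status; -F: match as a fixed string
    grep -qF 'Client::get_root_ino' "$1"
}
```

With the function defined in the current shell, something like `has_root_ino_crash /path/to/some/log && echo "same backtrace frame found"` will report a match.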

On 08/30/2016 02:04 PM, Dennis Kramer (DBS) wrote:
> Awesome Goncalo, that is very helpful.
>
> Cheers.
>
> On 08/30/2016 01:21 PM, Goncalo Borges wrote:
>> Hi Dennis.
>>
>> That is the first issue we saw and has nothing to do with the amd processors
>> (which only relates to the second issue we saw). So the fix in the patch
>>
>> https://github.com/ceph/ceph/pull/10027
>>
>> should work for you.
>>
>> In our case we went for the full compilation for our own specific reasons.
>> But you should only need to recompile the ceph fuse client. If you want a
>> temp solution while this is not fixed in jewel, just deploy ceph-fuse using
>> an infernalis client. That is how we did it during the 3 weeks we were
>> debugging our issues.
>>
>> Cheers
>> Goncalo
>>
>> ________________________________________
>> From: Dennis Kramer (DBS) [den...@holmes.nl]
>> Sent: 30 August 2016 20:59
>> To: Goncalo Borges; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] ceph-fuse "Transport endpoint is not connected" on
>> Jewel 10.2.2
>>
>> Hi Goncalo,
>>
>> Thank you for providing below info. I'm getting the exact same errors:
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> 1: (()+0x2ae88e) [0x5647a76f488e]
>> 2: (()+0x113d0) [0x7f7d14c393d0]
>> 3: (Client::get_root_ino()+0x10) [0x5647a75eb730]
>> 4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) [0x5647a75e9595]
>> 5: (()+0x1a3eb1) [0x5647a75e9eb1]
>> 6: (()+0x14ef5) [0x7f7d15283ef5]
>> 7: (()+0x15679) [0x7f7d15284679]
>> 8: (()+0x11e38) [0x7f7d15280e38]
>> 9: (()+0x76fa) [0x7f7d14c2f6fa]
>> 10: (clone()+0x6d) [0x7f7d1351ab5d]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> After reading your thread I wasn't sure if your solution would work in
>> our environment, since we don't use the AMD procs you mentioned. Though
>> the segfaults are identical in debugging.
>>
>> Have you recompiled ceph completely for your cluster or just the MDS server?
>>
>>
>> On 08/25/2016 02:45 AM, Goncalo Borges wrote:
>>> Hi Dennis...
>>>
>>> We use ceph-fuse in 10.2.2 and we saw two main issues with it immediately
>>> after upgrading from Infernalis to Jewel.
>>>
>>> In our case, we are enabling ceph-fuse in a heavily used Linux cluster, and
>>> our users complained about the mount points becoming unavailable some time
>>> after their applications start up.
>>>
>>> First we saw
>>>
>>> https://github.com/ceph/ceph/pull/10027
>>>
>>> and once that was fixed, we saw
>>>
>>> http://tracker.ceph.com/issues/16610
>>>
>>>
>>> There is a long ML thread with the subject 'ceph-fuse segfaults ( jewel
>>> 10.2.2)' on the topic. At the end, RH staff proposed some patches which we
>>> applied (we recompile ceph ourselves) and which resolved the issues we saw.
>>>
>>> You should run ceph-fuse in debug mode to actually check what segfaults you
>>> may have, and if it is a similar problem. You can do that by mounting
>>> ceph-fuse with nohup and the '-d' option. Something like:
>>>
>>> nohup ceph-fuse --id mount_user -k <path to your key> -m <mon ip>:6789 -d -r /cephfs /coepp/cephfs > /path/to/some/log 2>&1 &
>>>
>>> If you want an even bigger log level, you should set 'debug client = 20' in
>>> your /etc/ceph/ceph.conf before mounting.
>>>
>>>
>>> Cheers
>>> Goncalo
>>>
>>> On 08/24/2016 10:28 PM, Dennis Kramer (DT) wrote:
>>>> Hi all,
>>>>
>>>> Running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) on
>>>> Ubuntu 16.04LTS.
>>>>
>>>> Currently I have the weirdest thing. I have a bunch of Linux clients,
>>>> mostly Debian-based (Ubuntu/Mint). They all use version 10.2.2 of
>>>> ceph-fuse. I've been running cephfs since Hammer without any issues, but
>>>> upgraded last week to Jewel and now my clients get:
>>>> "Transport endpoint is not connected".
>>>>
>>>> It seems the error only arises when the client is using the GUI to browse
>>>> through the ceph-fuse mount; some use Nemo, some Nautilus. The error
>>>> doesn't show up immediately; sometimes the client can browse through the
>>>> share for some time before they are kicked out with the error.
>>>>
>>>> But when I strictly use the shell to browse the ceph-fuse mount in the CLI,
>>>> it works without any issues. When I try to use the GUI browser on the same
>>>> client, the error shows and I get kicked out of the ceph-fuse mount until I
>>>> remount.
>>>>
>>>> Any suggestions?
>>>>
>>>> With regards,
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> --
>>> Goncalo Borges
>>> Research Computing
>>> ARC Centre of Excellence for Particle Physics at the Terascale
>>> School of Physics A28 | University of Sydney, NSW 2006
>>> T: +61 2 93511937
>>>