Hi all,

I just want to confirm that the patch works in our environment.
Thanks!

On 08/30/2016 02:04 PM, Dennis Kramer (DBS) wrote:
> Awesome Goncalo, that is very helpful.
> 
> Cheers.
> 
> On 08/30/2016 01:21 PM, Goncalo Borges wrote:
>> Hi Dennis.
>>
>> That is the first issue we saw, and it has nothing to do with the AMD processors
>> (those only relate to the second issue we saw). So the fix in the patch
>>
>> https://github.com/ceph/ceph/pull/10027
>>
>> should work for you.
>>
>> In our case we went for the full compilation for our own specific reasons.
>> But you should only need to recompile the ceph-fuse client. If you want a
>> temporary workaround until this is fixed in Jewel, just deploy ceph-fuse using
>> an Infernalis client. That is how we did it during the 3 weeks we were
>> debugging our issues.
>>
>> Cheers
>> Goncalo
>>
>> ________________________________________
>> From: Dennis Kramer (DBS) [den...@holmes.nl]
>> Sent: 30 August 2016 20:59
>> To: Goncalo Borges; ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] ceph-fuse "Transport endpoint is not connected" on Jewel 10.2.2
>>
>> Hi Goncalo,
>>
>> Thank you for providing the info below. I'm getting the exact same errors:
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>>  1: (()+0x2ae88e) [0x5647a76f488e]
>>  2: (()+0x113d0) [0x7f7d14c393d0]
>>  3: (Client::get_root_ino()+0x10) [0x5647a75eb730]
>>  4: (CephFuse::Handle::make_fake_ino(inodeno_t, snapid_t)+0x175) [0x5647a75e9595]
>>  5: (()+0x1a3eb1) [0x5647a75e9eb1]
>>  6: (()+0x14ef5) [0x7f7d15283ef5]
>>  7: (()+0x15679) [0x7f7d15284679]
>>  8: (()+0x11e38) [0x7f7d15280e38]
>>  9: (()+0x76fa) [0x7f7d14c2f6fa]
>>  10: (clone()+0x6d) [0x7f7d1351ab5d]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
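
As a rough sketch of how the anonymous frames in a trace like this might be looked up: the helper below (a hypothetical name, `extract_offsets`) pulls the `+0x...` offsets out of the `(()+0x...)` frames so they can be passed to `addr2line`. It assumes the trace is saved in a text file, that the binary lives at `/usr/bin/ceph-fuse`, and that matching debug symbols are installed. None of that is stated in the thread. Frames whose absolute address starts with `0x7f...` most likely belong to shared libraries rather than ceph-fuse, so their offsets would need to be resolved against those libraries instead.

```shell
# Hypothetical helper: print the offset of every anonymous "(()+0x...)" frame
# read from stdin, one per line. Leading "> " quote markers are tolerated.
extract_offsets() {
    sed -n 's/^[> ]*[0-9][0-9]*: (()+\(0x[0-9a-f][0-9a-f]*\)) .*/\1/p'
}

# Example use (assumed binary path, requires debug symbols):
#   extract_offsets < trace.txt | xargs addr2line -Cfe /usr/bin/ceph-fuse
```

Named frames such as `Client::get_root_ino()+0x10` are skipped on purpose, since the symbol is already known there.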
>>
>> After reading your thread I wasn't sure whether your solution would work in
>> our environment, since we don't use the AMD processors you mentioned, even
>> though the segfaults look identical when debugging.
>>
>> Have you recompiled Ceph completely for your cluster, or just the MDS server?
>>
>>
>> On 08/25/2016 02:45 AM, Goncalo Borges wrote:
>>> Hi Dennis...
>>>
>>> We use ceph-fuse in 10.2.2 and we saw two main issues with it immediately
>>> after upgrading from Infernalis to Jewel.
>>>
>>> In our case, we are enabling ceph-fuse in a heavily used Linux cluster, and
>>> our users complained about the mount points becoming unavailable some time
>>> after their applications start up.
>>>
>>> First we saw
>>>
>>> https://github.com/ceph/ceph/pull/10027
>>>
>>> and once that was fixed, we saw
>>>
>>> http://tracker.ceph.com/issues/16610
>>>
>>>
>>> There is a long ML thread with the subject 'ceph-fuse segfaults ( jewel
>>> 10.2.2)' on the topic. In the end, RH staff proposed some patches, which we
>>> applied (we recompile ceph ourselves) and which resolved the issues we saw.
>>>
>>> You should run ceph-fuse in debug mode to check which segfaults you may
>>> have, and whether it is a similar problem. You can do that by mounting
>>> ceph-fuse with nohup and the '-d' option. Something like:
>>>
>>> nohup ceph-fuse --id mount_user -k <path to your key> -m <mon ip>:6789 -d -r /cephfs /coepp/cephfs > /path/to/some/log 2>&1 &
>>>
>>> If you want an even higher log level, you should set 'debug client = 20' in
>>> your /etc/ceph/ceph.conf before mounting.
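
For reference, the setting above might look like this in /etc/ceph/ceph.conf. Placing it under a `[client]` section is an assumption here, since the message only names the option and the file:

```ini
[client]
    # Verbose client-side logging; applies to ceph-fuse mounts started
    # after this change. Remove it again once debugging is done.
    debug client = 20
```

A level this high can produce very large logs on a busy mount, so it is probably best enabled only while reproducing the problem.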
>>>
>>>
>>> Cheers
>>> Goncalo
>>>
>>> On 08/24/2016 10:28 PM, Dennis Kramer (DT) wrote:
>>>> Hi all,
>>>>
>>>> Running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) on
>>>> Ubuntu 16.04LTS.
>>>>
>>>> Currently I have the weirdest thing. I have a bunch of Linux clients,
>>>> mostly Debian based (Ubuntu/Mint). They all use version 10.2.2 of ceph-fuse.
>>>> I have been running CephFS since Hammer without any issues, but upgraded
>>>> last week to Jewel and now my clients get:
>>>> "Transport endpoint is not connected".
>>>>
>>>> It seems the error only arises when the client is using the GUI to browse
>>>> through the ceph-fuse mount; some use Nemo, some Nautilus. The error
>>>> doesn't show up immediately; sometimes the client can browse through the
>>>> share for some time before they are kicked out with the error.
>>>>
>>>> But when I strictly use the shell to browse the ceph-fuse mount in the CLI,
>>>> it works without any issues. When I try to use the GUI browser on the same
>>>> client, the error shows and I get kicked out of the ceph-fuse mount until I
>>>> remount.
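
A minimal sketch of how a client could detect the state described above, where the mount stays broken until it is remounted. The function name `mount_is_alive` and the 5-second timeout are illustrative choices, not something from the thread. The idea is that a `stat` on a dead FUSE mount point normally fails (with "Transport endpoint is not connected"), so a failing or hanging `stat` is treated as "needs a remount".

```shell
# Hypothetical check: return 0 if the directory answers a stat call within
# 5 seconds, non-zero if stat fails or hangs. A path that does not exist
# at all also counts as "not alive".
mount_is_alive() {
    timeout 5 stat -- "$1" >/dev/null 2>&1
}

# Example use (mount point taken from the command quoted earlier in the thread):
#   mount_is_alive /coepp/cephfs || echo "ceph-fuse mount needs a remount"
```

This only reports the symptom. It does not distinguish a crashed ceph-fuse process from a missing directory, so the ceph-fuse log is still needed to confirm a segfault.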
>>>>
>>>> Any suggestions?
>>>>
>>>> With regards,
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>> --
>>> Goncalo Borges
>>> Research Computing
>>> ARC Centre of Excellence for Particle Physics at the Terascale
>>> School of Physics A28 | University of Sydney, NSW  2006
>>> T: +61 2 93511937
>>>
>>
>> --