(Copying list back in)
On Thu, Dec 1, 2016 at 10:22 AM, John Spray <jsp...@redhat.com> wrote:
> On Wed, Nov 30, 2016 at 3:48 PM, Jens Offenbach <wolle5...@gmx.de> wrote:
>> Thanks a lot... "ceph daemon mds.<id> session ls" was a good starting point.
>> What is happening:
>> I am in an OpenStack environment and start a VM. Afterwards, I mount a
>> Manila share via ceph-fuse. I get a new client session in state "open" on
>> the MDS node. Everything looks fine so far. The problem arises when you add
>> or remove a floating IP from the VM. Now you get a "stale" connection that
>> is removed after a default timeout of 300 seconds. Unfortunately, the
>> connection is never re-established and accessing the mount causes everything
>> to hang indefinitely.
>> Is there a way to handle this use case transparently for the user in
>> ceph-fuse? Is it possible to re-establish a connection to the MDS?
> Bouncing the TCP connection should never kill anything as long as it
> comes back within the session timeout. It sounds like adding/removing
> your virtual IP is somehow breaking something on the client nodes
> outside of Ceph.
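> For reference, the 300-second window corresponds to the MDS session
> tunables; a ceph.conf sketch (option names and the defaults shown are
> from the docs of that era -- verify against your release):
>
>     [mds]
>     # seconds of client inactivity before the session is marked stale
>     mds session timeout = 60
>     # seconds before a stale/unresponsive client session is closed
>     mds session autoclose = 300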
> You could try seeing if this affects other applications by leaving
> e.g. an SSH connection open while you change your network
> configuration, and see if that dies too.
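> A minimal probe for that test, if you'd rather not babysit a shell: hold
> one TCP connection open (to any listener that stays up, e.g. sshd) and
> timestamp the moment it breaks while you flip the floating IP. This is
> just an illustration, not part of Ceph, and the breakage detection is
> best-effort:
>
>     import socket
>     import time
>
>     def watch_connection(host, port, interval=5.0, max_checks=None):
>         """Hold one TCP connection open and report when it breaks.
>
>         Run on the VM, pointed at a long-lived listener, then
>         add/remove the floating IP and watch whether the established
>         connection survives. Returns the number of successful checks.
>         """
>         sock = socket.create_connection((host, port), timeout=10)
>         checks = 0
>         try:
>             while max_checks is None or checks < max_checks:
>                 # A zero-byte send still exercises the socket state;
>                 # a reset or closed peer raises OSError here
>                 # (possibly one probe late -- detection is best-effort).
>                 sock.send(b"")
>                 print(time.strftime("%H:%M:%S"), "connection still established")
>                 checks += 1
>                 time.sleep(interval)
>         except OSError as exc:
>             print(time.strftime("%H:%M:%S"), "connection broke:", exc)
>         finally:
>             sock.close()
>         return checks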
>> Sent: Wednesday, 30 November 2016 at 12:38
>> From: "John Spray" <jsp...@redhat.com>
>> To: "Jens Offenbach" <wolle5...@gmx.de>
>> Cc: "email@example.com" <firstname.lastname@example.org>
>> Subject: Re: [ceph-users] Mount of CephFS hangs
>> On Wed, Nov 30, 2016 at 6:39 AM, Jens Offenbach <wolle5...@gmx.de> wrote:
>>> I am confronted with a persistent problem during mounting of the CephFS. I
>>> am using Ubuntu 16.04 and solely ceph-fuse. The CephFS gets mounted by
>>> multiple machines and very often (not always, but in most cases) the mount
>>> process hangs and does not continue. "df -h" also hangs and nothing
>>> happens. Everything seems to be up and running. On the MDS side, I have the
>>> following in the log:
>>> 2016-11-30 07:09:57.208736 7f680bc65700 0 -- 10.30.200.141:6800/1358 >>
>>> 10.30.216.130:0/3257360291 pipe(0x5572ceaa0800 sd=22 :6800 s=2 pgs=2 cs=1
>>> l=0 c=0x5572c4843500).fault with nothing to send, going to standby
>>> 2016-11-30 07:10:27.833523 7f6812394700 0 log_channel(cluster) log [WRN] :
>>> 1 slow requests, 1 included below; oldest blocked for > 30.532515 secs
>>> 2016-11-30 07:10:27.833631 7f6812394700 0 log_channel(cluster) log [WRN] :
>>> slow request 30.532515 seconds old, received at 2016-11-30 07:09:57.300940:
>>> client_request(client.588036:6 setattr uid=3001 gid=3001 #10000000851
>>> 2016-11-30 07:09:57.302543) currently failed to xlock, waiting
>>> 2016-11-30 07:14:52.841056 7f6812394700 0 log_channel(cluster) log [INF] :
>>> closing stale session client.641683 10.30.216.130:0/3257360291 after
>>> 2016-11-30 07:17:02.844691 7f6812394700 0 log_channel(cluster) log [INF] :
>>> closing stale session client.588036 10.30.216.130:0/1984817088 after
>>> 2016-11-30 07:17:02.859557 7f6809740700 0 -- 10.30.200.141:6800/1358 >>
>>> 10.30.216.130:0/1984817088 pipe(0x5572ce891400 sd=23 :6800 s=0 pgs=0 cs=0
>>> l=0 c=0x5572c4843080).accept we reset (peer sent cseq 2), sending
>>> 2016-11-30 07:17:18.344852 7f6809740700 0 -- 10.30.200.141:6800/1358 >>
>>> 10.30.216.130:0/1984817088 pipe(0x5572ce891400 sd=23 :6800 s=2 pgs=4 cs=1
>>> l=0 c=0x5572c4843080).reader missed message? skipped from seq 0 to 114931623
>> Which of those clients is the one mounting? You can resolve the
>> "client.588036" etc to hostnames by looking at the "ceph daemon
>> mds.<id> session ls" output on the MDS node.
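>> If you want to script that lookup, `session ls` emits JSON you can
>> parse; a sketch -- the field names ("id", "inst", "client_metadata"
>> with "hostname") are what recent releases emit, so treat them as
>> assumptions and check against your own output:
>>
>>     import json
>>     import subprocess
>>
>>     def session_hosts(mds_id, output=None):
>>         """Map session ids to hostname/address from `session ls`.
>>
>>         If `output` is None, runs `ceph daemon mds.<id> session ls`
>>         on the MDS node; otherwise parses the given JSON string
>>         (handy for offline inspection). Field names are assumptions
>>         -- verify against your release's output.
>>         """
>>         if output is None:
>>             output = subprocess.check_output(
>>                 ["ceph", "daemon", "mds.%s" % mds_id, "session", "ls"])
>>         table = {}
>>         for sess in json.loads(output):
>>             meta = sess.get("client_metadata", {})
>>             table[sess["id"]] = {
>>                 "inst": sess.get("inst", ""),
>>                 "hostname": meta.get("hostname", "(unknown)"),
>>                 "state": sess.get("state", ""),
>>             }
>>         return table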
>> Also, what version of Ceph is installed on the servers and clients?
>>> It seems to be a network issue, but the network is up and running without
>>> any failures.
>>> What can I do to solve this issue and to make the mount process more
>>> reliable?
>>> Thanks a lot,
>>> ceph-users mailing list