On Thu, May 14, 2015 at 10:15 AM, Francois Lafont <flafdiv...@free.fr> wrote:
> Hi,
>
> I had a problem with a cephfs freeze in a client. Impossible to
> re-enable the mountpoint. A simple "ls /mnt" command totally
> blocked (of course impossible to umount-remount etc.) and I had
> to reboot the host. But even a "normal" reboot didn't work, the
> host didn't stop. I had to do a hard reboot of the host. In brief,
> it was like a big "NFS" freeze. ;)
>
> In the logs, nothing relevant in the client side and just this line
> in the cluster side:
>
>     ~# cat /var/log/ceph/ceph-mds.1.log
>     [...]
>     2015-05-14 17:07:17.259866 7f3b5cffc700  0 log_channel(cluster) log [INF] : closing stale session client.1342358 192.168.21.207:0/519924348 after 301.329013
>     [...]
>
> And indeed, the freeze was probably triggered by a little network
> interruption.
>
> Here is my configuration:
> - OS: Ubuntu 14.04 in the client and in the cluster nodes.
> - Kernel: 3.16.0-36-generic in the client and in the cluster nodes.
>   (apt-get install linux-image-generic-lts-utopic).
> - Ceph version: Hammer in the client and in cluster nodes (0.94.1-1trusty).
>
> In the client, I use the cephfs kernel module (not ceph-fuse). Here
> is the fstab line in the client node:
>
>     10.0.2.150,10.0.2.151,10.0.2.152:/ /mnt ceph noatime,noacl,name=cephfs,secretfile=/etc/ceph/secret,_netdev 0 0
>
> My only configuration concerning mds in ceph.conf is just:
>
>   mds cache size = 1000000
>
> That's all.
>
> Here are my questions:
>
> 1. Is this kind of freeze normal? Can I avoid these freezes with a
> more recent version of the kernel in the client?

Yes, this kind of freeze is normal, although you should have been able
to do a lazy and/or forced umount. :)
And no, you can't avoid the freeze with a newer client kernel. :(
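
For the record, those are just the standard util-linux escape hatches,
nothing Ceph-specific (a fully hung kernel client may still refuse
both, in which case you're back to the hard reboot):

    # force the unmount, or lazily detach the mountpoint
    umount -f /mnt
    umount -l /mnt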

If you notice the problem quickly enough, you should be able to
reconnect everything by restarting the MDS. Although if the MDS hasn't
failed the client then things shouldn't be blocking, so in practice
that probably won't help you.


> 2. Can I avoid these freezes with ceph-fuse instead of the kernel
> cephfs module? But in this case, the cephfs performance will be
> worse. Am I wrong?

No, ceph-fuse will suffer the same blockage, although since it runs in
userspace it's a bit easier to clean up. Depending on your workload it
will range from slightly faster to a lot slower than the kernel client.
On the other hand, you'll get updates faster/more easily, since
ceph-fuse ships with Ceph releases rather than with your kernel. ;)
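
If you do want to try it, a ceph-fuse mount looks roughly like this
(a sketch only; it assumes the client.cephfs key is in a keyring that
ceph-fuse can read, and the exact options are in ceph-fuse --help for
your version):

    # mount CephFS through FUSE instead of the kernel client
    ceph-fuse --id cephfs -m 10.0.2.150:6789 /mnt
    # a hung FUSE client can be killed and cleaned up from userspace
    fusermount -u /mnt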

> 3. Is there a parameter in ceph.conf to tell mds to be more patient
> before closing the "stale session" of a client?

Yes. You'll need to increase the "mds session timeout" value on the
MDS; it currently defaults to 60 seconds. You can raise it to whatever
value you like. The tradeoff is that if a client dies, anything it held
"capabilities" on (for read/write access) will be unavailable to anyone
doing something that might conflict with those capabilities until the
session times out.
If you've got a new enough MDS (Hammer, probably, but you can check),
you can use the admin socket to boot specific sessions, so it may suit
you to set a very large timeout and manually zap any client that has
really gone away (rather than one that merely got disconnected by the
network).
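
As a rough sketch (the exact admin socket command names differ between
releases, so check "ceph daemon mds.<id> help" on yours first):

    # ceph.conf on the MDS nodes: raise the session timeout (in seconds);
    # takes effect after restarting the MDS
    [mds]
        mds session timeout = 600

    # then, on the MDS host, inspect sessions and manually evict a
    # client that is really gone (e.g. the client.1342358 from your log)
    ceph daemon mds.1 session ls
    ceph daemon mds.1 session evict 1342358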

>
> I'm in a testing period and a hard reboot of my cephfs clients would
> be quite annoying for me. Thanks in advance for your help.

Yeah. Unfortunately there's a basic tradeoff in strictly-consistent
(aka POSIX) network filesystems here: if the network goes away, you
can't be consistent any more because the disconnected client can make
conflicting changes. And you can't tell exactly when the network
disappeared.

So while we hope to make this less painful in the future, the network
dying that badly is a failure case you need to be aware of: it means
the client might hold conflicting information. If it *does* have
conflicting info, the best we can do is be polite, return a bunch of
error codes, and unmount gracefully. We'll get there eventually, but
it's a lot of work.
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
