On Tue, Apr 16, 2019 at 10:36 AM David C <dcsysengin...@gmail.com> wrote:
>
> Hi All
>
> I have a single export of my cephfs using the ceph_fsal [1]. A CentOS 7 
> machine mounts a sub-directory of the export [2] and is using it for the home 
> directory of a user (e.g. everything under ~ is on the server).
>
> This works fine until I start a long sequential write into the home directory 
> such as:
>
> dd if=/dev/zero of=~/deleteme bs=1M count=8096
>
> This saturates the 1GbE link on the client, which is great, but during the
> transfer, apps that are accessing files in home start to lock up. Google
> Chrome, for example, which puts its config in ~/.config/google-chrome/,
> locks up during the transfer, e.g. I can't move between tabs; as soon as the
> transfer finishes, Chrome goes back to normal. Essentially the desktop
> environment reacts as I'd expect if the server were to go away. I'm using the
> MATE DE.
>
> However, if I mount a separate directory from the same export on the machine 
> [3] and do the same write into that directory, my desktop experience isn't 
> affected.
>
> I hope that makes some sense; it's a bit of a weird one to describe. This
> feels like a locking issue to me, although I can't explain why a single write 
> into the root of a mount would affect access to other files under that same 
> mount.
>

It's not a single write. You're doing 8G worth of 1M I/Os, and the server
then has to push all of those through to the OSD backing store.

> [1] CephFS export:
>
> EXPORT
> {
>     Export_ID=100;
>     Protocols = 4;
>     Transports = TCP;
>     Path = /;
>     Pseudo = /ceph/;
>     Access_Type = RW;
>     Attr_Expiration_Time = 0;
>     Disable_ACL = FALSE;
>     Manage_Gids = TRUE;
>     Filesystem_Id = 100.1;
>     FSAL {
>         Name = CEPH;
>     }
> }
>
> [2] Home directory mount:
>
> 10.10.10.226:/ceph/homes/username on /homes/username type nfs4 
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
>
> [3] Test directory mount:
>
> 10.10.10.226:/ceph/testing on /tmp/testing type nfs4 
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
>
> Versions:
>
> Luminous 12.2.10
> nfs-ganesha-2.7.1-0.1.el7.x86_64
> nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
>
> Ceph.conf on nfs-ganesha server:
>
> [client]
>         mon host = 10.10.10.210:6789, 10.10.10.211:6789, 10.10.10.212:6789
>         client_oc_size = 8388608000
>         client_acl_type=posix_acl
>         client_quota = true
>         client_quota_df = true
>

No magic bullets here, I'm afraid.

It sounds like ganesha is probably just too swamped with write requests
to do much else, but you'll want to do the legwork starting with the
hanging application and figure out what it's doing that takes so long.
Is it blocked in a syscall? Which one?
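
For example, assuming strace is available on the client, attaching to the
hung process (the PID here is just a placeholder) will show which syscall
it's stuck in and how long each call takes:

# time every syscall (-T) in the hung process and its threads (-f)
strace -f -T -p 12345

# or, without tracing, peek at where the blocked task is sitting
cat /proc/12345/stack
cat /proc/12345/syscall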

From there you can start looking at statistics in the NFS client to
see what's going on there. Are certain RPCs taking longer than they
should? Which ones?
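
The per-mount counters in /proc/self/mountstats are a good place to start;
the mountstats and nfsiostat tools from nfs-utils summarise them per
operation (op counts, retransmits, RTT). A rough sketch, assuming those
tools are installed and using the mount points from above:

# raw per-op counters for every NFS mount on this client
cat /proc/self/mountstats

# per-op summary (ops, retransmits, RTT) for the home mount
mountstats /homes/username

# ongoing throughput/latency for the home mount, sampled every 5 seconds
nfsiostat 5 /homes/username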

Once you know what's going on with the client, you can better tell
what's going on with the server.
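
On the ganesha side, the nfs-ganesha packages ship a ganesha_stats helper
that pulls per-export operation counters over D-Bus, which should confirm
whether the server really is spending all of its time servicing WRITEs. A
rough sketch, assuming that helper is installed and stats counting is
enabled (subcommand names can vary between versions; 100 is the Export_ID
from your config):

# overall NFS op counts as seen by ganesha
ganesha_stats global

# NFSv4 read/write counters for export 100
ganesha_stats iov4 100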
-- 
Jeff Layton <jlay...@poochiereds.net>

