>     Hi,
>     There have been numerous threads about this in the past, but I wanted to
>     bring this up again in a new situation.
>     Running with Luminous v12.2.4 I'm seeing some odd Memory and CPU usage
>     when using the ceph-fuse client to mount a multi-MDS CephFS filesystem.
>         health: HEALTH_OK
>       services:
>         mon: 3 daemons, quorum luvil,sanomat,tide
>         mgr: luvil(active), standbys: tide, sanomat
>         mds: svw-2/2/2 up  {0=luvil=up:active,1=tide=up:active}, 1 up:standby
>         osd: 112 osds: 111 up, 111 in
>       data:
>         pools:   2 pools, 4352 pgs
>         objects: 85549k objects, 4415 GB
>         usage:   50348 GB used, 772 TB / 821 TB avail
>         pgs:     4352 active+clean
>     After running an rsync with millions of files (and some directories
>     having 1M files), a ceph-fuse process was using 44GB RSS and between
>     100% and 200% CPU.
>     Looking at this FUSE client through the admin socket, the objecter was
>     one of my first suspects, but it claimed to hold only ~300M of data in
>     its cache, spread out over tens of thousands of files.
>     After unmounting and mounting again the memory usage was gone. We
>     tried the rsync again, but the memory growth wasn't reproducible.
>     The CPU usage, however, is: a "simple" rsync causes ceph-fuse to use
>     up to 100% CPU.
>     Switching to the kernel client (4.16 kernel) seems to solve this, but
>     the reasons for using ceph-fuse here are the lack of a recent kernel
>     in Debian 9 and the ease of upgrading the FUSE client.
>     I've tried to disable all logging inside the FUSE client, but that
>     didn't help.
>     When checking on the FUSE client's socket I saw that rename() operations
>     were hanging and that's something which rsync does a lot.
>     At the same time I saw a getfattr() being done to the same inode by the
>     FUSE client, but to a different MDS:
>     rename(): mds rank 0
>     getfattr: mds rank 1
>     Although the kernel client seems to perform better, it shows the same
>     behavior when looking at the mdsc file in /sys:
>     216729  mds0    create  (unsafe)  #100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj (reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)
>     216731  mds1    rename  #100021abbd9/ddd.010236269.mpeg21.a0065.folia.xml.gz (reddata2/ddd.010236269.mpeg21.a0065.folia.xml.gz) #100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj (reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)
>     So this is rsync talking to two MDS, one for a create and one for a
>     rename.
>     Is this normal? Is this expected behavior?
> If the directory got large enough to be sharded across MDSes, yes, it's
> expected behavior. There are filesystems that attempt to recognize rsync
> and change their normal behavior specifically to deal with this case,
> but CephFS isn't one of them (yet, anyway).

Yes, that directory is rather large.

I've set max_mds to 1 for now and suddenly both FUSE and the kclient are
a lot faster: not 10%, but something like 80 to 100% faster.

It seems that directory was being balanced between two MDSes, and that
caused a 'massive' slowdown.
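
For anyone who wants to check for the same pattern on their own client, here
is a rough sketch that groups mdsc-style entries by directory inode and
reports the directories whose requests went to more than one MDS rank. The
line layout is an assumption taken from the two entries quoted above (with
the wrapped lines joined back together), and the function names are made up
for illustration. It is not a parser for every mdsc format.

```python
import re
from collections import defaultdict

# Assumed layout, inferred only from the two quoted sample entries:
#   <tid>  mds<rank>  <op>  [flags]  #<dir inode hex>/<name> (<path>) ...
LINE_RE = re.compile(r"^\s*(\d+)\s+mds(\d+)\s+(\w+)\b(.*)$")
INO_RE = re.compile(r"#([0-9a-fA-F]+)/")


def ranks_per_directory(lines):
    """Map each directory inode to the set of (rank, op) pairs seen for it."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or unrecognised lines
        rank, op, rest = int(m.group(2)), m.group(3), m.group(4)
        for ino in INO_RE.findall(rest):
            seen[ino].add((rank, op))
    return dict(seen)


def split_directories(lines):
    """Keep only directory inodes whose requests hit more than one rank."""
    result = {}
    for ino, pairs in ranks_per_directory(lines).items():
        ranks = {rank for rank, _ in pairs}
        if len(ranks) > 1:
            result[ino] = sorted(pairs)
    return result


# The two entries from the original mail, each joined onto one line.
sample = [
    "216729  mds0    create  (unsafe)  "
    "#100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj "
    "(reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)",
    "216731  mds1    rename  "
    "#100021abbd9/ddd.010236269.mpeg21.a0065.folia.xml.gz "
    "(reddata2/ddd.010236269.mpeg21.a0065.folia.xml.gz) "
    "#100021abbd9/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj "
    "(reddata2/.ddd.010236269.mpeg21.a0065.folia.xml.gz.AuxBQj)",
]

for ino, pairs in split_directories(sample).items():
    print(ino, pairs)
```

With the two sample entries this reports directory inode 100021abbd9 with a
create on rank 0 and a rename on rank 1, which is the split I described.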

This can probably be influenced by tuning the MDS balancer settings, but
I am not sure yet where to start. Any suggestions?


> Not sure about the specifics of the client memory or CPU usage; I think
> you'd have to profile. rsync is a pretty pessimal CephFS workload though
> and I think I've heard about this before...
> -Greg
>     To me it seems that the Subtree Partitioning might be
>     interfering here, but I wanted to double check.
>     Apart from that the CPU and Memory usage of the FUSE client seems very
>     high and that might be related to this.
>     Any ideas?
>     Thanks,
>     Wido
