Hi,

On 03.11.18 10:31, jes...@krogh.cc wrote:
I suspect that the MDS asked the client to trim its cache. Please run
the following commands on an idle client.
In the meantime we migrated to the RH Ceph version and gave the MDS
both SSDs and more memory, and the problem went away.

It still puzzles me a bit why there is a connection between the
"client page cache" and the MDS server performance. The only explanation
I can find is that if the MDS cannot cache metadata, it needs to go
back and fetch it from the Ceph metadata pool and then exposes the
data as "new" to the clients, despite it being the same. If that is
the case, then I would say there is significant room for performance
optimization here.

CephFS is a distributed system, so there is bookkeeping for every file in use by any CephFS client. These entities are called 'capabilities' (caps); they also implement functionality like distributed locking.
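To get an idea of how many caps each client currently holds, you can ask the MDS for its session list. A small sketch; 'mds.a' is just a placeholder for your MDS daemon name, and 'ceph daemon' has to be run on the host where that MDS runs, since it talks to the local admin socket:

    # list all client sessions known to this MDS; the output includes
    # the number of caps (num_caps) each client currently holds
    ceph daemon mds.a session ls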


The MDS has to cache every capability it has assigned to a CephFS client, in addition to the cache for inode information and other metadata. The cache size is limited to control the memory consumption of the MDS process. If an MDS runs out of cache, it tries to revoke capabilities assigned to CephFS clients to free memory for new capabilities. This revoke process runs asynchronously from MDS to CephFS client, similar to NFS delegation.
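If you want to check how close the MDS is to its configured limit, something along these lines should work (again assuming a daemon named 'mds.a', run on the MDS host):

    # configured cache limit in bytes (mds_cache_memory_limit,
    # available since Luminous)
    ceph daemon mds.a config get mds_cache_memory_limit

    # full performance counter dump; the mds_mem section contains
    # the current inode and cap counts
    ceph daemon mds.a perf dump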


If a CephFS client receives a cap release request and is able to honor it (no process is accessing the file at the moment), the client cleans up its internal state and allows the MDS to release the cap. This cleanup also involves removing the file's data from the page cache.
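On a kernel client you can watch this from the other side via debugfs, which is handy for checking whether revokes actually shrink the client's state. Path and output format depend on the kernel version, so treat this as a rough sketch (needs root and a mounted debugfs):

    # one directory per mounted file system, named <fsid>.<client_id>
    ls /sys/kernel/debug/ceph/

    # the 'caps' file lists the caps this client currently holds
    cat /sys/kernel/debug/ceph/*/caps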


If your MDS was running with a too small cache size, it had to revoke caps over and over to stay within that size, and the clients had to clean up their caches over and over, too.


You did not mention any details about the MDS settings, especially the cache size. I assume you increased the cache size after adding more memory, since the problem seems to be solved now.
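For reference, the relevant knob is mds_cache_memory_limit (in bytes). A minimal sketch of raising it; the 40 GB value and the daemon name 'mds.a' are only placeholders:

    # persistent setting in ceph.conf on the MDS host
    [mds]
    mds_cache_memory_limit = 42949672960

    # or change it at runtime without restarting the MDS
    ceph tell mds.a injectargs '--mds_cache_memory_limit=42949672960'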


It actually is not solved, but only mitigated. If your working set grows or the number of clients increases, the MDS has to manage more caps and will have to revoke caps more often. You will probably reach an equilibrium at some point. The MDS is the most memory-hungry part of Ceph, and it often catches people by surprise. We had the same problem in our setup; even worse, the nightly backup also trashes the MDS cache.


The best way to monitor the MDS is the 'ceph daemonperf mds.XYZ' command on the MDS host. It gives you the current performance counters, including the inode and caps counts. Our MDS is configured with a 40 GB cache size, currently has 15 million inodes cached, and is managing 3.1 million capabilities.
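daemonperf also takes an optional refresh interval, which makes it easy to watch the inode and caps columns while clients are busy. The exact column set depends on the Ceph release, so take this as a sketch:

    # refresh the counter columns every 5 seconds
    ceph daemonperf mds.XYZ 5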


TL;DR: MDS needs huge amounts of memory for its internal bookkeeping.


Hope this helps.


Regards,

Burkhard



If you can reproduce this issue, please send the kernel log to us.
Will do if/when it reappears.

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com