Hello Zack,

On Wed, Mar 6, 2019 at 1:18 PM Zack Brenton <z...@imposium.com> wrote:
>
> Hello,
>
> We're running Ceph on Kubernetes 1.12 using the Rook operator 
> (https://rook.io), but we've been struggling to scale applications mounting 
> CephFS volumes above 600 pods / 300 nodes. All our instances use the kernel 
> client and run kernel `4.19.23-coreos-r1`.
>
> We've tried increasing the MDS memory limits, running multiple active MDS 
> pods, and running different versions of Ceph (up to the latest Luminous and 
> Mimic releases), but we run into MDS_SLOW_REQUEST errors at the same scale 
> regardless of the memory limits we set. See this GitHub issue for more info 
> on what we've tried up to this point: https://github.com/rook/rook/issues/2590
>
> I've written a simple load test that reads all the files in a given directory 
> on an interval. While running this test, I've noticed that the `mds_co.bytes` 
> value (from `ceph daemon mds.myfs-a dump_mempools | jq -c 
> '.mempool.by_pool.mds_co'`) increases each time files are read. Why is this 
> number increasing after the first iteration? If the same client is reading 
> the same cached files, why would the data in the cache change at all? What is 
> `mds_co.bytes` actually reporting?
>
> My most important question is this: How do I configure Ceph to be able to 
> scale to large numbers of clients?
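
(For reference, the test described above boils down to a loop like the
following -- a hypothetical sketch, not Zack's actual script. The mount
path, directory, and interval are placeholders, and `dump_mempools` talks
to the MDS admin socket, so with Rook it would typically be run from
inside the MDS pod:)

    # Re-read every file under a CephFS directory on an interval and
    # sample the MDS mempool counter after each pass.
    while true; do
        find /mnt/cephfs/loadtest -type f -exec cat {} + > /dev/null
        ceph daemon mds.myfs-a dump_mempools | jq -c '.mempool.by_pool.mds_co'
        sleep 60
    done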

Please post more information about your cluster: the types of devices
backing your OSDs, and the output of `ceph osd tree`, `ceph osd df`, and
`ceph osd lspools`.
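
If you are running those through the Rook toolbox, something like this
should do it (a sketch -- the `rook-ceph` namespace and the
`app=rook-ceph-tools` label are the defaults from the Rook docs and may
differ in your deployment):

    TOOLS=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools \
        -o jsonpath='{.items[0].metadata.name}')
    for cmd in "osd tree" "osd df" "osd lspools"; do
        kubectl -n rook-ceph exec "$TOOLS" -- ceph $cmd
    done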

There's no reason why CephFS shouldn't be able to scale to that number
of clients. The issue is probably related to the configuration of the
pools/MDS. From your ticket, I have a *lot* of trouble believing the
MDS is still at 3GB of memory usage with that number of clients and
mds_cache_memory_limit=17179869184 (16GB).
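
It would also help to confirm what the running daemon actually has
configured and how close it is to that limit. A minimal sketch, assuming
the mds.myfs-a daemon name from your message and access to its admin
socket:

    # the cache limit the running MDS actually has
    ceph daemon mds.myfs-a config get mds_cache_memory_limit
    # current cache usage versus that limit
    ceph daemon mds.myfs-a cache status
    # RSS and other memory counters for the MDS process
    ceph daemon mds.myfs-a perf dump mds_mem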

-- 
Patrick Donnelly
