Thanks for the suggestions, Greg. One thing I forgot to mention: restarting the main MDS service fixes the problem temporarily.
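For reference, the restart itself is nothing fancy; on a sysvinit-style install it's roughly the following (the mds id is whatever your MDS daemon is named -- mine happens to match the short hostname, so treat that bit as a placeholder):

    # bounce the active MDS daemon (assumes the mds id matches the short hostname)
    sudo service ceph restart mds.$(hostname -s)

    # watch it rejoin and confirm the cache pressure warning clears, at least for a while
    ceph mds stat
    ceph health detail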
Clearing inodes and dentries with "echo 2 | sudo tee /proc/sys/vm/drop_caches" on the two CephFS clients that were failing to respond to cache pressure cleared the warning. Additionally, I realized my Ceph clients were running 0.87.1 while the cluster is on 0.94.1-111. I've updated all the clients, remounted my CephFS shares, and will cross my fingers that this resolves the issue. Thanks for the help! (I've pasted the rough command sequence below the quoted mail, in case it's useful for the archives.)

On Tue, May 12, 2015 at 12:55 PM, Gregory Farnum <[email protected]> wrote:
> On Tue, May 12, 2015 at 12:03 PM, Cullen King <[email protected]> wrote:
> > I'm operating a fairly small ceph cluster, currently three nodes (with
> > plans to expand to five in the next couple of months) with more than
> > adequate hardware. Node specs:
> >
> > 2x Xeon E5-2630
> > 64gb ram
> > 2x RAID1 SSD for system
> > 2x 256gb SSDs for journals
> > 4x 4tb drives for OSDs
> > 1GbE for frontend (shared with rest of my app servers, etc)
> > 10GbE switch for cluster (only used for ceph storage nodes)
> >
> > I am using CephFS along with the object store w/ RadosGW in front of it.
> > My problems existed when using only CephFS. I use CephFS as a shared
> > datastore for two low volume OSM map tile servers to have a shared tile
> > cache. Usage isn't heavy, it's mostly read. Here's a typical output from
> > ceph status:
> >
> > https://gist.github.com/kingcu/499c3d9373726e5c7a95
> >
> > Here's my current ceph.conf:
> >
> > https://gist.github.com/kingcu/78ab0fe8669b7acb120c
> >
> > I've upped the mds cache size as recommended by some historical
> > correspondence on the mailing list, which helped for a while. There
> > doesn't seem to be any real issue with the cluster operating in this
> > WARN state, as it has been in production for a couple of months now
> > without issue. I'm starting to migrate other data into the Ceph cluster,
> > and before making the final plunge with critical data, I wanted to get a
> > handle on this issue. Suggestions are appreciated!
>
> This warning has come up several times recently. It means that the MDS
> has exceeded its specified cache size and has asked the clients to
> return inodes/dentries, but they have not done so.
>
> This either means that the clients are broken in some way (unlikely if
> you're using the same Ceph release on both), or that the
> kernel/applications are pinning so many dentries that the client can't
> drop any of them from cache. You could try telling it to dump caches
> and see if that releases stuff.
>
> There's a new PR to make ceph-fuse be more aggressive about trying to
> make the kernel dump stuff out (https://github.com/ceph/ceph/pull/4653),
> but it just got created so I'm not sure whether it will solve this
> problem or not.
>
> Note that if your MDS isn't using too much memory, this is probably not
> going to be an issue for you, despite the WARN state.
> -Greg
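For the archives, here's roughly the sequence described above; the mount point, monitor address and secret file path are placeholders for my setup, so adjust for your own:

    # on each cephfs client flagged as "failing to respond to cache pressure":
    # flush dirty data, then drop dentries and inodes so the client can release
    # capabilities back to the MDS
    sync
    echo 2 | sudo tee /proc/sys/vm/drop_caches

    # make sure the client packages match the cluster release (mine were 0.87.1
    # against a 0.94.1 cluster) -- run on both the clients and the cluster nodes
    ceph --version

    # after upgrading the client packages, remount the cephfs share
    # (kernel client shown; mount point, mon address and secret file are placeholders)
    sudo umount /mnt/cephfs
    sudo mount -t ceph mon1.example.com:6789:/ /mnt/cephfs \
         -o name=admin,secretfile=/etc/ceph/admin.secret

    # then keep an eye on whether the warning comes back
    ceph health detail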
