Thanks for the suggestions, Greg. One thing I forgot to mention: restarting
the main MDS service fixes the problem temporarily.
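
For reference, this is roughly how I've been bouncing it (the daemon id "a"
and the init invocations are assumptions about my particular setup, so
adjust for yours):

  # on the node running the active MDS
  sudo service ceph restart mds.a     # sysvinit-style ceph init script
  # or, with upstart-managed daemons:
  sudo restart ceph-mds id=a

The HEALTH_WARN clears right after the restart and then creeps back after a
while.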

Clearing dentries and inodes with "echo 2 | sudo tee
/proc/sys/vm/drop_caches" on the two CephFS clients that were failing to
respond to cache pressure cleared the warning. I also realized my ceph
clients were running 0.87.1 while the cluster is on 0.94.1-111, so I've
updated all of the clients, remounted my CephFS shares, and will cross my
fingers that it resolves the issue for good.
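
For the archives, here's the gist of what I ran on each of the two clients
(the /mnt/cephfs mount point is just how my hosts happen to be laid out):

  # check what the client is actually running
  ceph --version                              # was reporting 0.87.1
  # ask the kernel to drop clean dentries and inodes
  echo 2 | sudo tee /proc/sys/vm/drop_caches
  # after upgrading the ceph packages, remount the share
  sudo umount /mnt/cephfs
  sudo mount /mnt/cephfs                      # assumes the share is in fstab

No guarantee the version mismatch was the actual culprit, but at least
clients and cluster are on matching packages now.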

Thanks for the help!

On Tue, May 12, 2015 at 12:55 PM, Gregory Farnum <[email protected]> wrote:

> On Tue, May 12, 2015 at 12:03 PM, Cullen King <[email protected]> wrote:
> > I'm operating a fairly small ceph cluster, currently three nodes (with
> > plans to expand to five in the next couple of months) with more than
> > adequate hardware. Node specs:
> >
> > 2x Xeon E5-2630
> > 64gb ram
> > 2x RAID1 SSD for system
> > 2x 256gb SSDs for journals
> > 4x 4tb drives for OSDs
> > 1GbE for frontend (shared with rest of my app servers, etc)
> > 10GbE switch for cluster (only used for ceph storage nodes)
> >
> > I am using CephFS along with the object store w/ RadosGW in front of it.
> > My problems existed when using only CephFS. I use CephFS as a shared
> > datastore for two low volume OSM map tile servers to have a shared tile
> > cache. Usage isn't heavy, it's mostly read. Here's a typical output from
> > ceph status:
> >
> > https://gist.github.com/kingcu/499c3d9373726e5c7a95
> >
> > Here's my current ceph.conf:
> >
> > https://gist.github.com/kingcu/78ab0fe8669b7acb120c
> >
> > I've upped the mds cache size as recommended by some historical
> > correspondence on the mailing list, which helped for a while. There
> > doesn't seem to be any real issue with the cluster operating in this
> > WARN state, as it has been in production for a couple months now without
> > issue. I'm starting to migrate other data into the Ceph cluster, and
> > before making the final plunge with critical data, wanted to get a
> > handle on this issue. Suggestions are appreciated!
>
> This warning has come up several times recently. It means that the MDS
> has exceeded its specified cache size and has asked the clients to
> return inodes/dentries, but they have not done so.
>
> This either means that the clients are broken in some way (unlikely if
> you're using the same Ceph release on both), or that the
> kernel/applications are pinning so many dentries that the client can't
> drop any of them from cache. You could try telling it to dump caches
> and see if that releases stuff.
>
> There's a new PR to make ceph-fuse be more aggressive about trying to
> make the kernel dump stuff out
> (https://github.com/ceph/ceph/pull/4653) but it just got created so
> I'm not sure whether that will solve this problem or not.
>
>
> Note that if your MDS isn't using too much memory this is probably not
> going to be an issue for you, despite the WARN state.
> -Greg
>
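
One more note in case someone hits this thread later: to sanity-check
whether the MDS is actually blowing past its cache limit (per Greg's point
about memory), the admin socket counters seem useful. Something along these
lines, with "a" standing in for your MDS id:

  # on the MDS host: compare "inodes" against the configured mds cache size
  sudo ceph daemon mds.a perf dump | python -m json.tool | grep -E '"inodes"|"caps"'
  sudo ceph daemon mds.a config show | grep mds_cache_size

If I remember right the default mds cache size is 100000 inodes, so the
inodes counter sitting well above whatever you've configured is the thing
to watch for.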