OK, I finally got the cluster back to HEALTH_OK. Rebooting the whole cluster didn't fix the problem, so I did:

  ceph osd set noscrub
  ceph osd set nodeep-scrub

That made the "slow metadata IOs" and "behind on trimming" warnings go away, replaced by a "noscrub, nodeep-scrub flag(s) set" warning. Once all the PGs were active+clean, I did:

  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

And now the cluster is back to HEALTH_OK.
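For anyone who hits the same thing, the waiting step is just a matter of
watching the PG states until everything is active+clean. A rough sketch
(not exactly what I ran, but it amounts to the same):

  # keep an eye on PG states until everything reports active+clean
  watch -n 10 'ceph pg stat'

  # only then clear the flags
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub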

Now to figure out what is causing the problem in the first place...

Jorge

On 6/5/19 5:33 PM, Yan, Zheng wrote:
On Thu, Jun 6, 2019 at 6:36 AM Jorge Garcia <jgar...@soe.ucsc.edu> wrote:
We have been testing a new installation of ceph (mimic 13.2.2) mostly
using cephfs (for now). The current test is just setting up a filesystem
for backups of our other filesystems. After rsyncing data for a few
days, we started getting this from ceph -s:

health: HEALTH_WARN
              1 MDSs report slow metadata IOs
              1 MDSs behind on trimming

I have been googling for solutions and reading the docs and the
ceph-users list, but I haven't found a way to get rid of these messages
and get back to HEALTH_OK. Some of the things I have tried (from
suggestions around the internet):

- Increasing the amount of RAM on the MDS server (Currently 192 GB)
- Increasing mds_log_max_segments (Currently 256)
- Increasing mds_cache_memory_limit (example commands for both below)
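
For reference, those two settings can be changed at runtime roughly like
this (mds.a is just a placeholder for the active MDS name, and the 16 GiB
cache value is only an example):

  # runtime change on the running MDS daemon
  ceph tell mds.a injectargs '--mds_log_max_segments=256'
  ceph tell mds.a injectargs '--mds_cache_memory_limit=17179869184'  # 16 GiB, in bytes

  # or persistently, in ceph.conf on the MDS host:
  [mds]
  mds_log_max_segments = 256
  mds_cache_memory_limit = 17179869184

Mimic's centralized config (ceph config set mds ...) should work for this
as well.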

The cluster still reports HEALTH_WARN. Currently the filesystem is
idle, with no I/O happening. Not sure what to try next. Any suggestions?

maybe the mds is still trimming its log. please check the mds' cpu usage
and the whole cluster's IO stats.
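
In case it helps, something along these lines should show that (mds.a is
a placeholder for the active MDS name):

  # CPU usage of the MDS daemon, on the host it runs on
  top -p $(pidof ceph-mds)

  # journal/trim counters from the MDS admin socket
  ceph daemon mds.a perf dump mds_log

  # cluster-wide client and recovery I/O
  ceph osd pool stats
  ceph -w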

Thanks in advance!

Jorge

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com