We have half a dozen clusters of varying sizes and all of them have high memory usage on the mons every 1-3 months. I've thought about opening a ticket with Ceph Enterprise support or bringing it up here, but there's no way for us to really get logs on it because we can't run with high logging for multiple months and we can't tell which of our clusters is going to have the issue next. We've seen this on 0.94.5 and 0.94.7.
We've noticed that the memory usage is high either on the primary mon or on all of the secondary mons. I've never seen high memory usage on the primary and the secondary mons at the same time. Our fix has been to monitor memory usage on the servers and restart the mon processes for the entire cluster when one of them spikes.

________________________________
David Turner | Cloud Operations Engineer | StorageCraft Technology Corporation
________________________________________
From: ceph-users [[email protected]] on behalf of mj [[email protected]]
Sent: Friday, November 04, 2016 7:06 AM
To: [email protected]
Subject: [ceph-users] suddenly high memory usage for ceph-mon process

Hi,

Running ceph 0.94.9 on jessie (proxmox), three hosts, 4 OSDs per host, SSD journal, 10G cluster network. Hosts have 65G RAM. The cluster is generally not very busy.

Suddenly we were getting HEALTH_WARN today, with two OSDs (both on the same server) being slow. Looking into this, we noticed very high memory usage on that host: 75% memory for ceph-mon! (Normally ceph-mon uses around 1% - 2% here.)

I restarted ceph-mon on that host, and that seems to have brought things back to normal immediately.

I don't see anything out of the ordinary in /var/log/syslog on that server, and the cluster is otherwise generally HEALTH_OK. There have been no config changes lately (not for many weeks), and the last time I applied updates and rebooted was 30 days ago.

No idea what could have caused this. Any ideas what to check, where to look?
What would typically cause such high memory usage for the ceph-mon process?

MJ
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
