Are you also seeing OSDs marking themselves down for a little while and then coming back up? There are two very likely problems causing or contributing to this. The first is heavy snapshot use: deleting snapshots is a very expensive operation for your cluster and can cause a lot of slowness. The second is PG subfolder splitting, which shows up as blocked requests and OSDs marking themselves down and coming back up a little later without any errors in the log. Here is a previous thread where someone was having these problems and both causes were investigated:

https://www.mail-archive.com/[email protected]/msg36923.html

If you are on 0.94.9 or 10.2.5 or later, you can split your PG subfolders sanely while the OSDs are temporarily stopped, using ceph-objectstore-tool's apply-layout-settings operation. There are a lot of ways to skin the cat of snap trimming, but the right one depends greatly on your use case.
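For the subfolder splitting, the procedure is roughly the following; osd.12, the rbd pool, the paths, and the layout values below are only examples, so double-check the op name and flags against ceph-objectstore-tool --help on your version. First set the target layout in ceph.conf on the OSD host:

    [osd]
    filestore merge threshold = 40
    filestore split multiple = 8

Then, for each OSD in turn (use whatever start/stop mechanism your distro has):

ceph1 ~ $ sudo ceph osd set noout
ceph1 ~ $ sudo systemctl stop ceph-osd@12
ceph1 ~ $ sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
              --journal-path /var/lib/ceph/osd/ceph-12/journal \
              --op apply-layout-settings --pool rbd
ceph1 ~ $ sudo systemctl start ceph-osd@12
ceph1 ~ $ sudo ceph osd unset noout

On the snap trimming side, one knob worth knowing is osd_snap_trim_sleep, which makes the OSDs pause between trim operations so client I/O isn't starved. The 0.1 below is a starting point to experiment with, not a recommendation:

ceph1 ~ $ sudo ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'

Some versions report this option as unchangeable at runtime; in that case set it in ceph.conf and restart the OSDs. Injected values don't survive a restart anyway, so if it helps, persist it in ceph.conf too.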
On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick <[email protected]> wrote:

> High CPU utilization and inexplicably slow I/O requests
>
> We have been having similar performance issues across several ceph
> clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> for a while, but eventually performance worsens and becomes (at first
> intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> requests blocked for longer than 32 sec. These slow requests are
> accompanied by "currently waiting for rw locks", but we have not found
> any network issue of the sort that is normally responsible for this
> warning.
>
> Examining the individual slow OSDs from `ceph health detail` has been
> unproductive; there don't seem to be any slow disks, and if we stop the
> OSD the problem just moves somewhere else.
>
> We also think this trends with an increased number of RBDs on the
> clusters, but not necessarily a ton of Ceph I/O. At the same time, user
> %CPU time spikes up to 95-100%, at first frequently and then
> consistently, simultaneously across all cores. We are running 12 OSDs on
> a 2.2 GHz CPU with 6 cores and 64 GiB RAM per node.
>
> ceph1 ~ $ sudo ceph status
>     cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
>      health HEALTH_WARN
>             547 requests are blocked > 32 sec
>      monmap e1: 3 mons at {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
>             election epoch 16, quorum 0,1,2 cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
>      osdmap e577122: 72 osds: 68 up, 68 in
>             flags sortbitwise,require_jewel_osds
>       pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
>             126 TB used, 368 TB / 494 TB avail
>                 4084 active+clean
>                   12 active+clean+scrubbing+deep
>   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
>
> ceph1 ~ $ vmstat 5 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd    free   buff    cache   si   so    bi    bo    in     cs us sy id wa st
> 27  1      0 3112660 165544 36261692    0    0   472  1274     0      1 22  1 76  1  0
> 25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
> 22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
> 11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
> 18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0
>
> There is no apparent memory shortage, and none of the HDDs or SSDs show
> consistently high utilization, slow service times, or any other form of
> hardware saturation, other than user CPU utilization. Can CPU starvation
> be responsible for "waiting for rw locks"?
>
> Our main pool (the one with all the data) currently has 1024 PGs,
> leaving us room to add more PGs if needed, but we're concerned that if
> we do so, we'd consume even more CPU.
>
> We have moved to running Ceph with jemalloc instead of tcmalloc, and
> that has helped with CPU utilization somewhat, but we still see
> occurrences of 95-100% CPU with a not terribly high Ceph workload.
>
> Any suggestions of what else to look at? We have a peculiar use case
> where we have many RBDs but only about 1-5% of them are active at the
> same time, and we're constantly making and expiring RBD snapshots. Could
> this lead to aberrant performance? For instance, is it normal to have
> ~40k snaps still in cached_removed_snaps?
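On the cached_removed_snaps question at the end: you can eyeball the per-pool removed-snap interval set with

ceph1 ~ $ sudo ceph osd dump | grep removed_snaps

Constantly creating and expiring snapshots tends to leave a long, fragmented interval set there, which matches the pattern you describe.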
