> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mclean, Patrick
> Sent: 08 August 2017 20:13
> To: David Turner <drakonst...@gmail.com>; ceph-us...@ceph.com
> Cc: Colenbrander, Roelof <roderick.colenbran...@sony.com>; Payno, Victor <victor.pa...@sony.com>; Yip, Rae <rae....@sony.com>
> Subject: Re: [ceph-users] ceph cluster experiencing major performance issues
>
> On 08/08/17 10:50 AM, David Turner wrote:
> > Are you also seeing osds marking themselves down for a little bit and
> > then coming back up? There are 2 very likely problems
> > causing/contributing to this. The first is if you are using a lot of
> > snapshots. Deleting snapshots is a very expensive operation for your
> > cluster and can cause a lot of slowness. The second is PG subfolder
> > splitting. This will show as blocked requests and osds marking
> > themselves down and coming back up a little later without any errors
> > in the log. I linked a previous thread where someone was having these
> > problems where both causes were investigated.
> >
> > https://www.mail-archive.com/ceph-us...@lists.ceph.com/msg36923.html
>
> We are not seeing OSDs marking themselves down for a little bit and coming
> back, as far as we can tell. We will do some more investigation into this.
>
> We are creating and deleting quite a few snapshots; is there anything we can
> do to make this less expensive? We are going to attempt to create fewer
> snapshots in our systems, but unfortunately we have to create a fair number
> due to our use case.

That's most likely your problem. Upgrade to 10.2.9 and enable the snap trim
sleep option on your OSDs, setting it to somewhere around 0.1; it has a
massive effect on snapshot removal.
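
As a rough illustration, it looks something like this (the 0.1 value is only
the suggested starting point, and the runtime injection is optional; verify
the option name against your release's documentation):

    # ceph.conf on the OSD nodes -- osd_snap_trim_sleep throttles how
    # aggressively each OSD works through snapshot deletions (default 0)
    [osd]
        osd snap trim sleep = 0.1

    # or, without restarting the OSDs:
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
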
> Is slow snapshot deletion likely to cause a slow backlog of purged snaps? In
> some cases we are seeing ~40k snaps still in cached_removed_snaps.
>
> > If you have 0.94.9 or 10.2.5 or later, then you can split your PG
> > subfolders sanely while your osds are temporarily turned off using
> > 'ceph-objectstore-tool apply-layout-settings'. There are a lot of ways
> > to skin the cat of snap trimming, but it depends greatly on your use case.
>
> We are currently running 10.2.5, and are planning to update to 10.2.9 at
> some point soon. Our clients are using the 4.9 kernel RBD driver (which
> effectively forces us to keep our snapshot count below 510); we are
> currently testing the possibility of using the rbd-nbd driver as an
> alternative.

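On the rbd-nbd side, a minimal sketch of mapping an image through it (pool,
image, and mount point names are placeholders, not from this thread); it goes
through librbd in user space rather than the kernel RBD client, which is why
it sidesteps the kernel driver limitations mentioned above:

    modprobe nbd                    # rbd-nbd attaches images via the nbd module
    rbd-nbd map mypool/myimage      # prints the device it created, e.g. /dev/nbd0
    mount /dev/nbd0 /mnt/myimage
    # ... use the image ...
    umount /mnt/myimage
    rbd-nbd unmap /dev/nbd0
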
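For the offline subfolder splitting David mentioned, the procedure per OSD is
roughly the following (threshold values, OSD id, and pool name are
placeholders; verify the exact flags with ceph-objectstore-tool --help on
your build before relying on this):

    # raise the split point in ceph.conf ([osd] section) first, e.g.:
    #   filestore merge threshold = 40
    #   filestore split multiple  = 8
    # (a PG subfolder splits at roughly 16 * split_multiple * abs(merge_threshold) objects)

    ceph osd set noout                  # keep data from rebalancing while the OSD is down
    systemctl stop ceph-osd@<id>

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --op apply-layout-settings --pool <poolname>

    systemctl start ceph-osd@<id>
    ceph osd unset noout
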
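Regarding the ~40k entries in cached_removed_snaps: the per-pool removed-snap
intervals are visible straight from the monitors, which makes it easy to
watch whether that backlog actually shrinks once snap trimming keeps up:

    ceph osd dump | grep removed_snaps
    # or, per pool:
    ceph osd pool ls detail
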
> > On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick
> > <patrick.mcl...@sony.com <mailto:patrick.mcl...@sony.com>> wrote:
> >
> > High CPU utilization and inexplicably slow I/O requests
> >
> > We have been having similar performance issues across several ceph
> > clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> > for a while, but eventually performance worsens and becomes (at first
> > intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> > requests blocked for longer than 32 sec. These slow requests are
> > accompanied by "currently waiting for rw locks", but we have not found
> > any network issue that is normally responsible for this warning.
> >
> > Examining the individual slow OSDs from `ceph health detail` has been
> > unproductive; there don't seem to be any slow disks, and if we stop the
> > OSD the problem just moves somewhere else.
> >
> > We also think this trends with an increased number of RBDs on the
> > clusters, but not necessarily a ton of Ceph I/O. At the same time, user
> > %CPU time spikes up to 95-100%, at first frequently and then
> > consistently, simultaneously across all cores. We are running 12 OSDs on
> > a 2.2 GHz CPU with 6 cores and 64GiB RAM per node.
> >
> > ceph1 ~ $ sudo ceph status
> >     cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
> >      health HEALTH_WARN
> >             547 requests are blocked > 32 sec
> >      monmap e1: 3 mons at
> > {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
> >             election epoch 16, quorum 0,1,2
> > cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
> >      osdmap e577122: 72 osds: 68 up, 68 in
> >             flags sortbitwise,require_jewel_osds
> >       pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
> >             126 TB used, 368 TB / 494 TB avail
> >                 4084 active+clean
> >                   12 active+clean+scrubbing+deep
> >   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
> >
> > ceph1 ~ $ vmstat 5 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >  r  b   swpd    free    buff    cache    si   so    bi    bo    in     cs us sy id wa st
> > 27  1      0 3112660 165544 36261692    0    0   472  1274     0      1 22  1 76  1  0
> > 25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
> > 22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
> > 11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
> > 18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0
> >
> > There is no apparent memory shortage, and none of the HDDs or SSDs show
> > consistently high utilization, slow service times, or any other form of
> > hardware saturation, other than user CPU utilization. Can CPU starvation
> > be responsible for "waiting for rw locks"?
> >
> > Our main pool (the one with all the data) currently has 1024 PGs,
> > leaving us room to add more PGs if needed, but we're concerned that if
> > we do so we'd consume even more CPU.
> >
> > We have moved to running Ceph + jemalloc instead of tcmalloc, and that
> > has helped with CPU utilization somewhat, but we still see occurrences
> > of 95-100% CPU with a not terribly high Ceph workload.
> >
> > Any suggestions of what else to look at? We have a peculiar use case
> > where we have many RBDs but only about 1-5% of them are active at the
> > same time, and we're constantly making and expiring RBD snapshots. Could
> > this lead to aberrant performance? For instance, is it normal to have
> > ~40k snaps still in cached_removed_snaps?

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com