> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mclean, Patrick
> Sent: 08 August 2017 20:13
> To: David Turner <drakonst...@gmail.com>; ceph-us...@ceph.com
> Cc: Colenbrander, Roelof <roderick.colenbran...@sony.com>; Payno, Victor <victor.pa...@sony.com>; Yip, Rae <rae....@sony.com>
> Subject: Re: [ceph-users] ceph cluster experiencing major performance issues
>
> On 08/08/17 10:50 AM, David Turner wrote:
> > Are you also seeing osds marking themselves down for a little bit and
> > then coming back up? There are 2 very likely problems
> > causing/contributing to this. The first is if you are using a lot of
> > snapshots. Deleting snapshots is a very expensive operation for your
> > cluster and can cause a lot of slowness. The second is PG subfolder
> > splitting. This will show as blocked requests and osds marking
> > themselves down and coming back up a little later without any errors
> > in the log. I linked a previous thread where someone was having these
> > problems where both causes were investigated.
> >
> > https://www.mail-archive.com/ceph-us...@lists.ceph.com/msg36923.html
>
> We are not seeing OSDs marking themselves down for a little bit and coming
> back, as far as we can tell. We will do some more investigation into this.
>
> We are creating and deleting quite a few snapshots; is there anything we can
> do to make this less expensive? We are going to attempt to create fewer
> snapshots in our systems, but unfortunately we have to create a fair number
> due to our use case.

That's most likely your problem. Upgrade to 10.2.9 and enable the snap trim
sleep option on your OSDs, setting it to somewhere around 0.1; it has a
massive effect on snapshot removal.
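
As a rough illustration, it looks something like this (the 0.1 value is only
the suggested starting point, and the runtime injection is optional; verify
the option name against your release's documentation):

    # ceph.conf on the OSD nodes -- osd_snap_trim_sleep throttles how
    # aggressively each OSD works through snapshot deletions (default 0)
    [osd]
        osd snap trim sleep = 0.1

    # or, without restarting the OSDs:
    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
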
> Is slow snapshot deletion likely to cause a slow backlog of purged snaps? In
> some cases we are seeing ~40k snaps still in cached_removed_snaps.
>
> > If you have 0.94.9 or 10.2.5 or later, then you can split your PG
> > subfolders sanely while your osds are temporarily turned off using
> > 'ceph-objectstore-tool apply-layout-settings'. There are a lot of ways
> > to skin the cat of snap trimming, but it depends greatly on your use case.
>
> We are currently running 10.2.5, and are planning to update to 10.2.9 at
> some point soon. Our clients are using the 4.9 kernel RBD driver (which
> effectively forces us to keep our snapshot count below 510); we are
> currently testing the possibility of using the rbd-nbd driver as an
> alternative.

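On the rbd-nbd side, a minimal sketch of mapping an image through it (pool,
image, and mount point names are placeholders, not from this thread); it goes
through librbd in user space rather than the kernel RBD client, which is why
it sidesteps the kernel driver limitations mentioned above:

    modprobe nbd                    # rbd-nbd attaches images via the nbd module
    rbd-nbd map mypool/myimage      # prints the device it created, e.g. /dev/nbd0
    mount /dev/nbd0 /mnt/myimage
    # ... use the image ...
    umount /mnt/myimage
    rbd-nbd unmap /dev/nbd0
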
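For the offline subfolder splitting David mentioned, the procedure per OSD is
roughly the following (threshold values, OSD id, and pool name are
placeholders; verify the exact flags with ceph-objectstore-tool --help on
your build before relying on this):

    # raise the split point in ceph.conf ([osd] section) first, e.g.:
    #   filestore merge threshold = 40
    #   filestore split multiple  = 8
    # (a PG subfolder splits at roughly 16 * split_multiple * abs(merge_threshold) objects)

    ceph osd set noout                  # keep data from rebalancing while the OSD is down
    systemctl stop ceph-osd@<id>

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
        --op apply-layout-settings --pool <poolname>

    systemctl start ceph-osd@<id>
    ceph osd unset noout
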
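Regarding the ~40k entries in cached_removed_snaps: the per-pool removed-snap
intervals are visible straight from the monitors, which makes it easy to
watch whether that backlog actually shrinks once snap trimming keeps up:

    ceph osd dump | grep removed_snaps
    # or, per pool:
    ceph osd pool ls detail
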
> > On Mon, Aug 7, 2017 at 11:49 PM Mclean, Patrick
> > <patrick.mcl...@sony.com <mailto:patrick.mcl...@sony.com>> wrote:
> >
> > High CPU utilization and inexplicably slow I/O requests
> >
> > We have been having similar performance issues across several ceph
> > clusters. When all the OSDs are up in the cluster, it can stay HEALTH_OK
> > for a while, but eventually performance worsens and becomes (at first
> > intermittently, but eventually continually) HEALTH_WARN due to slow I/O
> > requests blocked for longer than 32 sec. These slow requests are
> > accompanied by "currently waiting for rw locks", but we have not found
> > any network issue that is normally responsible for this warning.
> >
> > Examining the individual slow OSDs from `ceph health detail` has been
> > unproductive; there don't seem to be any slow disks, and if we stop the
> > OSD the problem just moves somewhere else.
> >
> > We also think this trends with an increased number of RBDs on the
> > clusters, but not necessarily a ton of Ceph I/O. At the same time, user
> > %CPU time spikes up to 95-100%, at first frequently and then
> > consistently, simultaneously across all cores. We are running 12 OSDs on
> > a 2.2 GHz CPU with 6 cores and 64GiB RAM per node.
> >
> > ceph1 ~ $ sudo ceph status
> >     cluster XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
> >      health HEALTH_WARN
> >             547 requests are blocked > 32 sec
> >      monmap e1: 3 mons at
> > {cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XX:XXXX/0,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX=XXX.XXX.XXX.XXX:XXXX/0}
> >             election epoch 16, quorum 0,1,2
> > cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX,cephmon1.XXXXXXXXXXXXXXXXXXXXXXX
> >      osdmap e577122: 72 osds: 68 up, 68 in
> >             flags sortbitwise,require_jewel_osds
> >       pgmap v6799002: 4096 pgs, 4 pools, 13266 GB data, 11091 kobjects
> >             126 TB used, 368 TB / 494 TB avail
> >                 4084 active+clean
> >                   12 active+clean+scrubbing+deep
> >   client io 113 kB/s rd, 11486 B/s wr, 135 op/s rd, 7 op/s wr
> >
> > ceph1 ~ $ vmstat 5 5
> > procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
> >  r  b   swpd    free    buff    cache    si   so    bi    bo    in     cs us sy id wa st
> > 27  1      0 3112660 165544 36261692    0    0   472  1274     0      1 22  1 76  1  0
> > 25  0      0 3126176 165544 36246508    0    0   858 12692 12122 110478 97  2  1  0  0
> > 22  0      0 3114284 165544 36258136    0    0     1  6118  9586 118625 97  2  1  0  0
> > 11  0      0 3096508 165544 36276244    0    0     8  6762 10047 188618 89  3  8  0  0
> > 18  0      0 2990452 165544 36384048    0    0  1209 21170 11179 179878 85  4 11  0  0
> >
> > There is no apparent memory shortage, and none of the HDDs or SSDs show
> > consistently high utilization, slow service times, or any other form of
> > hardware saturation, other than user CPU utilization. Can CPU starvation
> > be responsible for "waiting for rw locks"?
> >
> > Our main pool (the one with all the data) currently has 1024 PGs,
> > leaving us room to add more PGs if needed, but we're concerned that if
> > we do so we'd consume even more CPU.
> >
> > We have moved to running Ceph + jemalloc instead of tcmalloc, and that
> > has helped with CPU utilization somewhat, but we still see occurrences
> > of 95-100% CPU with a not terribly high Ceph workload.
> >
> > Any suggestions of what else to look at? We have a peculiar use case
> > where we have many RBDs but only about 1-5% of them are active at the
> > same time, and we're constantly making and expiring RBD snapshots. Could
> > this lead to aberrant performance? For instance, is it normal to have
> > ~40k snaps still in cached_removed_snaps?

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com