Thanks Ashley. Is there a way we could stop writes to the old OSDs and write only to the new OSDs?
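Strictly speaking, writes to a PG go to every OSD in its acting set, so the old OSDs can't be excluded while they still hold data. One approach that is sometimes used, shown here only as a rough sketch assuming a standard CRUSH map, is to lower the CRUSH weight of the old OSDs so that CRUSH places most new data on the new ones. This is risky on a cluster that is still recovering, since reweighting triggers yet more data movement, and osd.12 and the weight values below are placeholders:

    # shrink the share of data CRUSH assigns to an old OSD (placeholder id and weight)
    ceph osd crush reweight osd.12 3.0
    # optionally move the primary role (and most client-facing I/O) off it;
    # replica writes will still land on this OSD
    ceph osd primary-affinity osd.12 0.0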
On Sun, 28 Apr 2019 at 19:21, Ashley Merrick <singap...@amerrick.co.uk> wrote:
> It will mean you have some OSDs that will perform better than others, but it won't cause any issues within Ceph.
>
> It may help you expand your cluster at the speed you need to fix the MAX Avail issue; however, you're only going to be able to backfill as fast as the source OSDs can handle. But writing is more intensive, so I can imagine the new OSDs will perform better during the backfill than they would without the SSD journal.
>
> If you have the hardware to do it, it won't hurt. However, I would say you want to be careful playing too much with a cluster that's in a recovering state and having OSDs go up and down; if you end up with a new OSD failing due to high load, it may cause you further issues with lost objects etc. that weren't fully replicated.
>
> So adding further OSDs will be at your own risk.
>
> On Sun, Apr 28, 2019 at 9:40 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>> Thanks Paul,
>> Coming back to my question: is it a good idea to add SSD journals for HDD OSDs on a new node, in a cluster whose existing OSDs keep their journals on HDD?
>>
>> On Sun, Apr 28, 2019 at 2:49 PM Paul Emmerich <paul.emmer...@croit.io> wrote:
>>> Looks like you've got lots of tiny objects. By default the recovery speed on HDDs is limited to 10 objects per second (40 with the DB on an SSD) per thread.
>>>
>>> Decrease osd_recovery_sleep_hdd (default 0.1) to increase recovery/backfill speed.
>>>
>>> Paul
>>>
>>> --
>>> Paul Emmerich
>>>
>>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>>
>>> croit GmbH
>>> Freseniusstr. 31h
>>> 81247 München
>>> www.croit.io
>>> Tel: +49 89 1896585 90
>>>
>>> On Sun, Apr 28, 2019 at 6:57 AM Nikhil R <nikh.ravin...@gmail.com> wrote:
>>> > Hi,
>>> > I have set noout, noscrub and nodeep-scrub, and the last time we added OSDs we added a few at a time.
>>> > The main issue here is IOPS: the existing OSDs are not able to backfill at a higher rate - not even 1 thread during peak hours, and a maximum of 2 threads off-peak. We are getting more client I/O, and the documents being ingested amount to more than the space freed up by backfilling PGs to the newly added OSDs.
>>> > Below is our cluster health:
>>> >   health HEALTH_WARN
>>> >          5221 pgs backfill_wait
>>> >          31 pgs backfilling
>>> >          1453 pgs degraded
>>> >          4 pgs recovering
>>> >          1054 pgs recovery_wait
>>> >          1453 pgs stuck degraded
>>> >          6310 pgs stuck unclean
>>> >          384 pgs stuck undersized
>>> >          384 pgs undersized
>>> >          recovery 130823732/9142530156 objects degraded (1.431%)
>>> >          recovery 2446840943/9142530156 objects misplaced (26.763%)
>>> >          noout,nobackfill,noscrub,nodeep-scrub flag(s) set
>>> >          mon.mon_1 store is getting too big! 26562 MB >= 15360 MB
>>> >          mon.mon_2 store is getting too big! 26828 MB >= 15360 MB
>>> >          mon.mon_3 store is getting too big! 26504 MB >= 15360 MB
>>> >   monmap e1: 3 mons at {mon_1=x.x.x.x:x.yyyy/0,mon_2=x.x.x.x:yyyy/0,mon_3=x.x.x.x:yyyy/0}
>>> >          election epoch 7996, quorum 0,1,2 mon_1,mon_2,mon_3
>>> >   osdmap e194833: 105 osds: 105 up, 105 in; 5931 remapped pgs
>>> >          flags noout,nobackfill,noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
>>> >   pgmap v48390703: 10536 pgs, 18 pools, 144 TB data, 2906 Mobjects
>>> >          475 TB used, 287 TB / 763 TB avail
>>> >          130823732/9142530156 objects degraded (1.431%)
>>> >          2446840943/9142530156 objects misplaced (26.763%)
>>> >          4851 active+remapped+wait_backfill
>>> >          4226 active+clean
>>> >          659 active+recovery_wait+degraded+remapped
>>> >          377 active+recovery_wait+degraded
>>> >          357 active+undersized+degraded+remapped+wait_backfill
>>> >          18 active+recovery_wait+undersized+degraded+remapped
>>> >          16 active+degraded+remapped+backfilling
>>> >          13 active+degraded+remapped+wait_backfill
>>> >          9 active+undersized+degraded+remapped+backfilling
>>> >          6 active+remapped+backfilling
>>> >          2 active+recovering+degraded
>>> >          2 active+recovering+degraded+remapped
>>> >   client io 11894 kB/s rd, 105 kB/s wr, 981 op/s rd, 72 op/s wr
>>> >
>>> > So, is it a good option to add new OSDs on a new node with SSDs as journals?
>>> > in.linkedin.com/in/nikhilravindra
>>> >
>>> > On Sun, Apr 28, 2019 at 6:05 AM Erik McCormick <emccorm...@cirrusseven.com> wrote:
>>> >> On Sat, Apr 27, 2019, 3:49 PM Nikhil R <nikh.ravin...@gmail.com> wrote:
>>> >>> We have baremetal nodes with 256GB RAM and 36-core CPUs.
>>> >>> We are on Ceph Jewel 10.2.9 with leveldb.
>>> >>> The OSDs and journals are on the same HDD.
>>> >>> We have backfill_max_active, recovery_max_active and recovery_op_priority all set to 1.
>>> >>> An OSD crashes and restarts once a PG is backfilled and the next PG tries to backfill. This is when we look at iostat and see the disk utilised up to 100%.
>>> >>
>>> >> I would set noout to prevent excess movement in the event of OSD flapping, and disable scrubbing and deep scrubbing until your backfilling has completed. I would also bring the new OSDs online a few at a time rather than all 25 at once if you add more servers.
>>> >>
>>> >>> Appreciate your help David
>>> >>>
>>> >>> On Sun, 28 Apr 2019 at 00:46, David C <dcsysengin...@gmail.com> wrote:
>>> >>>> On Sat, 27 Apr 2019, 18:50 Nikhil R, <nikh.ravin...@gmail.com> wrote:
>>> >>>>> Guys,
>>> >>>>> We now have a total of 105 OSDs on 5 baremetal nodes, each hosting 21 OSDs on 7TB HDDs with journals on HDD too. Each journal is about 5GB.
>>> >>>>
>>> >>>> This would imply you've got a separate HDD partition for journals. I don't think there's any value in that, and it would probably be detrimental to performance.
>>> >>>>
>>> >>>>> We expanded our cluster last week and added 1 more node with 21 HDDs and journals on the same disks.
>>> >>>>> Our client I/O is too heavy and we are not able to backfill even 1 thread during peak hours; if we backfill during peak hours, OSDs crash, causing undersized PGs, and if we have another OSD crash we won't be able to use our cluster due to undersized and recovering PGs. During non-peak hours we can backfill just 8-10 PGs.
>>> >>>>> Due to this our MAX AVAIL is draining out very fast.
>>> >>>>
>>> >>>> How much RAM have you got in your nodes? In my experience that's a common reason for crashing OSDs during recovery ops.
>>> >>>>
>>> >>>> What does your recovery and backfill tuning look like?
>>> >>>>
>>> >>>>> We are thinking of adding 2 more baremetal nodes with 21 x 7TB OSDs on HDD and adding 50GB SSD journals for these.
>>> >>>>> We aim to backfill from the 105 OSDs a bit faster and expect the backfill writes landing on these OSDs to be faster.
>>> >>>>
>>> >>>> SSD journals would certainly help, just be sure it's a model that performs well with Ceph.
>>> >>>>
>>> >>>>> Is this a good, viable idea?
>>> >>>>> Thoughts please?
>>> >>>>
>>> >>>> I'd recommend sharing more detail, e.g. full spec of the nodes, Ceph version etc.
>>> >>>>
>>> >>>>> -Nikhil
>>> >>>
>>> >>> --
>>> >>> Sent from my iPhone
>
--
Sent from my iPhone
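To make the recovery-tuning suggestions in the quoted thread above concrete, here is a minimal sketch of the commands involved. The values are illustrative only, runtime injection does not persist across restarts unless the settings also go into ceph.conf, and the osd_recovery_sleep_hdd option Paul mentions exists on Luminous and later; on Jewel the equivalent knob is osd_recovery_sleep. The option names behind the settings Nikhil lists are osd_max_backfills, osd_recovery_max_active and osd_recovery_op_priority.

    # flags Erik recommends while backfilling (already set on this cluster per the status above)
    ceph osd set noout
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # Paul's suggestion: shorten the per-object recovery sleep to speed up backfill
    # (osd_recovery_sleep_hdd on Luminous+; plain osd_recovery_sleep on Jewel)
    ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0.05'

    # the backfill/recovery throttles mentioned in the thread, set here to the values already in use
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'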
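And on the SSD-journal question itself: on a Jewel/filestore cluster, a new OSD with its journal on a separate SSD is typically prepared roughly as below. The device names are placeholders, the journal partition size comes from osd_journal_size (in MB) in ceph.conf, and on most Jewel setups udev activates the OSD automatically after prepare.

    # ceph.conf on the new node, roughly the 50 GB journal proposed above
    # [osd]
    # osd_journal_size = 51200

    # data on the HDD, journal partition created on the SSD (placeholder devices)
    ceph-disk prepare /dev/sdc /dev/sdb
    ceph-disk activate /dev/sdc1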
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com