Your CRUSH rules will not change automatically. Check out the documentation for
changing tunables:
http://docs.ceph.com/docs/mimic/rados/operations/crush-map/#tunables
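In practice both changes (the tunables profile and the straw -> straw2 bucket
conversion) come down to one command each. This is only a rough sketch - the
"hammer" profile is just an example, and the straw2 conversion command assumes a
sufficiently recent release:

  # ceph osd crush show-tunables                     # check the current profile first
  # ceph osd crush tunables hammer                   # or "jewel", once the old clients are gone
  # ceph osd crush set-all-straw-buckets-to-straw2   # convert existing straw buckets to straw2

Both changes are distributed through the OSD map, so expect data movement to start
as soon as the command returns.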
2018-06-20 18:27 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:

> Thanks, Paul - I could probably activate the Jewel tunables
> profile without losing too many clients - most are running
> at least kernel 4.2, I think. I'll go hunting for older
> clients ...
>
> After changing the tunables, do I need to restart any
> Ceph daemons?
>
> Another question, if I may: the hammer tunables bring
> CRUSH_V4 with straw2 buckets. Can I / should I convert
> the existing buckets to straw2 somehow? Or will it
> happen automatically?
>
>
> Cheers,
>
> Oliver
>
>
> On 20.06.2018 18:10, Paul Emmerich wrote:
>
>> Yeah, your tunables are ancient. It probably wouldn't have happened with
>> modern ones.
>> If this were my cluster, I would probably update the clients and then the
>> tunables (caution: lots of data movement!), but I know how annoying it can
>> be to chase down everyone who runs ancient clients.
>>
>> For comparison, this is what a fresh installation of Luminous looks like:
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 1,
>>     "chooseleaf_stable": 1,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 54,
>>     "profile": "jewel",
>>     "optimal_tunables": 1,
>>     "legacy_tunables": 0,
>>     "minimum_required_version": "jewel",
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "has_v2_rules": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 1,
>>     "require_feature_tunables5": 1,
>>     "has_v5_rules": 0
>> }
>>
>> As a work-around/fix, I'd probably first figure out which tunables can be
>> adjusted without breaking the oldest clients. Incrementing choose*tries in
>> the CRUSH rule or in the tunables is probably sufficient.
>> But since you are apparently running into data balance problems, you'll
>> have to update them to something more modern sooner or later anyway.
>>
>> You can also play around with crushtool: it can simulate how PGs are
>> mapped, which is usually better than changing random things on a
>> production cluster:
>> http://docs.ceph.com/docs/mimic/man/8/crushtool/
>>
>> Good luck
>>
>>
>> Paul
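The crushtool dry run Paul suggests, combined with the choose*tries tweak, typically
looks like the following. This is only a sketch: the file names and the rule id 0 are
placeholders, and the edit step is done by hand in the decompiled map:

  # ceph osd getcrushmap -o crushmap.bin
  # crushtool -d crushmap.bin -o crushmap.txt

  (edit crushmap.txt and add these as the first steps of the affected rule,
   right before its "step take ..." line:
       step set_chooseleaf_tries 5
       step set_choose_tries 100 )

  # crushtool -c crushmap.txt -o crushmap.new
  # crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-bad-mappings
  # ceph osd setcrushmap -i crushmap.new

If the --test run reports no bad mappings, CRUSH found a complete set of OSDs for
every sampled input with the modified rule; only then inject the new map.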
>> 2018-06-20 17:57 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Hi Paul,
>>
>> ah, right, "ceph pg dump | grep remapped" - that's what I was looking
>> for. I added the output and the result of the pg query at the end of
>> https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
>>
>> > But my guess here is that you are running a CRUSH rule to
>> > distribute across 3 racks and you only have 3 racks in total.
>>
>> Yes - I always assumed that 3 failure domains would be suitable
>> for a replication factor of 3. The three racks are absolutely
>> identical hardware-wise, though, including HDD sizes, and we
>> never had any trouble like this before Luminous (we often used
>> significant reweighting in the past).
>>
>> We are way behind on the Ceph tunables, though:
>>
>> # ceph osd crush show-tunables
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 0,
>>     "chooseleaf_stable": 0,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 22,
>>     "profile": "bobtail",
>>     "optimal_tunables": 0,
>>     "legacy_tunables": 0,
>>     "minimum_required_version": "bobtail",
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "has_v2_rules": 0,
>>     "require_feature_tunables3": 0,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0,
>>     "require_feature_tunables5": 0,
>>     "has_v5_rules": 0
>> }
>>
>> We still have some old clients (I'm trying to get rid of those so that
>> I can activate more recent tunables, but it may be a while) ...
>>
>> Are my tunables at fault? If so, can you recommend a solution
>> or a temporary workaround?
>>
>> Cheers (and thanks for helping!),
>>
>> Oliver
>>
>> On 06/20/2018 05:01 PM, Paul Emmerich wrote:
>>
>> Hi,
>>
>> have a look at "ceph pg dump" to see which ones are stuck in remapped.
>>
>> But my guess here is that you are running a CRUSH rule to distribute
>> across 3 racks and you only have 3 racks in total.
>> CRUSH will sometimes fail to find a mapping in this scenario. There are
>> a few parameters that you can tune in your CRUSH rule to increase the
>> number of retries.
>> For example, the settings set_chooseleaf_tries and set_choose_tries can
>> help; they are set by default for erasure coding rules (where this
>> scenario is more common). The values used for EC are
>> set_chooseleaf_tries = 5 and set_choose_tries = 100.
>> You can configure them by adding them as the first steps of the rule.
>>
>> You can also configure an upmap exception.
>>
>> But in general it is often not the best idea to have only 3 racks for
>> replica = 3 if you want to achieve a good data balance.
>>
>>
>> Paul
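An upmap exception, as mentioned above, would look roughly like the following. The PG
id and OSD ids are made-up placeholders, and note that upmap requires all clients to
speak the Luminous feature set, which the old clients in this thread may not:

  # ceph osd set-require-min-compat-client luminous
  # ceph osd pg-upmap-items 2.3f 17 42    # remap PG 2.3f from osd.17 to osd.42
  # ceph osd rm-pg-upmap-items 2.3f       # remove the exception again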
>> 2018-06-20 16:50 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Dear Paul,
>>
>> thanks, here goes (output of "ceph -s", etc.):
>> https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
>>
>> > Also please run "ceph pg X.YZ query" on one of the PGs
>> > not backfilling.
>>
>> Silly question: How do I get a list of the PGs not backfilling?
>>
>> On 06/20/2018 04:00 PM, Paul Emmerich wrote:
>>
>> Can you post the full output of "ceph -s", "ceph health detail" and
>> "ceph osd df tree"?
>> Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
>>
>>
>> Paul
>>
>> 2018-06-20 15:25 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Dear all,
>>
>> we (somewhat) recently extended our Ceph cluster
>> and updated it to Luminous. By now, the fill level
>> on some OSDs is quite high again, so I'd like to
>> re-balance via "OSD reweight".
>>
>> I'm running into the following problem, however:
>> No matter what I do (reweight a little, or a lot,
>> or only reweight a single OSD by 5%) - after a
>> while, backfilling simply stops and lots of objects
>> stay misplaced.
>>
>> I do have up to 250 PGs per OSD (early sins from
>> the first days of the cluster), but I've set
>> "mon_max_pg_per_osd = 400" and
>> "osd_max_pg_per_osd_hard_ratio = 1.5" to compensate.
>>
>> How can I find out why backfill stops? Any advice
>> would be very much appreciated.
>>
>>
>> Cheers,
>>
>> Oliver
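For reference, the reweighting step described above and the checks suggested earlier
in the thread come down to roughly the following; the OSD id, weight and PG id are
placeholders:

  # ceph osd df tree                 # per-OSD fill level and weights
  # ceph osd reweight 12 0.95        # lower osd.12's reweight (1.0 = full weight)
  # ceph pg dump | grep remapped     # PGs still stuck in remapped after backfill stalls
  # ceph pg 2.3f query               # detailed state of one stuck PG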
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com