Your CRUSH rules will not change automatically. Check out the documentation for
changing tunables:
http://docs.ceph.com/docs/mimic/rados/operations/crush-map/#tunables
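In practice both changes (the tunables profile and the straw -> straw2 bucket
conversion) come down to one command each. This is only a rough sketch - the
"hammer" profile is just an example, and the straw2 conversion command assumes a
sufficiently recent release:

  # ceph osd crush show-tunables                     # check the current profile first
  # ceph osd crush tunables hammer                   # or "jewel", once the old clients are gone
  # ceph osd crush set-all-straw-buckets-to-straw2   # convert existing straw buckets to straw2

Both changes are distributed through the OSD map, so expect data movement to start
as soon as the command returns.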
2018-06-20 18:27 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:

> Thanks, Paul - I could probably activate the Jewel tunables
> profile without losing too many clients - most are running
> at least kernel 4.2, I think. I'll go hunting for older
> clients ...
>
> After changing the tunables, do I need to restart any
> Ceph daemons?
>
> Another question, if I may: the hammer tunables bring
> CRUSH_V4 with straw2 buckets. Can I / should I convert
> the existing buckets to straw2 somehow? Or will it
> happen automatically?
>
>
> Cheers,
>
> Oliver
>
>
> On 20.06.2018 18:10, Paul Emmerich wrote:
>
>> Yeah, your tunables are ancient. It probably wouldn't have happened with
>> modern ones.
>> If this were my cluster, I would probably update the clients and then the
>> tunables (caution: lots of data movement!), but I know how annoying it can
>> be to chase down everyone who runs ancient clients.
>>
>> For comparison, this is what a fresh installation of Luminous looks like:
>>
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 1,
>>     "chooseleaf_stable": 1,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 54,
>>     "profile": "jewel",
>>     "optimal_tunables": 1,
>>     "legacy_tunables": 0,
>>     "minimum_required_version": "jewel",
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "has_v2_rules": 1,
>>     "require_feature_tunables3": 1,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 1,
>>     "require_feature_tunables5": 1,
>>     "has_v5_rules": 0
>> }
>>
>> As a work-around/fix, I'd probably first figure out which tunables can be
>> adjusted without breaking the oldest clients. Incrementing choose*tries in
>> the CRUSH rule or in the tunables is probably sufficient.
>> But since you are apparently running into data balance problems, you'll
>> have to update them to something more modern sooner or later anyway.
>>
>> You can also play around with crushtool: it can simulate how PGs are
>> mapped, which is usually better than changing random things on a
>> production cluster:
>> http://docs.ceph.com/docs/mimic/man/8/crushtool/
>>
>> Good luck
>>
>>
>> Paul
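The crushtool dry run Paul suggests, combined with the choose*tries tweak, typically
looks like the following. This is only a sketch: the file names and the rule id 0 are
placeholders, and the edit step is done by hand in the decompiled map:

  # ceph osd getcrushmap -o crushmap.bin
  # crushtool -d crushmap.bin -o crushmap.txt

  (edit crushmap.txt and add these as the first steps of the affected rule,
   right before its "step take ..." line:
       step set_chooseleaf_tries 5
       step set_choose_tries 100 )

  # crushtool -c crushmap.txt -o crushmap.new
  # crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-bad-mappings
  # ceph osd setcrushmap -i crushmap.new

If the --test run reports no bad mappings, CRUSH found a complete set of OSDs for
every sampled input with the modified rule; only then inject the new map.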
>> 2018-06-20 17:57 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Hi Paul,
>>
>> ah, right, "ceph pg dump | grep remapped" - that's what I was looking
>> for. I added the output and the result of the pg query at the end of
>> https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
>>
>> > But my guess here is that you are running a CRUSH rule to
>> > distribute across 3 racks and you only have 3 racks in total.
>>
>> Yes - I always assumed that 3 failure domains would be suitable
>> for a replication factor of 3. The three racks are absolutely
>> identical hardware-wise, though, including HDD sizes, and we
>> never had any trouble like this before Luminous (we often used
>> significant reweighting in the past).
>>
>> We are way behind on the Ceph tunables, though:
>>
>> # ceph osd crush show-tunables
>> {
>>     "choose_local_tries": 0,
>>     "choose_local_fallback_tries": 0,
>>     "choose_total_tries": 50,
>>     "chooseleaf_descend_once": 1,
>>     "chooseleaf_vary_r": 0,
>>     "chooseleaf_stable": 0,
>>     "straw_calc_version": 1,
>>     "allowed_bucket_algs": 22,
>>     "profile": "bobtail",
>>     "optimal_tunables": 0,
>>     "legacy_tunables": 0,
>>     "minimum_required_version": "bobtail",
>>     "require_feature_tunables": 1,
>>     "require_feature_tunables2": 1,
>>     "has_v2_rules": 0,
>>     "require_feature_tunables3": 0,
>>     "has_v3_rules": 0,
>>     "has_v4_buckets": 0,
>>     "require_feature_tunables5": 0,
>>     "has_v5_rules": 0
>> }
>>
>> We still have some old clients (I'm trying to get rid of those so that
>> I can activate more recent tunables, but it may be a while) ...
>>
>> Are my tunables at fault? If so, can you recommend a solution
>> or a temporary workaround?
>>
>> Cheers (and thanks for helping!),
>>
>> Oliver
>>
>> On 06/20/2018 05:01 PM, Paul Emmerich wrote:
>>
>> Hi,
>>
>> have a look at "ceph pg dump" to see which ones are stuck in remapped.
>>
>> But my guess here is that you are running a CRUSH rule to distribute
>> across 3 racks and you only have 3 racks in total.
>> CRUSH will sometimes fail to find a mapping in this scenario. There are
>> a few parameters that you can tune in your CRUSH rule to increase the
>> number of retries.
>> For example, the settings set_chooseleaf_tries and set_choose_tries can
>> help; they are set by default for erasure coding rules (where this
>> scenario is more common). The values used for EC are
>> set_chooseleaf_tries = 5 and set_choose_tries = 100.
>> You can configure them by adding them as the first steps of the rule.
>>
>> You can also configure an upmap exception.
>>
>> But in general it is often not the best idea to have only 3 racks for
>> replica = 3 if you want to achieve a good data balance.
>>
>>
>> Paul
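An upmap exception, as mentioned above, would look roughly like the following. The PG
id and OSD ids are made-up placeholders, and note that upmap requires all clients to
speak the Luminous feature set, which the old clients in this thread may not:

  # ceph osd set-require-min-compat-client luminous
  # ceph osd pg-upmap-items 2.3f 17 42    # remap PG 2.3f from osd.17 to osd.42
  # ceph osd rm-pg-upmap-items 2.3f       # remove the exception again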
>> 2018-06-20 16:50 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Dear Paul,
>>
>> thanks, here goes (output of "ceph -s", etc.):
>> https://gist.github.com/oschulz/7d637c7a1dfa28660b1cdd5cc5dffbcb
>>
>> > Also please run "ceph pg X.YZ query" on one of the PGs
>> > not backfilling.
>>
>> Silly question: How do I get a list of the PGs not backfilling?
>>
>> On 06/20/2018 04:00 PM, Paul Emmerich wrote:
>>
>> Can you post the full output of "ceph -s", "ceph health detail" and
>> "ceph osd df tree"?
>> Also please run "ceph pg X.YZ query" on one of the PGs not backfilling.
>>
>>
>> Paul
>>
>> 2018-06-20 15:25 GMT+02:00 Oliver Schulz <oliver.sch...@tu-dortmund.de>:
>>
>> Dear all,
>>
>> we (somewhat) recently extended our Ceph cluster
>> and updated it to Luminous. By now, the fill level
>> on some OSDs is quite high again, so I'd like to
>> re-balance via "OSD reweight".
>>
>> I'm running into the following problem, however:
>> No matter what I do (reweight a little, or a lot,
>> or only reweight a single OSD by 5%) - after a
>> while, backfilling simply stops and lots of objects
>> stay misplaced.
>>
>> I do have up to 250 PGs per OSD (early sins from
>> the first days of the cluster), but I've set
>> "mon_max_pg_per_osd = 400" and
>> "osd_max_pg_per_osd_hard_ratio = 1.5" to compensate.
>>
>> How can I find out why backfill stops? Any advice
>> would be very much appreciated.
>>
>>
>> Cheers,
>>
>> Oliver
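For reference, the reweighting step described above and the checks suggested earlier
in the thread come down to roughly the following; the OSD id, weight and PG id are
placeholders:

  # ceph osd df tree                 # per-OSD fill level and weights
  # ceph osd reweight 12 0.95        # lower osd.12's reweight (1.0 = full weight)
  # ceph pg dump | grep remapped     # PGs still stuck in remapped after backfill stalls
  # ceph pg 2.3f query               # detailed state of one stuck PG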
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com