David,

So as I look at the logs, it was originally 9.0956 for the 10TB drives and 0.9096 for the 1TB drives.

# zgrep -i weight /var/log/ceph/*.log*gz
/var/log/ceph/ceph.audit.log.4.gz:...cmd=[{"prefix": "osd crush create-or-move", "id": 4, "weight":9.0956,...
/var/log/ceph/ceph.audit.log.4.gz:...cmd=[{"prefix": "osd crush create-or-move", "id": 1, "weight":0.9096,...
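(Those weights are just the raw drive sizes expressed in TiB, the unit ceph
uses for the initial CRUSH weight when an OSD is created. A quick sanity
check, assuming a drive of roughly 10^13 bytes:

$ echo 'scale=5; 10000000000000 / 1024^4' | bc
9.09494

The logged 9.0956 is a touch higher simply because the drive's real byte
count is a bit over an even 10^13.)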
With that, I updated the crushmap with the 9.0956 weights for the 10TB drives:

$ ceph osd tree
ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 27.28679 root default
-5  9.09560     host osd1
 3  9.09560         osd.3      up  1.00000          1.00000
-6  9.09560     host osd2
 4  9.09560         osd.4      up  1.00000          1.00000
-2  9.09560     host osd3
 0  9.09560         osd.0      up  1.00000          1.00000

Thanks much!

Roger

On Wed, Jul 19, 2017 at 7:34 AM David Turner <drakonst...@gmail.com> wrote:

> I would go with the weight that was originally assigned to them. That way
> it is in line with what new osds will be weighted.
>
> On Wed, Jul 19, 2017, 9:17 AM Roger Brown <rogerpbr...@gmail.com> wrote:
>
>> David,
>>
>> Thank you. I have it currently as...
>>
>> $ ceph osd df
>> ID WEIGHT   REWEIGHT SIZE  USE    AVAIL %USE VAR  PGS
>>  3 10.00000  1.00000 9313G 44404M 9270G 0.47 1.00 372
>>  4 10.00000  1.00000 9313G 46933M 9268G 0.49 1.06 372
>>  0 10.00000  1.00000 9313G 41283M 9273G 0.43 0.93 372
>>               TOTAL 27941G   129G 27812G 0.46
>> MIN/MAX VAR: 0.93/1.06  STDDEV: 0.02
>>
>> The above output shows the size not as 10TB but as 9313G. So should I
>> reweight each as 9.313? Or as the TiB value 9.09560?
>>
>> On Tue, Jul 18, 2017 at 11:18 PM David Turner <drakonst...@gmail.com> wrote:
>>
>>> I would recommend sticking with the weight of 9.09560 for the osds, as
>>> that is the TiB size of the osds that ceph defaults to, as opposed to
>>> the TB size of the osds. New osds will have their weights based on the
>>> TiB value. What is your `ceph osd df` output, just to see what things
>>> look like? Hopefully very healthy.
>>>
>>> On Tue, Jul 18, 2017, 11:16 PM Roger Brown <rogerpbr...@gmail.com> wrote:
>>>
>>>> Resolution confirmed!
>>>>
>>>> $ ceph -s
>>>>   cluster:
>>>>     id:     eea7b78c-b138-40fc-9f3e-3d77afb770f0
>>>>     health: HEALTH_OK
>>>>
>>>>   services:
>>>>     mon: 3 daemons, quorum desktop,mon1,nuc2
>>>>     mgr: desktop(active), standbys: mon1
>>>>     osd: 3 osds: 3 up, 3 in
>>>>
>>>>   data:
>>>>     pools:   19 pools, 372 pgs
>>>>     objects: 54243 objects, 71722 MB
>>>>     usage:   129 GB used, 27812 GB / 27941 GB avail
>>>>     pgs:     372 active+clean
>>>>
>>>> On Tue, Jul 18, 2017 at 8:47 PM Roger Brown <rogerpbr...@gmail.com> wrote:
>>>>
>>>>> Ah, that was the problem!
>>>>>
>>>>> So I edited the crushmap
>>>>> (http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a
>>>>> weight of 10.000 for all three 10TB OSD hosts. The instant result was
>>>>> that all those pgs with only 2 OSDs were replaced with 3 OSDs while
>>>>> the cluster started rebalancing the data. I trust it will complete
>>>>> with time and I'll be good to go!
>>>>>
>>>>> New OSD tree:
>>>>> $ ceph osd tree
>>>>> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>>> -1 30.00000 root default
>>>>> -5 10.00000     host osd1
>>>>>  3 10.00000         osd.3      up  1.00000          1.00000
>>>>> -6 10.00000     host osd2
>>>>>  4 10.00000         osd.4      up  1.00000          1.00000
>>>>> -2 10.00000     host osd3
>>>>>  0 10.00000         osd.0      up  1.00000          1.00000
>>>>>
>>>>> Kudos to Brad Hubbard for steering me in the right direction!
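>>>>>
>>>>> For reference, the edit was the usual dump/decompile/edit/recompile
>>>>> round-trip from those docs. A minimal sketch (filenames are arbitrary
>>>>> placeholders):
>>>>>
>>>>> $ ceph osd getcrushmap -o crushmap.bin
>>>>> $ crushtool -d crushmap.bin -o crushmap.txt
>>>>> $ # edit crushmap.txt: set each 10TB host's weight to 10.000
>>>>> $ crushtool -c crushmap.txt -o crushmap-new.bin
>>>>> $ ceph osd setcrushmap -i crushmap-new.bin
>>>>>
>>>>> The same change can also be made per OSD without a manual edit, e.g.
>>>>> `ceph osd crush reweight osd.3 10.000` (the host bucket weight is the
>>>>> sum of its OSD weights).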
>>>>>
>>>>> On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <bhubb...@redhat.com> wrote:
>>>>>
>>>>>> ID WEIGHT  TYPE NAME
>>>>>> -5 1.00000     host osd1
>>>>>> -6 9.09560     host osd2
>>>>>> -2 9.09560     host osd3
>>>>>>
>>>>>> The weight allocated to host "osd1" should presumably be the same as
>>>>>> the other two hosts?
>>>>>>
>>>>>> Dump your crushmap and take a good look at it, specifically the
>>>>>> weighting of "osd1".
>>>>>>
>>>>>> On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <rogerpbr...@gmail.com> wrote:
>>>>>> > I also tried ceph pg query, but it gave no helpful recommendations
>>>>>> > for any of the stuck pgs.
>>>>>> >
>>>>>> > On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <rogerpbr...@gmail.com> wrote:
>>>>>> >>
>>>>>> >> Problem:
>>>>>> >> I have some pgs with only two OSDs instead of 3 like all the other
>>>>>> >> pgs have. This is causing active+undersized+degraded status.
>>>>>> >>
>>>>>> >> History:
>>>>>> >> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for
>>>>>> >> a 1TB drive.
>>>>>> >> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
>>>>>> >> 3. Removed the original 3 1TB OSD hosts from the osd tree (reweight
>>>>>> >> 0, wait, stop, remove, del osd&host, rm; see the sketch below).
>>>>>> >> 4. The last OSD to be removed would never return to active+clean
>>>>>> >> after reweight 0. It went undersized instead, but I went on with
>>>>>> >> removal anyway, leaving me stuck with 5 undersized pgs.
>>>>>> >>
>>>>>> >> Things tried that didn't help:
>>>>>> >> * Give it time to go away on its own.
>>>>>> >> * Replace the replicated default.rgw.buckets.data pool with an
>>>>>> >> erasure-code 2+1 version.
>>>>>> >> * ceph osd lost 1 (and 2)
>>>>>> >> * ceph pg repair (pgs from dump_stuck)
>>>>>> >> * Googled 'ceph pg undersized' and similar searches for help.
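>>>>>> >>
>>>>>> >> For step 3 above, the removal sequence was roughly the standard
>>>>>> >> one; a sketch for an OSD with id X (X and the host name are
>>>>>> >> placeholders):
>>>>>> >>
>>>>>> >> $ ceph osd crush reweight osd.X 0   # drain; wait for rebalancing
>>>>>> >> $ ceph osd out X
>>>>>> >> $ systemctl stop ceph-osd@X         # on the OSD's host
>>>>>> >> $ ceph osd crush remove osd.X
>>>>>> >> $ ceph osd crush remove <host>      # once the host bucket is empty
>>>>>> >> $ ceph auth del osd.X
>>>>>> >> $ ceph osd rm X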
>>>>>> >>
>>>>>> >> Current status:
>>>>>> >> $ ceph osd tree
>>>>>> >> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
>>>>>> >> -1 19.19119 root default
>>>>>> >> -5  1.00000     host osd1
>>>>>> >>  3  1.00000         osd.3      up  1.00000          1.00000
>>>>>> >> -6  9.09560     host osd2
>>>>>> >>  4  9.09560         osd.4      up  1.00000          1.00000
>>>>>> >> -2  9.09560     host osd3
>>>>>> >>  0  9.09560         osd.0      up  1.00000          1.00000
>>>>>> >>
>>>>>> >> $ ceph pg dump_stuck
>>>>>> >> ok
>>>>>> >> PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
>>>>>> >> 88.3    active+undersized+degraded [4,0]          4 [4,0]               4
>>>>>> >> 97.3    active+undersized+degraded [4,0]          4 [4,0]               4
>>>>>> >> 85.6    active+undersized+degraded [4,0]          4 [4,0]               4
>>>>>> >> 87.5    active+undersized+degraded [0,4]          0 [0,4]               0
>>>>>> >> 70.0    active+undersized+degraded [0,4]          0 [0,4]               0
>>>>>> >>
>>>>>> >> $ ceph osd pool ls detail
>>>>>> >> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
>>>>>> >> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
>>>>>> >> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
>>>>>> >> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
>>>>>> >> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
>>>>>> >> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
>>>>>> >> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
>>>>>> >> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
>>>>>> >> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
>>>>>> >> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
>>>>>> >> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
>>>>>> >> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
>>>>>> >> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
>>>>>> >> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
>>>>>> >> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
>>>>>> >> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
>>>>>> >> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
>>>>>> >> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
>>>>>> >> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
>>>>>> >>
>>>>>> >> I'll keep on googling, but I'm open to advice!
>>>>>> >>
>>>>>> >> Thank you,
>>>>>> >>
>>>>>> >> Roger
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com