Ah, that was the problem!

So I edited the CRUSH map
(http://docs.ceph.com/docs/master/rados/operations/crush-map/), setting a
weight of 10.000 on each of the three 10TB OSD hosts. The immediate result:
every PG that had only two OSDs picked up a third OSD, and the cluster
started rebalancing the data. I trust it will finish in time and I'll be
good to go!
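
For anyone who lands on this thread later, here is a rough sketch of the
decompile/edit/recompile cycle described on that page (the filenames are
just examples; "ceph osd crush reweight osd.<id> <weight>" per OSD would
accomplish the same thing):

$ ceph osd getcrushmap -o crushmap.bin         # export the current CRUSH map
$ crushtool -d crushmap.bin -o crushmap.txt    # decompile to plain text
  (edit crushmap.txt: set the item weights for the 10TB hosts to 10.000)
$ crushtool -c crushmap.txt -o crushmap.new    # recompile
$ ceph osd setcrushmap -i crushmap.new         # inject the edited map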

New OSD tree:
$ ceph osd tree
ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 30.00000 root default
-5 10.00000     host osd1
 3 10.00000         osd.3      up  1.00000          1.00000
-6 10.00000     host osd2
 4 10.00000         osd.4      up  1.00000          1.00000
-2 10.00000     host osd3
 0 10.00000         osd.0      up  1.00000          1.00000
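
To keep an eye on the rebalance, the usual status commands should be
enough, e.g.:

$ ceph -s               # overall health and recovery/backfill progress
$ ceph -w               # follow the cluster log as PGs return to active+clean
$ ceph pg dump_stuck    # confirm the undersized PGs clear out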

Kudos to Brad Hubbard for steering me in the right direction!


On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <bhubb...@redhat.com> wrote:

> ID WEIGHT   TYPE NAME
> -5  1.00000     host osd1
> -6  9.09560     host osd2
> -2  9.09560     host osd3
>
> The weight allocated to host "osd1" should presumably be the same as
> the other two hosts?
>
> Dump your crushmap and take a good look at it, specifically the
> weighting of "osd1".
>
>
> On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <rogerpbr...@gmail.com>
> wrote:
> > I also tried ceph pg query, but it gave no helpful recommendations for
> > any of the stuck pgs.
> >
> >
> > On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <rogerpbr...@gmail.com> wrote:
> >>
> >> Problem:
> >> I have some pgs with only two OSDs instead of 3 like all the other pgs
> >> have. This is causing active+undersized+degraded status.
> >>
> >> History:
> >> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for a
> >> 1TB drive.
> >> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
> >> 3. Removed the original 3 1TB OSD hosts from the osd tree (reweight 0,
> >> wait, stop, remove, del osd&host, rm).
> >> 4. The last OSD to be removed would never return to active+clean after
> >> reweight 0. It returned undersized instead, but I went on with removal
> >> anyway, leaving me stuck with 5 undersized pgs.
> >>
> >> Things tried that didn't help:
> >> * give it time to go away on its own
> >> * Replace replicated default.rgw.buckets.data pool with erasure-code 2+1
> >> version.
> >> * ceph osd lost 1 (and 2)
> >> * ceph pg repair (pgs from dump_stuck)
> >> * googled 'ceph pg undersized' and similar searches for help.
> >>
> >> Current status:
> >> $ ceph osd tree
> >> ID WEIGHT   TYPE NAME     UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >> -1 19.19119 root default
> >> -5  1.00000     host osd1
> >>  3  1.00000         osd.3      up  1.00000          1.00000
> >> -6  9.09560     host osd2
> >>  4  9.09560         osd.4      up  1.00000          1.00000
> >> -2  9.09560     host osd3
> >>  0  9.09560         osd.0      up  1.00000          1.00000
> >> $ ceph pg dump_stuck
> >> ok
> >> PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
> >> 88.3    active+undersized+degraded [4,0]          4  [4,0]              4
> >> 97.3    active+undersized+degraded [4,0]          4  [4,0]              4
> >> 85.6    active+undersized+degraded [4,0]          4  [4,0]              4
> >> 87.5    active+undersized+degraded [0,4]          0  [0,4]              0
> >> 70.0    active+undersized+degraded [0,4]          0  [0,4]              0
> >> $ ceph osd pool ls detail
> >> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
> >> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
> >> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
> >> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
> >> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
> >> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
> >> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
> >> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
> >> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
> >> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
> >> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
> >> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
> >> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
> >> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
> >> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
> >> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
> >> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
> >> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
> >> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
> >>
> >> I'll keep on googling, but I'm open to advice!
> >>
> >> Thank you,
> >>
> >> Roger
> >>
> >
>
>
>
> --
> Cheers,
> Brad
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
