Resolution confirmed!
$ ceph -s
  cluster:
    id:     eea7b78c-b138-40fc-9f3e-3d77afb770f0
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum desktop,mon1,nuc2
    mgr: desktop(active), standbys: mon1
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   19 pools, 372 pgs
    objects: 54243 objects, 71722 MB
    usage:   129 GB used, 27812 GB / 27941 GB avail
    pgs:     372 active+clean
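
For the record, recovery progress can be watched live while a rebalance
runs (plain ceph -s, as above, works for point-in-time checks):

$ ceph -w    # streams health and pg state changes as they happen
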
On Tue, Jul 18, 2017 at 8:47 PM Roger Brown <[email protected]> wrote:
> Ah, that was the problem!
>
> So I edited the crushmap
> (http://docs.ceph.com/docs/master/rados/operations/crush-map/) with a
> weight of 10.000 for all three 10TB OSD hosts. The instant result was
> that all those pgs with only 2 OSDs were replaced with 3 OSDs while the
> cluster started rebalancing the data. I trust it will complete given
> time and I'll be good to go!
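>
> (For the archives: the same change can also be made without hand-editing
> the decompiled map, e.g.
>
> $ ceph osd crush reweight osd.3 10.0
> $ ceph osd crush reweight osd.4 10.0
> $ ceph osd crush reweight osd.0 10.0
>
> since the host bucket weights follow their osds' weights automatically.)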
>
> New OSD tree:
> $ ceph osd tree
> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 30.00000 root default
> -5 10.00000     host osd1
>  3 10.00000         osd.3       up  1.00000          1.00000
> -6 10.00000     host osd2
>  4 10.00000         osd.4       up  1.00000          1.00000
> -2 10.00000     host osd3
>  0 10.00000         osd.0       up  1.00000          1.00000
>
> Kudos to Brad Hubbard for steering me in the right direction!
>
>
> On Tue, Jul 18, 2017 at 8:27 PM Brad Hubbard <[email protected]> wrote:
>
>> ID WEIGHT TYPE NAME
>> -5 1.00000 host osd1
>> -6 9.09560 host osd2
>> -2 9.09560 host osd3
>>
>> The weight allocated to host "osd1" should presumably be the same as
>> that of the other two hosts?
>>
>> Dump your crushmap and take a good look at it, specifically the
>> weighting of "osd1".
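>>
>> Something like this gives you an editable text copy (file names are
>> just examples):
>>
>> $ ceph osd getcrushmap -o /tmp/crush.bin
>> $ crushtool -d /tmp/crush.bin -o /tmp/crush.txt
>> $ grep -A 6 'host osd1' /tmp/crush.txt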
>>
>>
>> On Wed, Jul 19, 2017 at 11:48 AM, Roger Brown <[email protected]> wrote:
>> > I also tried "ceph pg <pgid> query", but it gave no helpful
>> > recommendations for any of the stuck pgs.
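>> >
>> > (For reference, the invocation was along the lines of
>> >
>> > $ ceph pg 88.3 query
>> >
>> > using a pg id from the dump_stuck output below.)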
>> >
>> >
>> > On Tue, Jul 18, 2017 at 7:45 PM Roger Brown <[email protected]> wrote:
>> >>
>> >> Problem:
>> >> I have some pgs with only two OSDs instead of three like all the
>> >> other pgs have. This is causing active+undersized+degraded status.
>> >>
>> >> History:
>> >> 1. I started with 3 hosts, each with 1 OSD process (min_size 2) for
>> >> a 1TB drive.
>> >> 2. Added 3 more hosts, each with 1 OSD process for a 10TB drive.
>> >> 3. Removed the original three 1TB OSD hosts from the osd tree
>> >> (reweight 0, wait, stop, remove, del osd & host, rm; the rough
>> >> command sequence is sketched below).
>> >> 4. The last OSD to be removed would never return to active+clean
>> >> after reweight 0. It went undersized instead, but I went on with the
>> >> removal anyway, leaving me stuck with 5 undersized pgs.
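>> >>
>> >> Sketch of step 3 per old OSD (ids and host name illustrative, not an
>> >> exact transcript):
>> >>
>> >> $ ceph osd crush reweight osd.1 0    # drain; wait for active+clean
>> >> $ systemctl stop ceph-osd@1          # stop the daemon on its host
>> >> $ ceph osd crush remove osd.1        # remove the osd from the crush map
>> >> $ ceph osd crush remove <old-host>   # remove the now-empty host bucket
>> >> $ ceph auth del osd.1                # delete its auth key
>> >> $ ceph osd rm 1                      # remove it from the osd map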
>> >>
>> >> Things tried that didn't help:
>> >> * Give it time to go away on its own.
>> >> * Replace the replicated default.rgw.buckets.data pool with an
>> >> erasure-code 2+1 version.
>> >> * ceph osd lost 1 (and 2)
>> >> * ceph pg repair (pgs from dump_stuck)
>> >> * Googled 'ceph pg undersized' and similar searches for help.
>> >>
>> >> Current status:
>> >> $ ceph osd tree
>> >> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> >> -1 19.19119 root default
>> >> -5  1.00000     host osd1
>> >>  3  1.00000         osd.3       up  1.00000          1.00000
>> >> -6  9.09560     host osd2
>> >>  4  9.09560         osd.4       up  1.00000          1.00000
>> >> -2  9.09560     host osd3
>> >>  0  9.09560         osd.0       up  1.00000          1.00000
>> >> $ ceph pg dump_stuck
>> >> ok
>> >> PG_STAT STATE                      UP    UP_PRIMARY ACTING ACTING_PRIMARY
>> >> 88.3    active+undersized+degraded [4,0]          4 [4,0]               4
>> >> 97.3    active+undersized+degraded [4,0]          4 [4,0]               4
>> >> 85.6    active+undersized+degraded [4,0]          4 [4,0]               4
>> >> 87.5    active+undersized+degraded [0,4]          0 [0,4]               0
>> >> 70.0    active+undersized+degraded [0,4]          0 [0,4]               0
>> >> $ ceph osd pool ls detail
>> >> pool 70 'default.rgw.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 548 flags hashpspool stripe_width 0
>> >> pool 83 'default.rgw.buckets.non-ec' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 576 owner 18446744073709551615 flags hashpspool stripe_width 0
>> >> pool 85 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 652 flags hashpspool stripe_width 0
>> >> pool 86 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 653 flags hashpspool stripe_width 0
>> >> pool 87 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 654 flags hashpspool stripe_width 0
>> >> pool 88 'default.rgw.lc' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 600 flags hashpspool stripe_width 0
>> >> pool 89 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 655 flags hashpspool stripe_width 0
>> >> pool 90 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 662 flags hashpspool stripe_width 0
>> >> pool 91 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 660 flags hashpspool stripe_width 0
>> >> pool 92 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 659 flags hashpspool stripe_width 0
>> >> pool 93 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 664 flags hashpspool stripe_width 0
>> >> pool 95 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 656 flags hashpspool stripe_width 0
>> >> pool 96 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 657 flags hashpspool stripe_width 0
>> >> pool 97 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 658 flags hashpspool stripe_width 0
>> >> pool 98 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 661 flags hashpspool stripe_width 0
>> >> pool 99 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 663 flags hashpspool stripe_width 0
>> >> pool 100 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 651 flags hashpspool stripe_width 0
>> >> pool 101 'default.rgw.reshard' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1529 owner 18446744073709551615 flags hashpspool stripe_width 0
>> >> pool 103 'default.rgw.buckets.data' erasure size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 2106 flags hashpspool stripe_width 8192
>> >>
>> >> I'll keep on googling, but I'm open to advice!
>> >>
>> >> Thank you,
>> >>
>> >> Roger
>> >>
>> >
>>
>>
>>
>> --
>> Cheers,
>> Brad
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com