Good morning folks,

As a newbie to Ceph, yesterday was the first time I configured my CRUSH map, added a CRUSH rule, and created my first pool using this rule.
Since then I get the status HEALTH_WARN with the following output:
~~~
$ sudo ceph status
cluster:
id: 47c108bd-db66-4197-96df-cadde9e9eb45
health: HEALTH_WARN
Degraded data redundancy: 128 pgs undersized
1 pools have pg_num > pgp_num
services:
mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
osd: 3 osds: 3 up, 3 in
data:
pools: 1 pools, 128 pgs
objects: 0 objects, 0 bytes
usage: 3088 MB used, 3068 GB / 3071 GB avail
pgs: 128 active+undersized
~~~
The pool was created by running `sudo ceph osd pool create joergsfirstpool 128 replicated replicate_datacenter`.
I figured out that I forgot to set the value for pgp_num accordingly, so I did that by running `sudo ceph osd pool set joergsfirstpool pgp_num 128`. As you can see in the following output, 15 PGs were remapped, but 113 still remain in active+undersized.
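For reference, these are the commands I am using to double-check the pool's current settings (plain `ceph osd pool get` calls, output omitted here):
~~~
# check placement group counts, replica size and the CRUSH rule of the pool
$ sudo ceph osd pool get joergsfirstpool pg_num
$ sudo ceph osd pool get joergsfirstpool pgp_num
$ sudo ceph osd pool get joergsfirstpool size
$ sudo ceph osd pool get joergsfirstpool crush_rule
~~~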
~~~
$ sudo ceph status
cluster:
id: 47c108bd-db66-4197-96df-cadde9e9eb45
health: HEALTH_WARN
Degraded data redundancy: 113 pgs undersized
services:
mon: 3 daemons, quorum ccp-tcnm01,ccp-tcnm02,ccp-tcnm03
mgr: ccp-tcnm01(active), standbys: ccp-tcnm03, ccp-tcnm02
osd: 3 osds: 3 up, 3 in; 15 remapped pgs
data:
pools: 1 pools, 128 pgs
objects: 0 objects, 0 bytes
usage: 3089 MB used, 3068 GB / 3071 GB avail
pgs: 113 active+undersized
15 active+clean+remapped
~~~
My questions are:
1. What does active+undersized actually mean? I did not find anything
about it in the documentation on docs.ceph.com.
2. Why were only 15 PGs remapped after I corrected the mistake with the wrong pgp_num value?
3. What's wrong here, and what do I have to do to get the cluster back to active+clean again? (The commands I have been using to inspect the stuck PGs are listed right after these questions.)
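In case it matters, this is what I have been running so far to look at the undersized PGs (the PG id in the last command is just an example):
~~~
# show the detailed health warning
$ sudo ceph health detail
# list all PGs that are currently in the undersized state
$ sudo ceph pg ls undersized
# show which OSDs a single PG maps to, e.g. PG 1.0
$ sudo ceph pg map 1.0
~~~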
For further information, you can find my current CRUSH map below:
~~~
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host ccp-tcnm01 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0 # rjenkins1
item osd.1 weight 1.000
}
host ccp-tcnm03 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0 # rjenkins1
item osd.2 weight 1.000
}
datacenter dc1 {
id -9 # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 2.000
alg straw2
hash 0 # rjenkins1
item ccp-tcnm01 weight 1.000
item ccp-tcnm03 weight 1.000
}
host ccp-tcnm02 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.000
}
datacenter dc3 {
id -10 # do not change unnecessarily
id -11 class hdd # do not change unnecessarily
# weight 1.000
alg straw2
hash 0 # rjenkins1
item ccp-tcnm02 weight 1.000
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 3.000
alg straw2
hash 0 # rjenkins1
item dc1 weight 2.000
item dc3 weight 1.000
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule replicate_datacenter {
id 1
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type datacenter
step emit
}
# end crush map
~~~
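In case it helps, this is how I have been testing the replicate_datacenter rule offline with crushtool (the file names are just what I use locally):
~~~
# export and decompile the cluster's current CRUSH map
$ sudo ceph osd getcrushmap -o crushmap.bin
$ crushtool -d crushmap.bin -o crushmap.txt
# simulate which OSDs rule 1 (replicate_datacenter) would select for 3 replicas
$ crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings
~~~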
Best regards,
Joerg
