Hi,

Under the default settings, one OSD on one host is not enough to get a cluster healthy. You'd need a minimum of one OSD on each of three hosts to get clean.

Your OSD dump shows "*replicated size 3 min_size 2*", so each pool is healthy with 3 copies of the data, degraded but still usable with two copies, and stops accepting I/O once only one copy is accessible.
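
If this is only meant as a single-node test cluster, a common workaround (just a sketch, not something for production) is to drop the replication requirements on each of the pools from your osd dump:

ceph osd pool set data size 1
ceph osd pool set data min_size 1
ceph osd pool set metadata size 1
ceph osd pool set metadata min_size 1
ceph osd pool set rbd size 1
ceph osd pool set rbd min_size 1

With size 1 / min_size 1 a single OSD can bring the PGs active+clean, at the cost of keeping no redundant copies at all.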

-Michael

On 27/05/2014 18:38, Sudarsan, Rajesh wrote:

I am seeing the same error message from the ceph health command. I am on Ubuntu 14.04 with ceph 0.79, using the ceph packages that come with the Ubuntu release. My configuration is:

1 x mon
1 x OSD

Both the OSD and mon are on the same host.

rsudarsa@rsudarsa-ce1:~/mycluster$ ceph -s
    cluster 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {rsudarsa-ce2=192.168.252.196:6789/0}, election epoch 2, quorum 0 rsudarsa-ce2
     osdmap e4: 1 osds: 1 up, 1 in
      pgmap v12: 192 pgs, 3 pools, 0 bytes data, 0 objects
            6603 MB used, 856 GB / 908 GB avail
                 192 incomplete

rsudarsa@rsudarsa-ce1:~/mycluster$ ceph osd tree
# id    weight  type name              up/down reweight
-1      0.89    root default
-2      0.89            host rsudarsa-ce2
0       0.89                    osd.0   up      1

rsudarsa@rsudarsa-ce1:~/mycluster$ ceph osd dump
epoch 4
fsid 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
created 2014-05-27 10:11:33.995272
modified 2014-05-27 10:13:34.157068
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
max_osd 1
osd.0 up in weight 1 up_from 4 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.252.196:6800/7071 192.168.252.196:6801/7071 192.168.252.196:6802/7071 192.168.252.196:6803/7071 exists,up 8b1c2bbb-b2f0-4974-b0f5-266c558cc732

*From:* ceph-users [mailto:[email protected]] *On Behalf Of* [email protected]
*Sent:* Friday, May 23, 2014 6:31 AM
*To:* [email protected]; [email protected]
*Subject:* Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

Thanks for your tips & tricks.

This setup is now based on Ubuntu 12.04, ceph version 0.80.1.

Still using:

1 x mon
3 x OSDs

root@ceph-node2:~# ceph osd tree
# id    weight  type name              up/down reweight
-1      0       root default
-2      0               host ceph-node2
0       0                       osd.0   up      1
-3      0               host ceph-node3
1       0                       osd.1   up      1
-4      0               host ceph-node1
2       0                       osd.2   up      1

root@ceph-node2:~# ceph -s
    cluster c30e1410-fe1a-4924-9112-c7a5d789d273
     health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs stuck unclean
     monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e11: 3 osds: 3 up, 3 in
      pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
            102 MB used, 15224 MB / 15326 MB avail
                 192 incomplete

root@ceph-node2:~# cat mycrushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host ceph-node2 {
        id -2           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 0.000
}
host ceph-node3 {
        id -3           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 0.000
}
host ceph-node1 {
        id -4           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 0.000
}
root default {
        id -1           # do not change unnecessarily
        # weight 0.000
        alg straw
        hash 0          # rjenkins1
        item ceph-node2 weight 0.000
        item ceph-node3 weight 0.000
        item ceph-node1 weight 0.000
}

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}

# end crush map

Is there anything wrong with it?

root@ceph-node2:~# ceph osd dump
epoch 11
fsid c30e1410-fe1a-4924-9112-c7a5d789d273
created 2014-05-23 15:16:57.772981
modified 2014-05-23 15:18:17.022152
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool stripe_width 0
max_osd 3
osd.0 up in weight 1 up_from 4 up_thru 5 down_at 0 last_clean_interval [0,0) 192.168.123.49:6800/4714 192.168.123.49:6801/4714 192.168.123.49:6802/4714 192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.50:6800/4685 192.168.123.50:6801/4685 192.168.123.50:6802/4685 192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.123.53:6800/16807 192.168.123.53:6801/16807 192.168.123.53:6802/16807 192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1

thanks

*From:* ceph-users [mailto:[email protected]] *On Behalf Of* Michael
*Sent:* Friday, 23 May 2014 12:36
*To:* [email protected]
*Subject:* Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs stuck unclean

64 PGs per pool /shouldn't/ cause any issues while there are only 3 OSDs. It'll be something to pay attention to if a lot more get added, though.

Your replication rule is probably set to something other than host.
You'll want to extract your crush map, decompile it, and check whether the rule's "step chooseleaf" is set to osd or rack.
If it's not host, change it to host and inject the map back into the cluster.

Check the docs on crush maps http://ceph.com/docs/master/rados/operations/crush-map/ for more info.
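
In practice that extract/edit/re-inject cycle looks roughly like this (the file names are just placeholders):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt so the rule contains: step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new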

-Michael

On 23/05/2014 10:53, Karan Singh wrote:

    Try increasing the placement groups for pools

    ceph osd pool set data pg_num 128

    ceph osd pool set data pgp_num 128

    similarly for other 2 pools as well.
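
    For the other two pools shown in the osd dump that would be, e.g.:

    ceph osd pool set metadata pg_num 128

    ceph osd pool set metadata pgp_num 128

    ceph osd pool set rbd pg_num 128

    ceph osd pool set rbd pgp_num 128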

    - karan -




