Hi,
Under normal conditions one OSD on one host is not enough to get a
cluster healthy. You'd need a minimum of one OSD on each of three hosts
to get clean.
Your OSD dump shows "*replicated size 3 min_size 2*", so the pools are
healthy with 3 copies of the data, degraded but still usable with two
copies, and stop accepting I/O when only one copy is accessible.
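If this is only meant as a throwaway single-node test, one workaround
(just a sketch, and definitely not something for production) is to drop
the replica settings on the default pools to match your single OSD, e.g.:

ceph osd pool set data size 1
ceph osd pool set data min_size 1
ceph osd pool set metadata size 1
ceph osd pool set metadata min_size 1
ceph osd pool set rbd size 1
ceph osd pool set rbd min_size 1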
-Michael
On 27/05/2014 18:38, Sudarsan, Rajesh wrote:
I am seeing the same error message from the ceph health command. I am
using Ubuntu 14.04 with the ceph 0.79 packages that ship with the
Ubuntu release. My configuration is
1 x mon
1 x OSD
Both the OSD and mon are on the same host.
rsudarsa@rsudarsa-ce1:~/mycluster$ ceph -s
cluster 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive; 192 pgs
stuck unclean
monmap e1: 1 mons at {rsudarsa-ce2=192.168.252.196:6789/0}, election
epoch 2, quorum 0 rsudarsa-ce2
osdmap e4: 1 osds: 1 up, 1 in
pgmap v12: 192 pgs, 3 pools, 0 bytes data, 0 objects
6603 MB used, 856 GB / 908 GB avail
192 incomplete
rsudarsa@rsudarsa-ce1:~/mycluster$ ceph osd tree
# id weight type name up/down reweight
-1 0.89 root default
-2 0.89 host rsudarsa-ce2
0 0.89 osd.0 up 1
rsudarsa@rsudarsa-ce1:~/mycluster$ ceph osd dump
epoch 4
fsid 5330b56b-bfbb-4ff8-aeb8-138233c2bd9a
created 2014-05-27 10:11:33.995272
modified 2014-05-27 10:13:34.157068
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags
hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
stripe_width 0
max_osd 1
osd.0 up in weight 1 up_from 4 up_thru 0 down_at 0
last_clean_interval [0,0) 192.168.252.196:6800/7071
192.168.252.196:6801/7071 192.168.252.196:6802/7071
192.168.252.196:6803/7071 exists,up 8b1c2bbb-b2f0-4974-b0f5-266c558cc732
*From:* ceph-users [mailto:[email protected]] *On
Behalf Of* [email protected]
*Sent:* Friday, May 23, 2014 6:31 AM
*To:* [email protected]; [email protected]
*Subject:* Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs
stuck unclean
Thanks for your tips & tricks.
This setup is now based on Ubuntu 12.04 with ceph version 0.80.1.
Still using
1 x mon
3 x osds
root@ceph-node2:~# ceph osd tree
# id weight type name up/down reweight
-1 0 root default
-2 0 host ceph-node2
0 0 osd.0 up 1
-3 0 host ceph-node3
1 0 osd.1 up 1
-4 0 host ceph-node1
2 0 osd.2 up 1
root@ceph-node2:~# ceph -s
cluster c30e1410-fe1a-4924-9112-c7a5d789d273
health HEALTH_WARN 192 pgs incomplete; 192 pgs stuck inactive;
192 pgs stuck unclean
monmap e1: 1 mons at {ceph-node1=192.168.123.48:6789/0}, election
epoch 2, quorum 0 ceph-node1
osdmap e11: 3 osds: 3 up, 3 in
pgmap v18: 192 pgs, 3 pools, 0 bytes data, 0 objects
102 MB used, 15224 MB / 15326 MB avail
192 incomplete
root@ceph-node2:~# cat mycrushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host ceph-node2 {
id -2 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item osd.0 weight 0.000
}
host ceph-node3 {
id -3 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item osd.1 weight 0.000
}
host ceph-node1 {
id -4 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item osd.2 weight 0.000
}
root default {
id -1 # do not change unnecessarily
# weight 0.000
alg straw
hash 0 # rjenkins1
item ceph-node2 weight 0.000
item ceph-node3 weight 0.000
item ceph-node1 weight 0.000
}
# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
Is there anything wrong with it?
root@ceph-node2:~# ceph osd dump
epoch 11
fsid c30e1410-fe1a-4924-9112-c7a5d789d273
created 2014-05-23 15:16:57.772981
modified 2014-05-23 15:18:17.022152
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags
hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 owner 0 flags hashpspool
stripe_width 0
max_osd 3
osd.0 up in weight 1 up_from 4 up_thru 5 down_at 0
last_clean_interval [0,0) 192.168.123.49:6800/4714
192.168.123.49:6801/4714 192.168.123.49:6802/4714
192.168.123.49:6803/4714 exists,up bc991a4b-9e60-4759-b35a-7f58852aa804
osd.1 up in weight 1 up_from 8 up_thru 0 down_at 0
last_clean_interval [0,0) 192.168.123.50:6800/4685
192.168.123.50:6801/4685 192.168.123.50:6802/4685
192.168.123.50:6803/4685 exists,up bd099d83-2483-42b9-9dbc-7f4e4043ca60
osd.2 up in weight 1 up_from 11 up_thru 0 down_at 0
last_clean_interval [0,0) 192.168.123.53:6800/16807
192.168.123.53:6801/16807 192.168.123.53:6802/16807
192.168.123.53:6803/16807 exists,up 80a302d0-3493-4c39-b34b-5af233b32ba1
thanks
*From:* ceph-users [mailto:[email protected]] *On
Behalf Of* Michael
*Sent:* Friday, 23 May 2014 12:36
*To:* [email protected]
*Subject:* Re: [ceph-users] pgs incomplete; pgs stuck inactive; pgs
stuck unclean
64 PGs per pool /shouldn't/ cause any issues while there are only 3
OSDs. It'll be something to pay attention to if a lot more get added,
though.
Your replication rule is probably set to something other than host.
You'll want to extract your crush map, decompile it, and see whether
your "step" is set to osd or rack. If it's not host, change it to host
and load the map back in again.
Check the docs on crush maps
http://ceph.com/docs/master/rados/operations/crush-map/ for more info.
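Roughly (assuming crushtool is installed on the node you run this from):

ceph osd getcrushmap -o crushmap.bin        # grab the current map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to text
# edit crushmap.txt: the rule should say "step chooseleaf firstn 0 type host"
crushtool -c crushmap.txt -o crushmap.new   # recompile
ceph osd setcrushmap -i crushmap.new        # load it back in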
-Michael
On 23/05/2014 10:53, Karan Singh wrote:
Try increasing the placement groups for pools
ceph osd pool set data pg_num 128
ceph osd pool set data pgp_num 128
similarly for the other 2 pools as well.
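For the other two default pools (metadata and rbd, going by your osd
dump) that would be something like:

ceph osd pool set metadata pg_num 128
ceph osd pool set metadata pgp_num 128
ceph osd pool set rbd pg_num 128
ceph osd pool set rbd pgp_num 128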
- karan -