Can you post 'ceph osd dump --format=json-pretty'? I'm guessing that the
replication level or crush rules are such that a single host with 6 osds
can't satisfy it.
sage
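
(For reference, a quick way to check both of those on dumpling, assuming the
three default pools data/metadata/rbd, is something like:

  ceph osd dump | grep 'rep size'    # per-pool replica count; exact field wording can vary by release
  ceph osd crush rule dump           # shows which failure-domain type each rule chooses from

If the pools have size 2 or more and the rules choose leaves of type "host",
CRUSH can never place both replicas on a single-host cluster, which would
match the active+remapped/active+degraded state shown below.)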
On Tue, 27 Aug 2013, Johannes Klarenbeek wrote:
>
> Hi,
>
>
>
> It seems that all my pgs are stuck somewhat. I'm not sure what to do from
> here. I waited a day in the hope that ceph would find a way to deal with
> this... but nothing happened.
>
> I'm testing on a single Ubuntu 13.04 server with dumpling 0.67.2. Below is
> my ceph status.
>
>
>
> root@cephnode2:/root# ceph -s
>
> cluster 9087eb7a-abe1-4d38-99dc-cb6b266f0f84
>
> health HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
>
> monmap e1: 1 mons at {cephnode2=172.16.1.2:6789/0}, election epoch 1,
> quorum 0 cephnode2
>
> osdmap e38: 6 osds: 6 up, 6 in
>
> pgmap v65: 192 pgs: 155 active+remapped, 37 active+degraded; 0 bytes
> data, 213 MB used, 11172 GB / 11172 GB avail
>
> mdsmap e1: 0/0/1 up
>
>
>
> root@cephnode2:/root# ceph osd tree
>
> # id weight type name up/down reweight
>
> -1 10.92 root default
>
> -2 10.92 host cephnode2
>
> 0 1.82 osd.0 up 1
>
> 1 1.82 osd.1 up 1
>
> 2 1.82 osd.2 up 1
>
> 3 1.82 osd.3 up 1
>
> 4 1.82 osd.4 up 1
>
> 5 1.82 osd.5 up 1
>
>
>
> root@cephnode2:/root# ceph health detail
>
> HEALTH_WARN 37 pgs degraded; 192 pgs stuck unclean
>
> pg 0.3f is stuck unclean since forever, current state active+remapped, last
> acting [2,0]
>
> pg 1.3e is stuck unclean since forever, current state active+remapped, last
> acting [2,0]
>
> pg 2.3d is stuck unclean since forever, current state active+remapped, last
> acting [2,0]
>
> pg 0.3e is stuck unclean since forever, current state active+remapped, last
> acting [4,0]
>
> pg 1.3f is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 2.3c is stuck unclean since forever, current state active+remapped, last
> acting [4,0]
>
> pg 0.3d is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 1.3c is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 2.3f is stuck unclean since forever, current state active+remapped, last
> acting [4,1]
>
> pg 0.3c is stuck unclean since forever, current state active+remapped, last
> acting [3,1]
>
> pg 1.3d is stuck unclean since forever, current state active+remapped, last
> acting [4,0]
>
> pg 2.3e is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 0.3b is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 1.3a is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 2.39 is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 0.3a is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 1.3b is stuck unclean since forever, current state active+remapped, last
> acting [3,1]
>
> pg 2.38 is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 0.39 is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 1.38 is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 2.3b is stuck unclean since forever, current state active+degraded, last
> acting [0]
>
> pg 0.38 is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 1.39 is stuck unclean since forever, current state active+remapped, last
> acting [1,0]
>
> pg 2.3a is stuck unclean since forever, current state active+remapped, last
> acting [3,1]
>
> pg 0.37 is stuck unclean since forever, current state active+remapped, last
> acting [3,2]
>
> [...] and many more.
>
>
>
> I found one entry on the mailing list from someone that had a similar issue
> and he fixed it with the following commands:
>
>
>
> # ceph osd getcrushmap -o /tmp/crush
>
> # crushtool -i /tmp/crush --enable-unsafe-tunables \
>   --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 \
>   --set-choose-total-tries 50 -o /tmp/crush.new
>
> root@ceph-admin:/etc/ceph# ceph osd setcrushmap -i /tmp/crush.new
>
>
>
> but I'm not sure what he is trying to do here. Especially
> --enable-unsafe-tunables seems a little ... unsafe.
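
For what it's worth, those flags only relax how CRUSH retries placements; the
"unsafe" part mainly means older clients/kernels may not understand the
resulting map. On a one-node test cluster a more direct route, sketched here
on the assumption that the default rules contain "step chooseleaf firstn 0
type host", is to decompile the crush map and change the failure domain from
host to osd:

  ceph osd getcrushmap -o /tmp/crush
  crushtool -d /tmp/crush -o /tmp/crush.txt
  # edit /tmp/crush.txt: in each rule, change
  #   step chooseleaf firstn 0 type host
  # to
  #   step chooseleaf firstn 0 type osd
  crushtool -c /tmp/crush.txt -o /tmp/crush.new
  ceph osd setcrushmap -i /tmp/crush.new

That lets CRUSH put the replicas on different OSDs of the same host, so the
PGs can go active+clean.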
>
>
>
> I also read this link:
> http://eu.ceph.com/docs/wip-3060/ops/manage/failures/osd/#failures-osd-unfound
> but it doesn't detail any actions one can take to get the cluster back to a
> HEALTH_OK status.
>
>
>
>
>
> Regards,
>
> Johannes
>
>
>
>
> _______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com