Thanks for the suggestion.

Is a rolling reboot sufficient, or must all OSDs be down at the same time?
One is no problem; the other takes some scheduling.
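
To be concrete, by a rolling reboot I mean something like this sketch
(hostnames are placeholders; in practice the wait would be done by
watching "ceph osd stat" rather than a fixed sleep):

# noout is already set on this cluster; otherwise: ceph osd set noout
for node in osd-node1 osd-node2 osd-node3; do
    ssh "$node" reboot
    sleep 300   # stand-in: really, wait until "ceph osd stat" shows all OSDs up
done
# and once everything is back: ceph osd unset noout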

Ronny Aasen


On 01.11.2016 21:52, [email protected] wrote:
Hello Ronny,

If it is possible for you, try to reboot all OSD nodes.

I had this issue on my test cluster and it became healthy after rebooting.
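
One note: if the noout flag were not already set on your cluster (it is,
per the health output quoted below), I would set it before the reboots so
the down OSDs do not get rebalanced away, and unset it afterwards:

# ceph osd set noout
... reboot the nodes ...
# ceph osd unset noout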

Hth
- Mehmet

On 1 November 2016 19:55:07 CET, Ronny Aasen <[email protected]> wrote:

    Hello.

    I have a cluster stuck with 2 PGs stuck undersized+degraded, with 25
    unfound objects.

    # ceph health detail
    HEALTH_WARN 2 pgs degraded; 2 pgs recovering; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized; recovery 294599/149522370 objects degraded (0.197%); recovery 640073/149522370 objects misplaced (0.428%); recovery 25/46579241 unfound (0.000%); noout flag(s) set
    pg 6.d4 is stuck unclean for 8893374.380079, current state active+recovering+undersized+degraded+remapped, last acting [62]
    pg 6.ab is stuck unclean for 8896787.249470, current state active+recovering+undersized+degraded+remapped, last acting [18,12]
    pg 6.d4 is stuck undersized for 438122.427341, current state active+recovering+undersized+degraded+remapped, last acting [62]
    pg 6.ab is stuck undersized for 416947.461950, current state active+recovering+undersized+degraded+remapped, last acting [18,12]
    pg 6.d4 is stuck degraded for 438122.427402, current state active+recovering+undersized+degraded+remapped, last acting [62]
    pg 6.ab is stuck degraded for 416947.462010, current state active+recovering+undersized+degraded+remapped, last acting [18,12]
    pg 6.d4 is active+recovering+undersized+degraded+remapped, acting [62], 25 unfound
    pg 6.ab is active+recovering+undersized+degraded+remapped, acting [18,12]
    recovery 294599/149522370 objects degraded (0.197%)
    recovery 640073/149522370 objects misplaced (0.428%)
    recovery 25/46579241 unfound (0.000%)
    noout flag(s) set
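
    (Aside: the stuck PGs can also be listed on their own, which is handy
    for scripting; e.g.

    # ceph pg dump_stuck unclean

    gives just the stuck pg ids and their states.)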

    I have been following the troubleshooting guide at
    http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/
    but got stuck without a resolution. Luckily it is not critical data, so
    I wanted to mark the PG lost so it could become health-ok:

    # ceph pg 6.d4 mark_unfound_lost delete
    Error EINVAL: pg has 25 unfound objects but we haven't probed all sources, not marking lost
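
    (For reference, the probe targets are listed under "recovery_state" ->
    "might_have_unfound" in the output of "ceph pg 6.d4 query"; the full
    output is in the paste linked at the bottom.)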

    Querying the pg, I see that it would want osd.80 and osd.36:

    { "osd": "80", "status": "osd is down" },

    Trying to mark the OSDs lost does not work either, since the OSDs were
    removed from the cluster a long time ago:

    # ceph osd lost 80 --yes-i-really-mean-it
    osd.80 is not down or doesn't exist
    # ceph osd lost 36 --yes-i-really-mean-it
    osd.36 is not down or doesn't exist
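
    (A read-only way to confirm the ids are really gone from the maps, e.g.:

    # ceph osd tree | grep osd.80
    # ceph osd dump | grep osd.36

    both should come back empty if the OSDs were fully removed.)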

    And this is where I am stuck. I have tried stopping and starting the 3
    OSDs, but that did not have any effect.

    Does anyone have any advice on how to proceed?

    Full output at: http://paste.debian.net/hidden/be03a185/

    This is hammer 0.94.9 on Debian 8.

    Kind regards
    Ronny Aasen

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
