I think we are zeroing in now on the root cause of the stuck incomplete PGs.  
It looks like the common factor for all our stuck PGs is that they all show 
the removed OSD 8 in their "down_osds_we_would_probe" list (from 
"ceph pg <id> query").

For reference, I found a few archived threads of other people experiencing 
similar problems in the past:

  https://www.mail-archive.com/ceph-users@lists.ceph.com/msg13985.html
  http://ceph-users.ceph.narkive.com/jJ2DyVw7/ceph-pgs-stuck-creating-after-running-force-create-pg
  http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-August/042338.html

The general consensus from those threads is that as long as 
down_osds_we_would_probe is pointing to any OSD that can't be reached, those 
PGs will remain stuck incomplete and can't be cured by force_create_pg or even 
"ceph osd lost".

Question: is there any command we can run to remove the old OSD from 
down_osds_we_would_probe?

I did try to create a new "fake" OSD.8 today (just created the OSD, but didn't 
bring it all the way up), and I was finally able to run "ceph osd lost 8".  It 
did not seem to have any impact.
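Roughly what that looked like (we never started an actual daemon for the fake 
OSD; note that "ceph osd create" hands back the lowest free id, so it's worth 
confirming it really returned 8 before going further):

  ceph osd create                           # re-allocates the lowest free osd id
  ceph osd tree                             # confirm osd.8 is listed again (down/out)
  ceph osd lost 8 --yes-i-really-mean-it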

If there is no command to remove the old OSD, I think our next step will be to 
bring up a new/real/empty OSD.8 and see if that will clear the log jam.  But it 
seems like there should be a tool to deal with this kind of thing?
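The rough sketch of that plan, assuming a ceph-disk style deployment (/dev/sdX 
is a placeholder, and we'd still have to confirm the new OSD actually comes 
back as id 8 rather than some other free id):

  ceph-disk prepare /dev/sdX          # /dev/sdX is a placeholder device
  ceph-disk activate /dev/sdX1        # data partition created by prepare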

Thanks,

-- Dan


> On Sep 2, 2016, at 15:01, Dan Jakubiec <dan.jakub...@gmail.com> wrote:
> 
> Re-packaging this question which was buried in a larger, less-specific thread 
> from a couple of days ago.  Hoping this will be more useful here.
> 
> We have been working on restoring our Ceph cluster after losing a large 
> number of OSDs.  We have all PGs active now except for 80 PGs that are stuck 
> in the "incomplete" state.  These PGs reference OSD.8, which we removed 
> 2 weeks ago due to corruption.
> 
> We would like to abandon the "incomplete" PGs as they are not restorable.  We 
> have tried the following (rough commands are sketched just after the list):
> 
> 1. Per the docs, we made sure min_size on the corresponding pools was set to 1.  
>    This did not clear the condition.
> 2. Ceph would not let us issue "ceph osd lost N" because OSD.8 had already been 
>    removed from the cluster.
> 3. We also tried "ceph pg force_create_pg X" on all the PGs.  The 80 PGs moved 
>    to "creating" for a few minutes but then all went back to "incomplete".
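> 
> Roughly the commands behind the first and third attempts (the pool name is a 
> placeholder, and the dump_stuck pipeline is just one way to gather the 80 pg 
> ids):
> 
> djakubiec@dev:~$ ceph osd pool set <pool> min_size 1
> djakubiec@dev:~$ ceph pg dump_stuck inactive | awk '$2 ~ /incomplete/ {print $1}' | \
>                      while read pg; do ceph pg force_create_pg "$pg"; done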
> 
> How do we abandon these PGs to allow recovery to continue?  Is there some way 
> to force individual PGs to be marked as "lost"?
> 
> 
> ====
> 
> Some miscellaneous data below:
> 
> djakubiec@dev:~$ ceph osd lost 8 --yes-i-really-mean-it
> osd.8 is not down or doesn't exist
> 
> 
> djakubiec@dev:~$ ceph osd tree
> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 58.19960 root default
> -2  7.27489     host node24
>  1  7.27489         osd.1        up  1.00000          1.00000
> -3  7.27489     host node25
>  2  7.27489         osd.2        up  1.00000          1.00000
> -4  7.27489     host node26
>  3  7.27489         osd.3        up  1.00000          1.00000
> -5  7.27489     host node27
>  4  7.27489         osd.4        up  1.00000          1.00000
> -6  7.27489     host node28
>  5  7.27489         osd.5        up  1.00000          1.00000
> -7  7.27489     host node29
>  6  7.27489         osd.6        up  1.00000          1.00000
> -8  7.27539     host node30
>  9  7.27539         osd.9        up  1.00000          1.00000
> -9  7.27489     host node31
>  7  7.27489         osd.7        up  1.00000          1.00000
> 
> BUT, even though OSD 8 no longer exists, I still see lots of references to 
> OSD 8 in various ceph dumps and queries.
> 
> Interestingly, we do still see weird entries in the CRUSH map (should I do 
> something about these?):
> 
> # devices
> device 0 device0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 device8
> device 9 osd.9
> 
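> For completeness, this is how we have been pulling the decompiled CRUSH map 
> that shows those device entries (crushtool ships with Ceph):
> 
> djakubiec@dev:~$ ceph osd getcrushmap -o crushmap.bin
> djakubiec@dev:~$ crushtool -d crushmap.bin -o crushmap.txt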
> 
> 
> And for what it is worth, here is the ceph -s:
> 
>     cluster 10d47013-8c2a-40c1-9b4a-214770414234
>      health HEALTH_ERR
>             212 pgs are stuck inactive for more than 300 seconds
>             93 pgs backfill_wait
>             1 pgs backfilling
>             101 pgs degraded
>             63 pgs down
>             80 pgs incomplete
>             89 pgs inconsistent
>             4 pgs recovery_wait
>             1 pgs repair
>             132 pgs stale
>             80 pgs stuck inactive
>             132 pgs stuck stale
>             103 pgs stuck unclean
>             97 pgs undersized
>             2 requests are blocked > 32 sec
>             recovery 4394354/46343776 objects degraded (9.482%)
>             recovery 4025310/46343776 objects misplaced (8.686%)
>             2157 scrub errors
>             mds cluster is degraded
>      monmap e1: 3 mons at {core=10.0.1.249:6789/0,db=10.0.1.251:6789/0,dev=10.0.1.250:6789/0}
>             election epoch 266, quorum 0,1,2 core,dev,db
>       fsmap e3627: 1/1/1 up {0=core=up:replay}
>      osdmap e4293: 8 osds: 8 up, 8 in; 144 remapped pgs
>             flags sortbitwise
>       pgmap v1866639: 744 pgs, 10 pools, 7668 GB data, 20673 kobjects
>             8339 GB used, 51257 GB / 59596 GB avail
>             4394354/46343776 objects degraded (9.482%)
>             4025310/46343776 objects misplaced (8.686%)
>                  362 active+clean
>                  112 stale+active+clean
>                   89 active+undersized+degraded+remapped+wait_backfill
>                   66 active+clean+inconsistent
>                   63 down+incomplete
>                   19 stale+active+clean+inconsistent
>                   17 incomplete
>                    5 active+undersized+degraded+remapped
>                    4 active+recovery_wait+degraded
>                    2 active+undersized+degraded+remapped+inconsistent+wait_backfill
>                    1 stale+active+clean+scrubbing+deep+inconsistent+repair
>                    1 active+remapped+inconsistent+wait_backfill
>                    1 active+clean+scrubbing+deep
>                    1 active+remapped+wait_backfill
>                    1 active+undersized+degraded+remapped+backfilling
> 
> 
> 
> Thanks,
> 
> -- Dan
> 
> 
