Once the OSDs were drained, the PG stayed in state incomplete.
When I shut down the out OSDs, the PG went to state down+peering.
After marking the down OSDs lost, the PG went to state down+incomplete.
After running ceph pg force_create_pg 11.483, the PG went to state
creating. It stayed that way for a while, then switched back to incomplete.
I shut down the cluster and started everything back up (making sure to
keep the drained OSDs off).
The PG came back to incomplete. ceph pg force_create_pg flips it to
creating, then the cluster flips it back to incomplete.
It continues to try to probe down OSDs, despite their having been marked
lost.
root@ceph0c:/etc/cron.d# ceph osd dump | awk '$1 ~ /^osd/ { print $1,
$2, $3, $4, $5;}'
osd.0 up in weight 1
osd.2 up in weight 0.969986
osd.5 up in weight 0.949997
osd.6 up in weight 0.969986
osd.7 up in weight 1
osd.8 up in weight 1
osd.9 up in weight 1
osd.10 up in weight 1
osd.12 up in weight 1
osd.13 up in weight 1
osd.14 up in weight 1
osd.15 up in weight 1
root@ceph0c:/etc/cron.d# ceph pg 11.483 query | tail -11
"probing_osds": [
0,
2,
13],
"down_osds_we_would_probe": [
3,
4,
11],
"peering_blocked_by": []},
{ "name": "Started",
"enter_time": "2014-04-14 14:49:38.444888"}]}
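Since the query still lists 3, 4, and 11 under down_osds_we_would_probe, one
thing worth re-checking is whether the lost declarations actually took. A
sketch (using those three IDs from the output above; note that ceph osd lost
only takes effect while the OSD is marked down in the osdmap):

```shell
# Re-declare the OSDs the PG still wants to probe as lost
# (IDs 3, 4, 11 taken from down_osds_we_would_probe above).
for id in 3 4 11; do
    ceph osd lost $id --yes-i-really-mean-it
done

# Then re-query the PG to see whether it still lists them:
ceph pg 11.483 query | grep -A 5 down_osds_we_would_probe
```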
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 4/12/14 13:29, Craig Lewis wrote:
From another discussion, I learned about ceph osd lost.
I'm draining osd.1 and osd.3 (ceph osd out). Once they're empty, I'll
mark them lost and see if that helps.
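A sketch of that drain-then-lost sequence (osd IDs 1 and 3 as above; run
against a live cluster, so adjust IDs to your own situation):

```shell
# Mark the OSDs out so CRUSH stops mapping PGs to them and
# backfill starts migrating their data elsewhere.
ceph osd out 1
ceph osd out 3

# Watch the degraded object count fall while they drain.
ceph -s

# Once they're as empty as they'll get, stop the daemons, then
# declare their remaining data permanently gone.
ceph osd lost 1 --yes-i-really-mean-it
ceph osd lost 3 --yes-i-really-mean-it
```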
On 4/12/14 12:43, Craig Lewis wrote:
I reformatted 2 OSDs in a cluster with 2 replicas. I tried to get
as much data off them as possible beforehand, using ceph osd out,
but I couldn't get it all.
I know I've lost data.
I have 1 incomplete PG, which is better than I expected. Following
previous advice, I ran
ceph pg force_create_pg 11.483
The PG switches to 'creating' for a while, then goes back to
'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035
active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling, 1 incomplete;
15086 GB data, 30576 GB used, 29011 GB / 59588 GB avail;
4606075/41313663 objects degraded (11.149%); 24965 kB/s, 34 objects/s
recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data,
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects
degraded (11.149%); 16179 kB/s, 22 objects/s recovering
<snip>
2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds
old, received at 2014-04-12 12:20:58.763265:
osd_op(client.57449388.0:1 .dir.us-west-1.51941060.1 [delete]
11.7c96a483 e28552) v4 currently reached pg
<snip>
2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up,
16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB /
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137
kB/s, 28 objects/s recovering
2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up,
16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037
active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB
used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded
(11.129%)
The blocked object is on the incomplete PG.
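To see exactly which requests are stuck on osd.3 (the OSD emitting the
slow-request warnings above), its admin socket can be queried; a sketch,
assuming the default socket path on the host running that daemon:

```shell
# Dump the operations currently in flight on osd.3, including how long
# each has been blocked and which object/PG it targets.
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight
```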
The full PG query output is 2.3 MiB:
https://cd.centraldesktop.com/p/eAAAAAAADSsLAAAAAH2kja0
The query is from after the PG switched back to incomplete.
I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).
How can I get this PG clean again?
Once it's clean, is there a RGW fsck/scrub I can run?
Any advice is appreciated.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com