Once the OSDs were drained, the PG stayed in state incomplete.
When I shut down the out OSDs, the PG went to state down+peering.
After marking the down OSDs lost, the PG went to state down+incomplete.
After running ceph pg force_create_pg 11.483, the PG went to state
creating. It stayed that way for a while, then switched back to incomplete.
I shut down the cluster and started everything back up (making sure to
keep the drained OSDs off).
The PG came back to incomplete. ceph pg force_create_pg flips it to
creating, then the cluster flips it back to incomplete.
It continues to try to probe down OSDs, despite their having been marked
lost.
root@ceph0c:/etc/cron.d# ceph osd dump | awk '$1 ~ /^osd/ { print $1,
$2, $3, $4, $5;}'
osd.0 up in weight 1
osd.2 up in weight 0.969986
osd.5 up in weight 0.949997
osd.6 up in weight 0.969986
osd.7 up in weight 1
osd.8 up in weight 1
osd.9 up in weight 1
osd.10 up in weight 1
osd.12 up in weight 1
osd.13 up in weight 1
osd.14 up in weight 1
osd.15 up in weight 1
root@ceph0c:/etc/cron.d# ceph pg 11.483 query | tail -11
"probing_osds": [
0,
2,
13],
"down_osds_we_would_probe": [
3,
4,
11],
"peering_blocked_by": []},
{ "name": "Started",
"enter_time": "2014-04-14 14:49:38.444888"}]}
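Since the query still lists 3, 4, and 11 under down_osds_we_would_probe, one
thing worth re-checking is whether the lost declarations actually took. A
sketch (using those three IDs from the output above; note that ceph osd lost
only takes effect while the OSD is marked down in the osdmap):

```shell
# Re-declare the OSDs the PG still wants to probe as lost
# (IDs 3, 4, 11 taken from down_osds_we_would_probe above).
for id in 3 4 11; do
    ceph osd lost $id --yes-i-really-mean-it
done

# Then re-query the PG to see whether it still lists them:
ceph pg 11.483 query | grep -A 5 down_osds_we_would_probe
```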
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 4/12/14 13:29, Craig Lewis wrote:
From another discussion, I learned about ceph osd lost.
I'm draining osd.1 and osd.3 (ceph osd out). Once they're empty, I'll
mark them lost and see if that helps.
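A sketch of that drain-then-lost sequence (osd IDs 1 and 3 as above; run
against a live cluster, so adjust IDs to your own situation):

```shell
# Mark the OSDs out so CRUSH stops mapping PGs to them and
# backfill starts migrating their data elsewhere.
ceph osd out 1
ceph osd out 3

# Watch the degraded object count fall while they drain.
ceph -s

# Once they're as empty as they'll get, stop the daemons, then
# declare their remaining data permanently gone.
ceph osd lost 1 --yes-i-really-mean-it
ceph osd lost 3 --yes-i-really-mean-it
```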
On 4/12/14 12:43, Craig Lewis wrote:
I reformatted 2 OSDs in a cluster with 2 replicas. I tried to get
as much data off them as possible beforehand, using ceph osd out,
but I couldn't get it all.
I know I've lost data.
I have 1 incomplete PG, which is better than I expected. Following
previous advice, I ran
ceph pg force_create_pg 11.483
The PG switches to 'creating' for a while, then goes back to
'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035
active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling, 1 incomplete;
15086 GB data, 30576 GB used, 29011 GB / 59588 GB avail;
4606075/41313663 objects degraded (11.149%); 24965 kB/s, 34 objects/s
recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data,
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects
degraded (11.149%); 16179 kB/s, 22 objects/s recovering
<snip>
2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds
old, received at 2014-04-12 12:20:58.763265:
osd_op(client.57449388.0:1 .dir.us-west-1.51941060.1 [delete]
11.7c96a483 e28552) v4 currently reached pg
<snip>
2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up,
16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB /
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137
kB/s, 28 objects/s recovering
2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up,
16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037
active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB
used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded
(11.129%)
The blocked object is on the incomplete PG.
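To see exactly which requests are stuck on osd.3 (the OSD emitting the
slow-request warnings above), its admin socket can be queried; a sketch,
assuming the default socket path on the host running that daemon:

```shell
# Dump the operations currently in flight on osd.3, including how long
# each has been blocked and which object/PG it targets.
ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok dump_ops_in_flight
```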
The full PG query output is 2.3 MiB:
https://cd.centraldesktop.com/p/eAAAAAAADSsLAAAAAH2kja0
The query is from after the PG switched back to incomplete.
I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).
How can I get this PG clean again?
Once it's clean, is there a RGW fsck/scrub I can run?
Any advice is appreciated.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com