While I'm waiting for these OSDs to drain, is there any way to
prioritize certain PGs to recover/backfill first?
In this case, I'd prefer to prioritize the PGs that are on the two OSDs
that I'm draining.
There have been other times I've wanted to manually boost a recovery,
though. Most times when I see an object blocked, I'd like to move its
PG to the front of the recovery line. It seems to happen most often
on RGW directories, since they get a lot of activity. When that
happens, RGW is effectively down until the affected PG gets around to
recovering.
I have osd max backfills = 1. Maybe that comes into play here?
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email [email protected] <mailto:[email protected]>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 4/12/14 13:29 , Craig Lewis wrote:
From another discussion, I learned about ceph osd lost.
I'm draining osd 1 and 3 (ceph osd out). Once they're empty, I'll
mark them lost and see if that helps.
On 4/12/14 12:43 , Craig Lewis wrote:
I reformatted 2 OSDs in a cluster with 2 replicas. I tried to get
as much data off them as possible beforehand, using ceph osd out,
but I couldn't get it all.
I know I've lost data.
I have 1 incomplete PG, which is better than I expected. Following
previous advice, I ran
ceph pg force_create_pg 11.483
The PG switches to 'creating' for a while, then goes back to
'incomplete':
2014-04-12 12:20:22.356297 mon.0 [INF] pgmap v5602996: 2592 pgs: 2035
active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling, 1 incomplete;
15086 GB data, 30576 GB used, 29011 GB / 59588 GB avail;
4606075/41313663 objects degraded (11.149%); 24965 kB/s, 34 objects/s
recovering
2014-04-12 12:20:25.737277 mon.0 [INF] pgmap v5602997: 2592 pgs: 1
creating, 2035 active+clean, 553 active+remapped+wait_backfill, 2
active+recovery_wait, 1 active+remapped+backfilling; 15086 GB data,
30576 GB used, 29011 GB / 59588 GB avail; 4606075/41313663 objects
degraded (11.149%); 16179 kB/s, 22 objects/s recovering
<snip>
2014-04-12 12:21:29.141144 osd.3 [WRN] 3 slow requests, 1 included
below; oldest blocked for > 444.032652 secs
2014-04-12 12:21:29.141148 osd.3 [WRN] slow request 30.377846 seconds
old, received at 2014-04-12 12:20:58.763265:
osd_op(client.57449388.0:1 .dir.us-west-1.51941060.1 [delete]
11.7c96a483 e28552) v4 currently reached pg
<snip>
2014-04-12 12:23:33.160096 mon.0 [INF] osdmap e28553: 16 osds: 16 up,
16 in
2014-04-12 12:23:33.197448 mon.0 [INF] pgmap v5603063: 2592 pgs: 1
creating, 2037 active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling; 15086 GB data, 30584 GB used, 29003 GB /
59588 GB avail; 4597857/41313663 objects degraded (11.129%); 26137
kB/s, 28 objects/s recovering
2014-04-12 12:23:34.196847 mon.0 [INF] osdmap e28554: 16 osds: 16 up,
16 in
2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037
active+clean, 552 active+remapped+wait_backfill, 2
active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB
used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded
(11.129%)
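To follow the state changes in the pgmap lines above without reading each one by eye, the per-state counts can be pulled out with a short script. This is only a sketch: the helper name `parse_pg_states` is made up for illustration, and it assumes the summary keeps the "N pgs: count state, ...;" layout seen in these 0.72.2 logs.

```python
import re

def parse_pg_states(line):
    """Return {state: count} from a 'pgmap vN: T pgs: ...;' log line.

    Hypothetical helper; assumes the layout shown in the log excerpt.
    """
    m = re.search(r"(\d+) pgs: ([^;]+);", line)
    if m is None:
        raise ValueError("not a pgmap summary line")
    total = int(m.group(1))
    states = {}
    for item in m.group(2).split(","):
        count, state = item.split()
        states[state] = int(count)
    # Sanity check: the per-state counts should add up to the PG total.
    if sum(states.values()) != total:
        raise ValueError("state counts do not add up to the PG total")
    return states

# The last pgmap line from the log above, joined back into one string.
line = ("2014-04-12 12:23:34.224192 mon.0 [INF] pgmap v5603064: 2592 pgs: 2037 "
        "active+clean, 552 active+remapped+wait_backfill, 2 "
        "active+remapped+backfilling, 1 incomplete; 15086 GB data, 30585 GB "
        "used, 29002 GB / 59588 GB avail; 4597857/41313663 objects degraded "
        "(11.129%)")
states = parse_pg_states(line)
print(states)
```

Running it over each pgmap line in turn makes the creating/incomplete flip easy to spot.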
The blocked object is on the incomplete PG.
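The slow request shows the object hash as 11.7c96a483, and the incomplete PG is 11.483. A rough sketch of how the two relate, using the stable-mod folding Ceph applies to object hashes. The pg_num of 2048 for pool 11 is an assumption (the thread does not state it), and the function and variable names are illustrative only.

```python
def stable_mod(x, b, bmask):
    """Fold hash x into [0, b), where bmask is the next power of two
    at or above b, minus one (mirrors Ceph's ceph_stable_mod)."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)

# Assumed value: pool 11's pg_num is not given anywhere in the thread.
PG_NUM = 2048
PG_NUM_MASK = PG_NUM - 1      # valid as-is because 2048 is a power of two

object_hash = 0x7C96A483      # from "11.7c96a483" in the slow request
pg_seed = stable_mod(object_hash, PG_NUM, PG_NUM_MASK)
pgid = "11.%x" % pg_seed
print(pgid)
```

Under that assumption the hash folds to the same PG id that ceph pg force_create_pg was run against.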
PG query is 2.3MiB:
https://cd.centraldesktop.com/p/eAAAAAAADSsLAAAAAH2kja0
The query is from after the PG switched back to incomplete.
I'm running 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60).
How can I get this PG clean again?
Once it's clean, is there a RGW fsck/scrub I can run?
Any advice is appreciated.
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com