On Dec 29, 2014, Christian Eichelmann <christian.eichelm...@1und1.de> wrote:

> After we got everything up and running again, we still have 3 PGs in the
> state incomplete. I was checking one of them directly on the systems
> (replication factor is 3).

I have run into this myself at least twice before.  I had not lost or
replaced the OSDs altogether, though; I had just rolled too many of them
back to earlier snapshots, which required them to be backfilled to catch
up.  It looks like a PG won't get out of the incomplete state, not even
to backfill other OSDs, if going active would leave it with fewer
complete replicas than the pool's min_size.
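
For what it's worth, something along these lines should show which PGs
are stuck and what they are waiting on (the PG id 2.1f below is just a
placeholder; use the ids reported by health detail):

  ceph health detail             # lists incomplete PGs and their acting OSDs
  ceph pg dump_stuck inactive    # PGs that are not active, incomplete included
  ceph pg 2.1f query             # check the recovery_state section for hints
                                 # such as down_osds_we_would_probe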

In my case, I brought a current-ish snapshot of one OSD back up, which
allowed enough replicas to backfill; after that, I could roll the
remaining OSDs back again and have them backfilled too.

However, I suspect that temporarily setting min_size to a lower number
could be enough for the PGs to recover.  If "ceph osd pool set <pool>
min_size 1" doesn't get the PGs going, I suppose restarting at least one
of the OSDs involved in the recovery, so that the PG undergoes peering
again, would get you going again.
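
Roughly like this -- pool name and OSD id are placeholders, and the
restart command depends on how the OSDs are managed on your hosts:

  ceph osd pool set <pool> min_size 1   # temporarily relax the requirement
  # then kick one of the OSDs serving the stuck PG so it re-peers, e.g.:
  systemctl restart ceph-osd@<id>       # on systemd hosts
  # or: service ceph restart osd.<id>   # on sysvinit hosts
  ceph -w                               # watch for the PG to go active and
                                        # start backfilling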

Once backfilling completes for all formerly-incomplete PGs, or maybe
even as soon as backfilling begins, bringing the pool's min_size back up
to (presumably) 2 is advisable.  You don't want to run with too low a
min_size for any longer than necessary :-)
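
That would just be (again with <pool> standing in for your pool name):

  ceph osd pool set <pool> min_size 2   # back to the usual value for size 3
  ceph osd pool get <pool> min_size     # confirm it took effect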

I hope this helps,

Happy GNU Year,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com