Re: [ceph-users] Auto recovering after losing all copies of a PG(s)

Iain Buclaw Thu, 01 Sep 2016 08:46:01 -0700

On 16 August 2016 at 17:13, Wido den Hollander wrote:
>
>> On 16 August 2016 at 15:59, Iain Buclaw wrote:
>>
>>
>> The desired behaviour for me would be for the client to get an instant
>> "not found" response from stat() operations, for write() to recreate
>> unfound objects, and for missing placement groups to be recreated on
>> an OSD that isn't overloaded.  Halting the entire cluster when 96% of
>> it can still be accessed is just not workable, I'm afraid.
>>
>
> Well, you can't make Ceph do that, but you can make librados do such a thing.
>
> I'm using the OSD and MON timeout settings in libvirt, for example:
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=9665fbca3a18fbfc7e4caec3ee8e991e13513275;hb=HEAD#l157
>
> You can set these options:
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> I think only the last two should be sufficient in your case.
>
> You will get ETIMEDOUT back as an error when an operation times out.
>
> Wido
>
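As a rough illustration of the timeout approach above, the sketch below keeps the three options in one place, renders them as a `[client]` section for `ceph.conf`, and classifies the return codes of librados C calls such as `rados_stat()`, which return 0 on success or a negative errno on failure. A missing object comes back as `-ENOENT`, and an operation that exceeds `rados_osd_op_timeout` comes back as `-ETIMEDOUT`, so neither has to block the client. The 30-second values, the helper names `render_client_section` and `classify_stat_result`, and the outcome labels are hypothetical; the code does not talk to a cluster.

```python
import errno
import os

# Hypothetical values in seconds. rados_mon_op_timeout and
# rados_osd_op_timeout default to 0, which means "never time out".
CLIENT_TIMEOUTS = {
    "client_mount_timeout": 30,
    "rados_mon_op_timeout": 30,
    "rados_osd_op_timeout": 30,
}


def render_client_section(timeouts):
    """Render the timeout options as a [client] section for ceph.conf."""
    lines = ["[client]"]
    for name, seconds in sorted(timeouts.items()):
        lines.append(f"    {name} = {seconds}")
    return "\n".join(lines) + "\n"


def classify_stat_result(ret):
    """Map a librados C return code (0 or -errno) to an outcome label.

    Hypothetical policy: a missing object and a timed-out operation get
    distinct labels, so the caller can decide whether to treat a timeout
    (e.g. an op stuck on an incomplete PG) as a miss instead of waiting.
    """
    if ret == 0:
        return "found"
    if ret == -errno.ENOENT:
        return "not found"
    if ret == -errno.ETIMEDOUT:
        return "unavailable"
    # Anything else is surfaced unchanged as an OSError carrying the errno.
    raise OSError(-ret, os.strerror(-ret))
```

With the C API, the same options can be set with `rados_conf_set()` before `rados_connect()`, and `classify_stat_result()` would then be applied to the value returned by `rados_stat()`.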


This seems to be fine.

Now, what should be done when a DR (disaster recovery) situation happens?


      pgmap v592589: 4096 pgs, 1 pools, 1889 GB data, 244 Mobjects
            2485 GB used, 10691 GB / 13263 GB avail
                3902 active+clean
                 128 creating
                  66 incomplete


These 128 PGs just never seem to finish creating.
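To track whether the stuck PGs ever clear, a small parser over a status excerpt like the one above can total the PG states and pick out everything that is not active+clean. This is a sketch that assumes the `<count> <state>` line layout shown in the excerpt rather than any documented output format, and the names `pg_state_counts` and `unhealthy` are hypothetical. For per-PG detail, `ceph health detail` and `ceph pg dump_stuck` are the usual starting points.

```python
import re

# Assumption: each PG-state line is "<count> <state>", e.g. "128 creating",
# where state is lowercase words joined by '+', e.g. "active+clean".
PG_STATE_LINE = re.compile(r"^\s*(\d+)\s+([a-z_]+(?:\+[a-z_]+)*)\s*$")
# Assumption: the pgmap line contains "<N> pgs," as in the excerpt.
PG_TOTAL = re.compile(r"(\d+)\s+pgs,")


def pg_state_counts(status_text):
    """Return (total_pgs, {state: count}) parsed from a pgmap excerpt."""
    total = None
    states = {}
    for line in status_text.splitlines():
        m = PG_TOTAL.search(line)
        if m and total is None:
            total = int(m.group(1))
            continue
        m = PG_STATE_LINE.match(line)
        if m:
            state = m.group(2)
            states[state] = states.get(state, 0) + int(m.group(1))
    return total, states


def unhealthy(states):
    """Return the states other than active+clean, with their PG counts."""
    return {s: n for s, n in states.items() if s != "active+clean"}
```

On the excerpt above this gives 4096 PGs in total, with 128 creating and 66 incomplete, so the state counts add up to the pool's full PG count.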

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';