I definitely saw it on a Hammer cluster. I checked my IRC logs for more 
context and found that in my specific cases it was due to PGs going 
incomplete. `ceph health detail` offered the following, for instance:

    pg 8.31f is remapped+incomplete, acting [39] (reducing pool one min_size
    from 2 may help; search ceph.com/docs for 'incomplete')
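For context, the way I'd poke at a PG in that state was roughly the following 
(the pgid is the one from the health output above; exact output will vary by 
release):

```shell
# List PGs stuck in an inactive state, which includes incomplete ones
ceph pg dump_stuck inactive

# Inspect a specific PG's peering state and recovery history
ceph pg 8.31f query
```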

And I had to do it on at least a couple of occasions while managing that 
cluster. I don't remember ever having the issue again after going to Infernalis 
and beyond, though. FWIW it was a 60-disk cluster with an above-average failure 
rate, because many of my disks were donations from another project and were 
already several years old.

I guess my curiosity is sated: min_size is relevant when you're also 
considering the transient faults that may take disks down and up, because it 
prevents inconsistent state and lost writes. It's not so relevant when you're 
talking about complete disk failures, because if a replica is irretrievably 
lost all you can do is rebuild it anyway, and you're only $size badly-timed 
disk failures away from losing a PG entirely regardless of the min_size 
setting.
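For the record, the temporary min_size change I'm describing went roughly like 
this (pool "one" is the pool named in the health message above; check your own 
values rather than assuming mine):

```shell
# Record the current setting so it can be restored exactly
ceph osd pool get one min_size

# Temporarily lower min_size so the degraded PGs can peer and heal
ceph osd pool set one min_size 1

# Watch recovery live; don't walk away while this runs
ceph -w

# As soon as all affected PGs have peered and healed, revert immediately
ceph osd pool set one min_size 2
```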

On 21/03/17 23:14, Anthony D'Atri wrote:
> I’m fairly sure I saw it as recently as Hammer, definitely Firefly. YMMV.
> 
> 
>> On Mar 21, 2017, at 4:09 PM, Gregory Farnum <[email protected]> wrote:
>>
>> You shouldn't need to set min_size to 1 in order to heal any more. That was 
>> the case a long time ago but it's been several major LTS releases now. :)
>> So: just don't ever set min_size to 1.
>> -Greg
>> On Tue, Mar 21, 2017 at 6:04 PM Anthony D'Atri <[email protected]> wrote:
>>>> a min_size of 1 is dangerous though because it means you are 1 hard disk 
>>>> failure away from losing the objects within that placement group entirely. 
>>>> a min_size of 2 is generally considered the minimum you want but many 
>>>> people ignore that advice, some wish they hadn't.
>>>
>>> I admit I am having difficulty following why this is the case
>>
>> I think we have a case of fervently agreeing.
>>
>> Setting min_size on a specific pool to 1 to allow PG’s to heal is absolutely 
>> a normal thing in certain circumstances, but it’s important to
>>
>> 1) Know _exactly_ what you’re doing, to which pool, and why
>> 2) Do it very carefully; changing ‘size’ instead of ‘min_size’ on a busy 
>> pool with a bunch of PG’s and data can be quite the rude awakening.
>> 3) Most importantly, _only_ set it for the minimum time needed, with eyes 
>> watching the healing, and set it back immediately after all affected PG’s 
>> have peered and healed.
>>
>> The danger, which I think is what Wes was getting at, is in leaving it set 
>> to 1 all the time, or forgetting to revert it.  THAT is, as we used to say, 
>> begging to lose.
>>
>> — aad

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com