Hi Dan,

That did the trick. Thanks!

Regards,
Erwin


> On 24 Feb 2022, at 20:25, Dan van der Ster <[email protected]> wrote:
> 
> Hi Erwin,
> 
> This may be one of the rare cases where the default choose_total_tries
> = 50 is too low.
> You can try increasing it to 75 or 100 and see if CRUSH can find 3 up OSDs.
> 
> Here's the basic recipe:
> 
> # ceph osd getcrushmap -o crush.map
> # crushtool -d crush.map -o crush.txt
> # vi crush.txt  # and change to tunable choose_total_tries 100
> # crushtool -c crush.txt -o crush.map2
> # ceph osd setcrushmap -i crush.map2
> 
> Cheers, dan
> 
> On Thu, Feb 24, 2022 at 6:29 PM Erwin Lubbers <[email protected]> wrote:
>> 
>> Hi all,
>> 
>> I have one active+clean+remapped PG on a 152 OSD Octopus (15.2.15) cluster 
>> with evenly balanced OSDs (around 40% usage). The cluster has three replicas 
>> spread across three datacenters (A+B+C).
>> 
>> All PGs are available in each datacenter (as defined in the crush map), but 
>> only this one (which is in a pool containing 2048 PGs) is up on OSD.34 and 
>> OSD.42 and acting on OSD.34, OSD.42 and OSD.38.
>> 
>> OSD.34 is located in datacenter A, 42 in B and 38 in A again, but it should 
>> be in C.
>> 
>> I restarted all OSDs, monitors, managers and servers. I marked the OSDs 
>> that the PG is acting on out and brought them back in a minute later. In all 
>> cases the PG holds the same state after backfilling, but one of the A replicas 
>> switches to another OSD in the A datacenter. I also turned the balancer off 
>> and on again. Nothing seems to recover the PG to active+clean.
>> 
>> Any suggestions?
>> 
>> Regards,
>> Erwin
>> _______________________________________________
>> ceph-users mailing list -- [email protected]
>> To unsubscribe send an email to [email protected]
> 
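
The recipe quoted above edits crush.txt by hand in vi. As a rough sketch only, the same edit could be scripted, and the recompiled map dry-run with crushtool before it is injected. The function names set_total_tries and show_bad_mappings are made up for illustration, and the rule id passed to crushtool is an assumption that has to be looked up for the pool in question (for example with `ceph osd crush rule dump`).

```shell
#!/bin/sh
# Sketch, not a tested procedure for any particular cluster.

# Set "tunable choose_total_tries" in a decompiled CRUSH map (text form).
#   $1 = path to the decompiled map (crush.txt in the recipe)
#   $2 = new value, e.g. 100
# If the file has no such tunable line, it is left unchanged.
set_total_tries() {
    file=$1
    tries=$2
    tmp=$(mktemp) || return 1
    sed -E "s/^tunable choose_total_tries [0-9]+\$/tunable choose_total_tries ${tries}/" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Dry-run a compiled map and list inputs that map to fewer OSDs than requested.
#   $1 = compiled map (crush.map2 in the recipe)
#   $2 = CRUSH rule id (assumption: taken from the pool's rule)
#   $3 = number of replicas (3 in this thread)
show_bad_mappings() {
    crushtool -i "$1" --test --show-bad-mappings --rule "$2" --num-rep "$3"
}
```

With these defined, `set_total_tries crush.txt 100` would stand in for the vi step, and `show_bad_mappings crush.map2 <rule-id> 3` could be run between the `crushtool -c` and `ceph osd setcrushmap` steps. No bad mappings reported would suggest that CRUSH now finds three OSDs for every input it tested.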
