Hi,

> On 7 Sep 2023, at 18:21, J-P Methot <[email protected]> wrote:
> 
> Since my post, we've been speaking with a member of the Ceph dev team. He 
> did, at first, believe it was an issue linked to the common performance 
> degradation after huge delete operations. So we did do offline compactions on 
> all our OSDs. It fixed nothing, and we are going through the logs to try to 
> figure this out.
> 
> To answer your question, no, the OSD doesn't restart after it logs the 
> timeout. It manages to get back online by itself, at the cost of sluggish 
> performance for the cluster and high iowait on VMs.
> 
> We mostly run RBD workloads.
> 
> Whether deep scrubs run or not doesn't appear to change anything. Deactivating 
> scrubs altogether did not impact performance in any way.
> 
> Furthermore, I'll stress that this is only happening since we upgraded to the 
> latest Pacific, yesterday.

What was your previous release version? What are your OSD drive models?
Are the timeouts always 15s? Not 7s, not 17s?
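If the 15s figure is constant, one thing worth checking is whether it matches the OSD op-thread timeout, which defaults to 15 seconds. A rough sketch of how to inspect it, along with the drive info for one OSD (here `osd.0` is just an example ID, and `osd_op_thread_timeout` is my guess at the relevant setting, not something confirmed from your logs):

```shell
# Show the configured op-thread timeout; if the logged timeouts track
# this value, raising it would only mask the underlying stall.
ceph config get osd osd_op_thread_timeout

# Dump hardware metadata (including drive device IDs) for osd.0:
ceph osd metadata 0
```

Comparing the device models across the OSDs that log the timeout versus those that don't might narrow it down.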


Thanks,
k
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]