Hi,

> On 7 Sep 2023, at 18:21, J-P Methot <[email protected]> wrote:
>
> Since my post, we've been speaking with a member of the Ceph dev team. At
> first, he believed it was an issue linked to the common performance
> degradation seen after huge delete operations, so we ran offline compactions
> on all our OSDs. That fixed nothing, and we are now going through the logs
> to try to figure this out.
>
> To answer your question: no, the OSD doesn't restart after it logs the
> timeout. It manages to get back online by itself, at the cost of sluggish
> performance for the cluster and high iowait on the VMs.
>
> We mostly run RBD workloads.
>
> Deep scrubs don't appear to change anything; deactivating scrubs altogether
> did not impact performance in any way.
>
> Furthermore, I'll stress that this has only been happening since we upgraded
> to the latest Pacific, yesterday.
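For readers following along: the offline compaction mentioned above is normally done per OSD with the OSD stopped. A minimal sketch, assuming a systemd deployment and the default OSD data path (adjust the OSD id and path to your environment):

```shell
# Hypothetical OSD id for illustration; repeat for each OSD in turn.
OSD_ID=0
OSD_PATH=/var/lib/ceph/osd/ceph-${OSD_ID}

# 1. Stop the OSD so its RocksDB store is closed.
systemctl stop ceph-osd@${OSD_ID}

# 2. Compact the BlueStore key-value store offline.
ceph-kvstore-tool bluestore-kv "${OSD_PATH}" compact

# 3. Bring the OSD back up and wait for it to rejoin before moving on.
systemctl start ceph-osd@${OSD_ID}
```

This requires a live cluster node, so it is an operational sketch rather than a runnable snippet.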
What was your previous release version? What are your OSD drive models? Are the timeouts always 15s, never 7s or 17s?

Thanks,
k
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
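To answer the timeout question systematically, the durations can be pulled straight out of the OSD logs. A sketch using a hypothetical log line shaped like Ceph's heartbeat_map warnings (on a real node, grep /var/log/ceph/ceph-osd.*.log instead):

```shell
# Hypothetical sample log line for illustration only.
log='2023-09-07 18:21:03 osd.12 heartbeat_map is_healthy "OSD::osd_op_tp thread" had timed out after 15'

# Extract just the timeout durations; piping the real log through
# `sort | uniq -c` would show whether 15 is the only value seen.
echo "$log" | grep -o 'timed out after [0-9]*'
```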
