Dear all,

We are experiencing problems with Ceph after deploying it with PVE (Proxmox VE), 
with the network backed by 10G Cisco switches with the vPC feature enabled. We 
are seeing slow OSD heartbeats and have not been able to identify any network 
traffic issues.

Upon checking, we found that ping latency is around 0.1 ms, with occasional 2% 
packet loss during flood ping, though not consistently. We also noticed a 
large number of packets on UDP port 5405 and the 'corosync' process consuming 
a significant amount of CPU.
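For reference, the checks we ran looked roughly like the following (the peer 
address is a placeholder for one of our cluster-network IPs):

```shell
# Flood ping (requires root); the summary line reports packet loss and
# RTT min/avg/max. 10.255.0.2 is a placeholder peer address.
ping -f -c 10000 10.255.0.2

# Confirm what is listening on UDP 5405 (corosync's knet transport).
ss -ulpn 'sport = :5405'
```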

When running the 'ceph -s' command, we observed slow OSD heartbeats on both 
the back and front networks, with the longest ping time being 2250.54 ms. We 
suspect a network issue, but we are unsure how Ceph measures such long 
latencies. Additionally, we are wondering whether 2% packet loss can 
significantly affect Ceph's performance and even cause OSD processes to fail 
occasionally.
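In case it is useful, this is how we have been inspecting the heartbeat ping 
times behind the warning (osd.0 is a placeholder ID; dump_osd_network must be 
run on the host where that OSD lives and, as far as we know, is available 
since Nautilus):

```shell
# Show the OSD_SLOW_PING_TIME_BACK/FRONT warnings in detail.
ceph health detail

# Dump recent heartbeat ping times recorded by one OSD (placeholder ID).
ceph daemon osd.0 dump_osd_network

# The "slow" threshold comes from mon_warn_on_slow_ping_time (ms), or is
# derived from mon_warn_on_slow_ping_ratio * osd_heartbeat_grace.
ceph config get osd mon_warn_on_slow_ping_time
```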

We have heard about potential issues with RocksDB 6 causing OSD process 
failures, and we are curious how to check the RocksDB version. Furthermore, 
we are wondering how severe packet loss and latency must be to cause OSD 
process crashes, and how the monitors determine that an OSD is offline.
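One way we thought of checking this (osd.0 and the systemd unit name are 
placeholders for our deployment; the version line may require sufficient 
rocksdb debug logging to appear):

```shell
# RocksDB logs its version when the OSD opens its DB at startup.
journalctl -u ceph-osd@0 | grep -i 'rocksdb.*version'

# Settings relevant to down detection: peers report an OSD dead after
# osd_heartbeat_grace seconds without a heartbeat reply, and the mons
# mark it down once mon_osd_min_down_reporters peers agree.
ceph config get osd osd_heartbeat_grace
ceph config get mon mon_osd_min_down_reporters
```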

We would greatly appreciate any assistance or insights you could provide on 
these matters.
Thanks,

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io