On 8/2/2025 4:07 PM, Anthony D'Atri wrote:
Here is the actual performance of the NFS mounted drive:
[root@o01 ~]# dd if=/dev/sdc of=/dev/null bs=4k status=progress
Null writes aren’t a good test, as they may be optimized away by any layer in
the stack. I suggest repeating with /dev/urandom.
I'm confused - why would I use urandom for testing raw read performance
on a local drive at the OSD level?
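(For what it's worth, the usual concern with a dd read test is the page cache
rather than the data pattern; one way to be sure the reads actually hit the
device is direct I/O. A sketch, with the fio job parameters chosen arbitrarily
here:)

# dd with O_DIRECT so reads bypass the page cache; larger block size just for bandwidth
dd if=/dev/sdc of=/dev/null bs=4M iflag=direct status=progress

# or a short fio random-read job against the same device
fio --name=rawread --filename=/dev/sdc --rw=randread --bs=4k \
    --direct=1 --ioengine=libaio --iodepth=32 --runtime=30 --time_based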
The NVMe is a Samsung 990 Plus. Not exactly enterprise grade, but it should do
fairly well. It is also fairly new, having picked it up yesterday. It's not
going to be 100 Mbit, for sure.
Have you applied the most recent firmware? Issues have been reported in the
past. Note that in a power loss situation you may corrupt or lose data due to
the apparent lack of PLP.
With updated firmware this is still a 0.33 DWPD drive, fwiw.
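(If it helps, the installed firmware revision can be read without any vendor
tooling; the device path below is a placeholder for whatever the NAS enumerates:)

smartctl -a /dev/nvme0 | grep -i firmware
nvme id-ctrl /dev/nvme0 | grep -w fr     # nvme-cli; "fr" is the firmware revision field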
No on the recent firmware. Again, the raw read data already posted and
the ceph tell bench data all support the idea that I'm getting over 2 Gb/s
on this drive at the endpoint. Also, the DWPD rating is not germane to
this subject at all. Yes, it is low for a 4 TB device, but this is a test
bench, not an enterprise SAN.
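(For reference, and assuming the default arguments were used, the OSD bench
invocation looks roughly like this; osd.0 is a placeholder id:)

ceph tell osd.0 bench                       # defaults: write 1 GiB total in 4 MiB blocks
ceph tell osd.0 bench 1073741824 4194304    # same thing spelled out: total bytes, block size

Note that osd bench exercises the OSD write path, BlueStore through the NFS
file on the USB NVMe, so its numbers aren't directly comparable to a raw
local read with dd.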
The NVMe connection path:
NVMe -> USB-C interface -> NAS server (Ubuntu 24.04) -> 2.5 Gbit Ethernet (NFS)
-> Proxmox vmbr1
Why the deep stack? Why not have the OSD drives in cluster nodes with Ceph
deployed converged? That would be a lot less complicated.
Yes, it would be a lot less complicated if I had the physical space to
put it, plus exporting it via NFS makes future sharing to the rest of this
cluster possible.
* Are all of the OSDs exported from the same NAS? SPoF
This is a test lab.
* The USB layer introduces latency
* Since you’re exporting via NFS, I assume that the USB M.2 drives have a local
filesystem built on the NAS node, and large files created to export? The
filesystem layer introduces additional latency.
* The NFS layer introduces latency
* Proxmox drive / network emulation introduces additional latency
* If your cluster network is virtualized through Proxmox SDN, that’s additional
latency. Remember that every write is farmed out to other OSDs and has to go
through all those layers.
With all these factors, honestly you're getting better perf than I would have
expected.
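(One way to see where in that stack the time goes, as a rough sketch with
placeholder paths and arbitrary job parameters, is to run the same small-block
fio job at each layer and compare completion latencies:)

# on the NAS, against the local filesystem on the USB NVMe
fio --name=lat --filename=/mnt/usb-nvme/lattest --rw=randwrite --bs=4k \
    --iodepth=1 --direct=1 --size=256M --runtime=30 --time_based --ioengine=libaio

# repeat against a file on the NFS mount from a Proxmox node, then inside a VM;
# the difference in latency between runs is roughly what each layer adds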
I am getting exactly what I was expecting - near/at saturation levels on
my 2.5G ethernet network.
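(If it's useful to confirm that, a quick iperf3 run between the NAS and a
Proxmox node shows the raw TCP ceiling of the link; the hostname below is a
placeholder. 2.5GBASE-T tops out at roughly 2.35 Gbit/s of TCP throughput in
practice.)

iperf3 -s                      # on the NAS
iperf3 -c nas.lab.local -t 30  # from a Proxmox node or VM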
I am going to move forward with the (probably mistaken) premise that the
bench function is the problem here. On to other testing: S3 performance.
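(In case it's useful: a crude but dependency-light S3 check is just timing a
large object through RGW with s3cmd; bucket and file names below are
placeholders. Dedicated tools such as MinIO's warp or hsbench will give
proper mixed-workload numbers.)

dd if=/dev/urandom of=/tmp/s3test.bin bs=4M count=256          # ~1 GiB test object
time s3cmd put /tmp/s3test.bin s3://testbucket/s3test.bin
time s3cmd get s3://testbucket/s3test.bin /tmp/s3test.copy --force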
Ron Gage
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io