Hi, I have retested with 4K blocks - results are below.
I am currently using 4 OSDs per Optane 900P drive. This was based on some posts I found on the Proxmox forums, and what seems to be "tribal knowledge" there. I also saw this presentation <https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/M1205%20-Ceph%20BlueStore%20performance%20on%20latest%20Intel%20Server%20Platforms%20Distribution.pdf>, which mentions on page 14 that "2-4 OSDs/NVMe SSD and 4-6 NVMe SSDs per node are sweet spots".

Has anybody done much testing with pure Optane drives for Ceph? (The paper above seems to use them mixed with traditional SSDs.)

Would increasing the number of OSDs help in this scenario? I am happy to try that - I assume I will need to blow away all the existing OSDs/Ceph setup and start again, of course.

Here are the rados bench results with 4K blocks - the write IOPS are still a tad short of 15,000. Is that what I should be aiming for?

Write result:

# rados bench -p proxmox_vms 60 write -b 4K -t 16 --no-cleanup
Total time run:         60.001016
Total writes made:      726749
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     47.3136
Stddev Bandwidth:       2.16408
Max bandwidth (MB/sec): 48.7344
Min bandwidth (MB/sec): 38.5078
Average IOPS:           12112
Stddev IOPS:            554
Max IOPS:               12476
Min IOPS:               9858
Average Latency(s):     0.00132019
Stddev Latency(s):      0.000670617
Max latency(s):         0.065541
Min latency(s):         0.000689406

Sequential read result:

# rados bench -p proxmox_vms 60 seq -t 16
Total time run:       17.098593
Total reads made:     726749
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   166.029
Average IOPS:         42503
Stddev IOPS:          218
Max IOPS:             42978
Min IOPS:             42192
Average Latency(s):   0.000369021
Max latency(s):       0.00543175
Min latency(s):       0.000170024

Random read result:

# rados bench -p proxmox_vms 60 rand -t 16
Total time run:       60.000282
Total reads made:     2708799
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   176.353
Average IOPS:         45146
Stddev IOPS:          310
Max IOPS:             45754
Min IOPS:             44506
Average Latency(s):   0.000347637
Max latency(s):       0.00457886
Min latency(s):       0.000138381

I am happy to try fio -ioengine=rbd (the reason I use rados bench is that it is what the Proxmox Ceph benchmark paper used <https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark>). However, is there a common community-suggested starting command line that makes it easy to compare results? fio seems quite complex in terms of options - I have sketched what I was thinking of running below, after the quoted message.

Thanks,
Victor

On Sun, Mar 10, 2019 at 6:15 AM Vitaliy Filippov <vita...@yourcmc.ru> wrote:

> Welcome to our "slow ceph" party :)))
>
> However I have to note that:
>
> 1) 500000 iops is for 4 KB blocks. You're testing it with 4 MB ones.
> That's kind of unfair comparison.
>
> 2) fio -ioengine=rbd is better than rados bench for testing.
>
> 3) You can't "compensate" for Ceph's overhead even by having infinitely
> fast disks.
>
> At its simplest, imagine that disk I/O takes X microseconds and Ceph's
> overhead is Y for a single operation.
>
> Suppose there is no parallelism. Then raw disk IOPS = 1000000/X and Ceph
> IOPS = 1000000/(X+Y). Y is currently quite long, something around 400-800
> microseconds or so. So the best IOPS number you can squeeze out of a
> single client thread (a DBMS, for example) is 1000000/400 = only ~2500
> iops.
>
> Parallel iops are of course better, but still you won't get anything
> close to 500000 iops from a single OSD. The expected number is around 15000.
> Create multiple OSDs on a single NVMe and sacrifice your CPU usage if you
> want better results.
>
> --
> With best regards,
>   Vitaliy Filippov
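P.S. Regarding fio, this is the kind of command line I was planning to start with - just a sketch on my part, assuming a dedicated 10 GiB test image called fio_test in the proxmox_vms pool (so the benchmark does not touch any VM data); please correct me if there is a better-accepted baseline:

# rbd create fio_test --size 10G --pool proxmox_vms
# fio --name=rbd-4k-randwrite --ioengine=rbd --pool=proxmox_vms --rbdname=fio_test --direct=1 --rw=randwrite --bs=4k --iodepth=16 --numjobs=1 --runtime=60 --time_based

My understanding is that --iodepth=16 roughly matches the -t 16 I used with rados bench, and that swapping --rw=randwrite for --rw=randread would give the read-side numbers for comparison.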