Greetings. My cluster consists of 3 nodes. Each node has 4 OSD HDDs with a
capacity of 6 TB each and 1 NVMe for DB/WAL. The network is 2 x 10 Gbps
interfaces in a bond, and I changed some parameters in rc.local to improve
performance. They are below:
#Set network interface buffer size
ethtool -G eno1 rx 4096 tx 4096; ethtool -G eno2 rx 4096 tx 4096
#Set txqueuelen on eno1, eno2, bond0, vmbr0 and the vmbr0 VLAN interfaces
ip link set eno1 txqueuelen 10000
ip link set eno2 txqueuelen 10000
ip link set bond0 txqueuelen 20000
ip link set vmbr0 txqueuelen 20000
ip link set vmbr0.4040 txqueuelen 10000
ip link set vmbr0.4043 txqueuelen 10000
ip link set vmbr0.4045 txqueuelen 10000
ip link set vmbr0.4053 txqueuelen 10000
#Disable segmentation offloads (GSO/GRO/LRO/TSO); keep checksum and scatter-gather offloads on
ethtool -K eno1 gso off gro off lro off tso off
ethtool -K eno2 gso off gro off lro off tso off
ethtool -K eno1 rx on tx on sg on
ethtool -K eno2 rx on tx on sg on
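#Set the number of combined RX/TX queues and enable receive hashing (RSS)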
ethtool -L eno1 combined 8
ethtool -L eno2 combined 8
ethtool -K eno1 rxhash on
ethtool -K eno2 rxhash on
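#Spread the RSS queue interrupts across CPU cores (rss-ladder from the netutils-linux package)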
rss-ladder eno1 0
rss-ladder eno2 1
With the default ceph.conf settings, when testing a 1 GB file with the
SEQ1MQ8T1 profile inside a KVM virtual machine, I get a write throughput of
152 MB/s.
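(For reference, the fio job below is roughly what I assume SEQ1MQ8T1
corresponds to: 1 MiB sequential writes, queue depth 8, 1 thread; the file
path is only an example.)
fio --name=seq1m_q8t1 --filename=/root/fio-test.bin --size=1G --rw=write \
    --bs=1M --iodepth=8 --numjobs=1 --ioengine=libaio --direct=1 \
    --group_reporting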
The bandwidth of each disk, when not used through Ceph, is at least 200 MB/s,
and I have 4 of them per node, 800 MB/s in total. With size=3 each client
write lands on one OSD per node, so that per-node aggregate is the relevant
ceiling. Even if we allow 25% for overhead and the abstraction layers, that
should still be around 600 MB/s. I may be wrong, this is a purely theoretical
estimate.
When monitoring disk load with the nmon utility during the test, I see write
rates of 30 to 50 MB/s per disk on each node.
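To see what a single OSD can do outside the RBD path, I assume the built-in
OSD bench is suitable (by default it writes 1 GB in 4 MB chunks directly to
the OSD):
ceph tell osd.0 bench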
The network cannot be the bottleneck, since the nodes are otherwise empty and
only 1 virtual machine is generating the load.
iperf3 from one node to another shows at least 8 Gbit/s, which is more than
enough to carry the 4.8 Gbit/s (600 MB/s) the test would need.
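The check was along these lines (the 10.50.251.x address is only an example
from the cluster network):
iperf3 -s                            # on the first node
iperf3 -c 10.50.251.2 -t 30 -P 4     # on the second node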
Write caching in the virtual machine is disabled.
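To rule out the VM layer entirely, I assume a rados bench run against a
scratch pool would show the same ceiling; something like (pool name is only an
example):
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup
rados -p testpool cleanup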
I started changing the [osd] parameters in ceph.conf. With these changes I
managed to double the read speed, from 195 MB/s to 550 MB/s, but the write
speed did not budge. The question now is not about IOPS but about sequential
write throughput. What needs to be changed? Below is my ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.50.251.0/24
fsid = 7d00b675-2f1e-47ff-a71c-b95d1745bc39
mon_allow_pool_delete = true
mon_host = 10.50.250.1 10.50.250.2 10.50.250.3
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.50.250.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
#[osd]
# bluestore_compression_mode = none
# bluestore_cache_autotune = true
# bluestore_cache_size = 3221225472
# bluestore_cache_kv_ratio = 0.2
# bluestore_cache_kv_max = 1610612736
# bluestore_min_alloc_size_hdd = 65536
# bluestore_max_alloc_size_hdd = 67108864
# bluestore_min_alloc_size_ssd = 16384
# bluestore_max_alloc_size_ssd = 33554432
# bluestore_throttle_bytes = 268435456
# bluestore_throttle_deferred_bytes = 1073741824
# bluestore_rocksdb_options = write_buffer_size=268435456;max_write_buffer_number=4
# osd_op_num_threads_per_shard = 4
# osd_client_message_cap = 1024
# osd_client_message_size_cap = 536870912
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.nd01]
host = nd01
mds_standby_for_name = pve
[mds.nd02]
host = nd02
mds_standby_for_name = pve
[mds.nd03]
host = nd03
mds_standby_for_name = pve
[mon.nd01]
public_addr = 10.50.250.1
[mon.nd02]
public_addr = 10.50.250.2
[mon.nd03]
public_addr = 10.50.250.3
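For completeness, I assume the runtime-tunable [osd] options can also be set
and verified through the mon config database instead of editing ceph.conf,
for example:
ceph config set osd bluestore_throttle_bytes 268435456
ceph config show osd.0 bluestore_throttle_bytes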
Perhaps I do not quite understand the philosophy, logic and functionality of
Ceph? I am aware that there may be competition for resources and so on, but I
see a clean 50/50 split of write throughput when another VM runs a similar
test in parallel. And at 150 MB/s it is rather scary that at some point the
virtual machines will compete and the write speed may drop to very low values.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]