Do you have the BMC performance/thermal profiles set to performance? And do you
have TuneD installed with the latency-performance profile active to prevent deep C-states?
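
If TuneD isn't already pinned to that profile, a quick check/switch looks roughly
like this (assuming tuned and the kernel cpupower tool are installed; adjust for
your distro):

    tuned-adm active                        # show the currently active profile
    tuned-adm profile latency-performance   # activate the low-latency profile
    cpupower frequency-info                 # confirm the governor / boost settings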

As people start deploying Crimson, I'm curious how much these settings affect the
experience.
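
For the C-state side specifically, the cpuidle sysfs entries (and turbostat, if
it's available) show what the cores are actually allowed to enter; paths assume a
Linux cpuidle driver is loaded, e.g.:

    grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name      # C-states exposed by the driver
    grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/disable   # 1 = that state is disabled
    turbostat --quiet sleep 10                                    # observed per-core C-state residency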

> On Aug 11, 2025, at 11:44 PM, Ki-taek Lee <ktlee4...@gmail.com> wrote:
> 
> Hello Ceph community,
> 
> I am evaluating Crimson OSD + Seastore performance for potential deployment
> in a distributed storage environment.
> With BlueStore, I have been able to achieve satisfactory 4K random read/write
> IOPS in my FIO tests.
> 
> However, when testing Crimson OSD + Seastore, I observed that 4K random
> read/write IOPS do not scale as expected when increasing the number of
> SSDs/OSDs. The performance plateaus beyond a certain point or is much lower
> than expected. (See attached test results.)
> 
> Test Environment:
> - Cluster: 8 clients, 1 OSD
> - Hardware: 40-core CPUs, 377 GiB DRAM
> - Image SHA (quay.io): e0543089a9e9cae97999761059eaccdf6bb22e9e
> - Configuration parameters:
>    osd_memory_target = 34359738368
>    crimson_osd_scheduler_concurrency = 0
>    seastore_max_concurrent_transactions = 16
>    crimson_osd_obc_lru_size = 8192
>    seastore_cache_lru_size = 16G
>    seastore_obj_data_write_amplification = 4
>    seastore_journal_batch_capacity = 1024
>    seastore_journal_batch_flush_size = 256M
>    seastore_journal_iodepth_limit = 16
>    seastore_journal_batch_preferred_fullness = 0.8
>    seastore_segment_size = 128M
>    seastore_device_size = 512G
>    seastore_block_create = true
>    seastore_default_object_metadata_reservation = 1073741824
>    rbd_cache = false
>    rbd_cache_writethrough_until_flush = true
>    rbd_op_threads = 16
> 
> Pool configuration:
> - 4096 PGs, replication size 1 (no redundancy)
> 
> Test Results:
> 
> 1 SSD test (varying number of allocated CPUs; alien threads on CPUs 26-29, 36-39):
> num CPUs | 4K randread (IOPS) | 4K randwrite (IOPS) | allocated CPU set
> 2        | 126772             | 14830               | 0-1
> 4        | 107860             | 16451               | 0-3
> 6        | 113741             | 17019               | 0-5
> 8        | 132060             | 16099               | 0-7
> 
> SSD scaling test (2 CPUs per SSD):
> OSD CPU mapping: OSD.0 (0-1), OSD.1 (10-11), OSD.2 (2-3), OSD.3 (12-13),
> ..., OSD.15 (34-35), Alien threads (26-29, 36-39)
> num SSDs | 4K randread (IOPS) | 4K randwrite (IOPS)
> 4        | 861273             | 22360
> 8        | 1022793            | 22786
> 12       | 1019161            | 21211
> 16       | 927570             | 20502
> 
> SSD scaling test (1 CPU per SSD):
> OSD CPU mapping: OSD.0 (0), OSD.1 (10), OSD.2 (2), OSD.3 (12), ..., OSD.15
> (24), Alien CPUs: 1, 11, 3, 13, ..., 15, 25
> num SSDs | 4K randread (IOPS) | 4K randwrite (IOPS)
> 4        | 936685             | 13730
> 8        | 1048204            | 18259
> 12       | 922727             | 23078
> 16       | 987838             | 30792
> 
> Questions:
> 1. Since Seastore is still under active development, are there any known
> unresolved performance issues that could explain this scaling behavior?
> 2. Are there recommended tuning parameters for improving small-block read
> scalability in multi-SSD configurations?
> 3. Regarding alien threads, are there best practices for CPU pinning or
> NUMA-aware placement that have shown measurable improvements?
> 4. Any additional guidance for maximizing IOPS with Crimson OSD + Seastore
> would be greatly appreciated.
> 
> My goal is to be ready to switch from BlueStore to Crimson + Seastore once it
> is stable and delivers performance comparable to BlueStore, so I'd like to
> understand the current limitations and tuning opportunities.
> 
> Thank you,
> Ki-taek Lee
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
