Hello Ceph community,

I am evaluating Crimson OSD + Seastore performance for a potential deployment in a distributed storage environment. With BlueStore, I have been able to achieve satisfactory performance in my FIO tests for 4K random read/write IOPS.
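(For reference, the 4K tests throughout this mail are plain fio RBD jobs; the command below is only illustrative rather than my exact job file, and the pool/image names, queue depth, and runtime are placeholders.)

    fio --name=4k-randread --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=bench0 \
        --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --time_based --runtime=60 --group_reporting

The 4K random write runs use the same shape of job with --rw=randwrite.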
However, when testing Crimson OSD + Seastore, I observed that 4K random read/write IOPS do not scale as expected when I increase the number of SSDs/OSDs: performance plateaus beyond a certain point, or is much lower than expected. (See the test results below.)

Test environment:
- Cluster: 8 clients, 1 OSD
- Hardware: 40-core CPUs, 377 GiB DRAM
- Image SHA (quay.io): e0543089a9e9cae97999761059eaccdf6bb22e9e
- Configuration parameters:
    osd_memory_target = 34359738368
    crimson_osd_scheduler_concurrency = 0
    seastore_max_concurrent_transactions = 16
    crimson_osd_obc_lru_size = 8192
    seastore_cache_lru_size = 16G
    seastore_obj_data_write_amplification = 4
    seastore_journal_batch_capacity = 1024
    seastore_journal_batch_flush_size = 256M
    seastore_journal_iodepth_limit = 16
    seastore_journal_batch_preferred_fullness = 0.8
    seastore_segment_size = 128M
    seastore_device_size = 512G
    seastore_block_create = true
    seastore_default_object_metadata_reservation = 1073741824
    rbd_cache = false
    rbd_cache_writethrough_until_flush = true
    rbd_op_threads = 16

Replication policy:
- 4096 PGs, no replication (single copy, size = 1)

Test results:

1 SSD test (varying the number of allocated CPUs; alien threads on CPUs 26-29 and 36-39):

  num CPUs | 4K randread IOPS | 4K randwrite IOPS | allocated CPU set
  2        | 126772           | 14830             | 0-1
  4        | 107860           | 16451             | 0-3
  6        | 113741           | 17019             | 0-5
  8        | 132060           | 16099             | 0-7

SSD scaling test (2 CPUs per SSD):
OSD-to-CPU mapping: OSD.0 (0-1), OSD.1 (10-11), OSD.2 (2-3), OSD.3 (12-13), ..., OSD.15 (34-35); alien threads on CPUs 26-29 and 36-39

  num SSDs | 4K randread IOPS | 4K randwrite IOPS
  4        | 861273           | 22360
  8        | 1022793          | 22786
  12       | 1019161          | 21211
  16       | 927570           | 20502

SSD scaling test (1 CPU per SSD):
OSD-to-CPU mapping: OSD.0 (0), OSD.1 (10), OSD.2 (2), OSD.3 (12), ..., OSD.15 (24); alien threads on CPUs 1, 11, 3, 13, ..., 15, 25

  num SSDs | 4K randread IOPS | 4K randwrite IOPS
  4        | 936685           | 13730
  8        | 1048204          | 18259
  12       | 922727           | 23078
  16       | 987838           | 30792

Questions:
1. Since Seastore is still under active development, are there any known unresolved performance issues that could explain this scaling behavior?
2. Are there recommended tuning parameters for improving small-block read scalability in multi-SSD configurations?
3. Regarding alien threads, are there best practices for CPU pinning or NUMA-aware placement that have shown measurable improvements?
4. Any additional guidance for maximizing IOPS with Crimson OSD + Seastore would be greatly appreciated.

My goal is to switch from BlueStore to Crimson + Seastore once it becomes stable and delivers reasonable performance compared to BlueStore, so I would like to understand the current limitations and tuning opportunities.

Thank you,
Ki-taek Lee
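P.S. In case it is useful for question 3: the OSD-to-CPU mappings above can be expressed with crimson's CPU-pinning options. The snippet below is only illustrative and assumes the crimson_seastar_cpu_cores and crimson_alien_thread_cpu_cores options present in recent crimson builds (shown for the 2-CPUs-per-SSD case; the remaining OSD sections are omitted):

    [osd]
        # cpuset shared by the alien (non-seastar) threads
        crimson_alien_thread_cpu_cores = 26-29,36-39
    [osd.0]
        # seastar reactor cores for OSD.0
        crimson_seastar_cpu_cores = 0-1
    [osd.1]
        crimson_seastar_cpu_cores = 10-11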