Hi Ki-taek,
The low write performance is a known issue and Sam's actively working on
it afaik. I believe there are some significant changes to the write
path coming, but for the moment it's expected that Seastore is slower
than BlueStore for small random writes.
Out of curiosity, would you mind sharing what benchmark you are using
and how you are invoking it?
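For example, if it's fio with the rbd engine, something along these lines (just a sketch, the pool/image names are made up) plus the numbers each section reports would be very helpful:

# sketch of a 4K rbd job file -- pool/image names here are hypothetical
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=bench0
bs=4k
iodepth=32
numjobs=8
runtime=300
time_based=1
group_reporting=1

[4k-randwrite]
rw=randwrite

[4k-randread]
# stonewall makes this section run only after the previous one finishes
stonewall
rw=randread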
Thanks,
Mark
On 8/11/25 23:42, Ki-taek Lee wrote:
Hello Ceph community,
I am evaluating Crimson OSD + Seastore performance for potential deployment
in a distributed storage environment.
With BlueStore, I have been able to achieve satisfactory 4K random
read/write IOPS in my FIO tests.
However, when testing Crimson OSD + Seastore, I observed that 4K random
read/write IOPS do not scale as expected when increasing the number of
SSDs/OSDs. The performance plateaus beyond a certain point or is much lower
than expected. (See attached test results.)
Test Environment:
- Cluster: 8 clients, 1 OSD node (number of OSDs varies by test)
- Hardware: 40-core CPUs, 377 GiB DRAM
- Image SHA (quay.io): e0543089a9e9cae97999761059eaccdf6bb22e9e
- Configuration parameters:
osd_memory_target = 34359738368
crimson_osd_scheduler_concurrency = 0
seastore_max_concurrent_transactions = 16
crimson_osd_obc_lru_size = 8192
seastore_cache_lru_size = 16G
seastore_obj_data_write_amplification = 4
seastore_journal_batch_capacity = 1024
seastore_journal_batch_flush_size = 256M
seastore_journal_iodepth_limit = 16
seastore_journal_batch_preferred_fullness = 0.8
seastore_segment_size = 128M
seastore_device_size = 512G
seastore_block_create = true
seastore_default_object_metadata_reservation = 1073741824
rbd_cache = false
rbd_cache_writethrough_until_flush = true
rbd_op_threads = 16
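For anyone reproducing this, the options above can either go under [osd] in
ceph.conf or be pushed into the mon config store with ceph config set before
the OSDs are created, for example:

ceph config set osd seastore_cache_lru_size 16G
ceph config set osd seastore_journal_batch_flush_size 256M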
Replication policy:
- 4096 PGs, no replication (only 1 copy)
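A sketch of how that pool setup can be created (assuming an RBD pool simply
named "rbd"; on recent releases size=1 also needs mon_allow_pool_size_one):

ceph config set global mon_allow_pool_size_one true
ceph osd pool create rbd 4096 4096 replicated
ceph osd pool set rbd size 1 --yes-i-really-mean-it
ceph osd pool set rbd min_size 1
ceph osd pool application enable rbd rbd
# hypothetical image used for the FIO runs
rbd create rbd/bench0 --size 100G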
Test Results:
1 SSD test (varying the number of allocated CPUs; alien thread CPUs: 26-29, 36-39):
num CPUs | 4K randread IOPS | 4K randwrite IOPS | allocated CPU set
2        | 126772           | 14830             | 0-1
4        | 107860           | 16451             | 0-3
6        | 113741           | 17019             | 0-5
8        | 132060           | 16099             | 0-7
SSD scaling test (2 CPUs per SSD):
OSD CPU mapping: OSD.0 (0-1), OSD.1 (10-11), OSD.2 (2-3), OSD.3 (12-13),
..., OSD.15 (34-35), Alien threads (26-29, 36-39)
num SSDs | 4K randread IOPS | 4K randwrite IOPS
4        | 861273           | 22360
8        | 1022793          | 22786
12       | 1019161          | 21211
16       | 927570           | 20502
SSD scaling test (1 CPU per SSD):
OSD CPU mapping: OSD.0 (0), OSD.1 (10), OSD.2 (2), OSD.3 (12), ..., OSD.15
(24), Alien CPUs: 1, 11, 3, 13, ..., 15, 25
num SSDs | 4K randread IOPS | 4K randwrite IOPS
4        | 936685           | 13730
8        | 1048204          | 18259
12       | 922727           | 23078
16       | 987838           | 30792
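For reference, the per-OSD pinning above can be expressed roughly like this in
ceph.conf (a sketch for OSD.0 and OSD.1 from the 2-CPUs-per-SSD case, assuming
the crimson_seastar_cpu_cores and crimson_alien_thread_cpu_cores options are
the right knobs):

[osd.0]
crimson_seastar_cpu_cores = 0-1
crimson_alien_thread_cpu_cores = 26-29,36-39

[osd.1]
crimson_seastar_cpu_cores = 10-11
crimson_alien_thread_cpu_cores = 26-29,36-39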
Questions:
1. Since Seastore is still under active development, are there any known
unresolved performance issues that could explain this scaling behavior?
2. Are there recommended tuning parameters for improving small-block read
scalability in multi-SSD configurations?
3. Regarding alien threads, are there best practices for CPU pinning or
NUMA-aware placement that have shown measurable improvements?
4. Any additional guidance for maximizing IOPS with Crimson OSD + Seastore
would be greatly appreciated.
My goal is to be ready to switch from BlueStore to Crimson + Seastore once it
is stable and delivers performance comparable to BlueStore, so I'd like to
understand the current limitations and tuning opportunities.
Thank you,
Ki-taek Lee
--
Best Regards,
Mark Nelson
Head of Research and Development
Clyso GmbH
p: +49 89 21552391 12 | a: Minnesota, USA
w: https://clyso.com | e: mark.nel...@clyso.com
We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io