>>> - 20 spinning SAS disks per node.
>> Don't use legacy HDDs if you care about performance.
>
> You are right here, but we use Ceph mainly for RBD. It performs 'good enough'
> for our RBD load.
You use RBD for archival?
>>> - Some nodes have 256GB RAM, some nodes 128GB.
>> 128GB is on the low side for 20 OSDs.
>
> Agreed, but with 20 OSDs x osd_memory_target of 4GB (80GB) it is enough. We
> haven't had any server OOM yet.
Remember that's a *target*, not a *limit*. Say one or more of your failure
domains goes offline or you have some other large topology change. Your OSDs
might then want up to 2x osd_memory_target, and you OOM and it cascades. I've
been there; I had to do an emergency upgrade of 24-OSD nodes from 128GB to
192GB.
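For reference, osd_memory_target is applied at runtime, so if you do add RAM
you can raise it without restarting anything. A rough sketch (the ~6GB value
is illustrative, not a recommendation):

    # see what the OSDs currently target
    ceph config get osd osd_memory_target
    # raise it cluster-wide, e.g. to ~6GB per OSD
    ceph config set osd osd_memory_target 6442450944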
>>> - CPU varies between Intel E5-2650 and Intel Gold 5317.
>> The E5-2650 is underpowered for 20 OSDs. The 5317 isn't the ideal fit
>> either; it'd make a decent MDS system. Assuming a dual-socket system you
>> have ~2 threads per OSD, which is maybe acceptable for HDDs, but I assume
>> you have mon/mgr/rgw on some of them too.
>
> The (CPU) load on the OSD nodes is quite low. Our MON/MGR/RGW aren't hosted
> on the OSD nodes and are running on modern hardware.
You didn't list any additional nodes, so I assumed they were colocated. You
might still do well to run a larger number of RGWs, wherever they live; RGWs
often scale better horizontally than vertically.
>
>> rados bench is useful for smoke testing, but not always a reflection of E2E
>> experience.
>>> Unfortunately not getting the same performance with Rados Gateway (S3).
>>> - 1x HAProxy with 3 backend RGW's.
>> Run an RGW on every node.
>
> On every OSD node?
Yep, why not?
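If you happen to be on cephadm it's a quick change; a sketch, where the realm,
zone and label names are placeholders for whatever you actually use:

    # label the OSD hosts, then place one RGW per labelled host
    ceph orch host label add <hostname> rgw
    ceph orch apply rgw myrealm myzone --placement="label:rgw"

With plain packages it's just more radosgw instances/units. Either way,
remember to add the new instances as backends in HAProxy.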
>>> I am using Minio Warp for benchmarking (PUT), with 1 Warp server and 5
>>> Warp clients, benchmarking against the HAProxy.
>>> Results:
>>> - Using 10MB object size, I am hitting the 10Gbit/s link of the HAProxy
>>> server. That's good.
>>> - Using 500K object size, I am getting a throughput of 70 to 150 MB/s
>>> with 140 to 300 obj/s.
>> Tiny objects are the devil of any object storage deployment. The HDDs are
>> killing you here, especially for the index pool. You might do a bit better
>> by raising pg_num above the party-line default.
>
> I would expect high write await times, but all OSDs/disks show write await
> times of 1 to 3 ms.
There are still serialization points in the OSD and PG code. You have 240
OSDs; does your index pool have *at least* 256 PGs?
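Quick way to check, and to bump it if not (pool name assumes the default
zone; 256 is a floor to start from, not a tuned value):

    ceph osd pool get default.rgw.buckets.index pg_num
    ceph osd pool set default.rgw.buckets.index pg_num 256

If the pg autoscaler is enabled it may fight you on that, so either set a
pg_num_min on the pool or switch the autoscaler to warn mode for it.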
>
>> You might also disable Nagle on the RGW nodes.
>
> I need to look up what exactly that is and does.
>
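Short version: Nagle's algorithm batches small TCP writes before sending,
which adds latency to exactly the kind of small-request chatter RGW generates;
TCP_NODELAY turns that batching off. If you're on the civetweb frontend it's
an option on the rgw_frontends line, roughly like this (illustrative, check
the docs for the frontend you actually run):

    rgw_frontends = civetweb port=7480 tcp_nodelay=1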
>>> It depends on the concurrency setting of Warp.
>>> It looks like objects/s is the bottleneck, not the throughput.
>>> Max memory usage is about 80-90GB per node. The CPUs are mostly idle.
>>> Is it reasonable to expect more IOPS / objects/s for RGW with my setup? At
>>> this moment I am not able to find the bottleneck that is causing the low
>>> obj/s.
>> HDDs are a false economy.
>
> Got it :)
>
>>> Ceph version is 15.2.
>>> Thanks!
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]