I've just set up a Ceph cluster and I'm accessing it through the object gateway 
with the S3 API.

One thing I don't see documented anywhere: how does Ceph performance scale 
with S3 key prefixes?

In AWS S3, request throughput scales linearly with the number of key prefixes 
(see: 
https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html). I 
picture the keys as a nested hash table or the nodes of a prefix tree, where 
keys sharing a prefix are stored in closer proximity at a hardware level, so 
you want to spread reads evenly over prefixes to avoid concentrating parallel 
I/O on the same hot spots.

So for example, if my access pattern regularly involves scanning data across 
multiple dates for a single city, this key structure would be more effective: 
`yyyymmdd/city/data.csv`, since each date is then a separate leading prefix. 
Whereas if my access pattern involves scanning through different cities on a 
single date, `city/yyyymmdd/data.csv` would be more effective, since each city 
is then a separate leading prefix.
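
To make the two layouts concrete, here is a minimal Python sketch that builds 
keys both ways and counts how many distinct leading prefixes a scan touches. 
The city names, the date range, and the helper functions are all hypothetical 
and purely illustrative. The sketch only shows how reads would be distributed 
over prefixes; it says nothing about how Ceph's object gateway actually stores 
or indexes the keys.

```python
from datetime import date, timedelta


def date_first_key(day, city):
    # Layout 1: yyyymmdd/city/data.csv
    return f"{day:%Y%m%d}/{city}/data.csv"


def city_first_key(day, city):
    # Layout 2: city/yyyymmdd/data.csv
    return f"{city}/{day:%Y%m%d}/data.csv"


def leading_prefixes(keys):
    # Distinct first path components touched by a set of keys.
    return {key.split("/", 1)[0] for key in keys}


# Hypothetical sample data: one week of dates, three cities.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(7)]
cities = ["london", "paris", "tokyo"]

# Access pattern A: scan many dates for a single city.
dates_scan_date_first = [date_first_key(d, "london") for d in days]
dates_scan_city_first = [city_first_key(d, "london") for d in days]

# Access pattern B: scan many cities on a single date.
cities_scan_date_first = [date_first_key(days[0], c) for c in cities]
cities_scan_city_first = [city_first_key(days[0], c) for c in cities]

print(len(leading_prefixes(dates_scan_date_first)))   # spread over dates
print(len(leading_prefixes(dates_scan_city_first)))   # all under one city
print(len(leading_prefixes(cities_scan_date_first)))  # all under one date
print(len(leading_prefixes(cities_scan_city_first)))  # spread over cities
```

Under the AWS-style assumption that each prefix is a separate unit of 
parallelism, the layout that touches more leading prefixes for a given scan is 
the one that spreads the load.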

What about Ceph? Does the naming convention of the key prefixes affect the 
object gateway's performance, or does Ceph treat the full object "paths" as a 
completely flat namespace?
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
