Thank you, Igor,

Yeah, the 25K waste per RADOS object seems reasonable. A couple of questions 
though:

1. Is the behavior of blobs re-using empty sub-sections of already allocated 
"min_alloc_size"-ed blocks limited to RBD/CephFS? I read some blog posts about 
the onode -> extent -> blob -> min_alloc -> pextent -> disk flow, and how a 
write smaller than min_alloc_size counts as a small write; if another small 
write comes along later that fits into the empty area of that blob, it will 
re-use that area. I expected this re-use behavior across RADOS in general.
Is my assumption of how allocation works totally wrong, or does it just not 
apply to S3 (maybe because the objects are hinted as immutable)? And do you 
know of any documentation on the allocation details? I couldn't find much 
official material about it.
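
To make my assumption a bit more concrete, the back-of-envelope math I had in 
mind (just an illustration, assuming the current 64K min_alloc_size and a 
hypothetical 74 KiB tail object) was roughly:

    74 KiB object, min_alloc_size = 64 KiB -> 2 blocks allocated = 128 KiB -> ~54 KiB wasted
    74 KiB object, min_alloc_size =  4 KiB -> 19 blocks allocated =  76 KiB -> ~2 KiB wasted

i.e. on average roughly half of min_alloc_size lost per object tail, unless 
later small writes are allowed to fill that empty space back in.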

2. We have a Ceph cluster that was upgraded to Pacific, but the OSDs come from 
the previous Octopus deployment with bluestore_min_alloc_size_hdd = 64KB and 
bluestore_prefer_deferred_size_hdd = 65536 (both bluestore_allocator and 
bluefs_allocator are bitmap), and the OSDs were not re-deployed after the 
upgrade. We are concerned that re-deploying the HDD OSDs with 
bluestore_min_alloc_size_hdd = 4KB might cause I/O performance issues, since 
the number of allocated blocks, and hence write operations, will increase. Do 
you know how it might affect the cluster?
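
For reference, these are the effective values on the running OSDs (osd.0 is 
just a placeholder id; as far as I understand, min_alloc_size itself is frozen 
at mkfs time, so the configured value would only apply to newly re-deployed 
OSDs):

    # queried via the admin socket on the OSD host
    ceph daemon osd.0 config get bluestore_min_alloc_size_hdd        # 65536
    ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd  # 65536
    ceph daemon osd.0 config get bluestore_allocator                 # bitmap
    ceph daemon osd.0 config get bluefs_allocator                    # bitmap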

Many thanks
