Hm, according to https://tracker.ceph.com/issues/24025 snappy compression
should be available out of the box at least since luminous. What ceph
version are you running?
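
If it helps, a quick way to check (assuming luminous or later for `ceph
versions`, and access to the host the OSD runs on for the second command):

  # ceph release(s) the daemons report
  ceph versions

  # rocksdb option string a given OSD is actually running with
  # (osd.0 is just an example id)
  ceph daemon osd.0 config get bluestore_rocksdb_options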

On Wed, 26 Jun 2019 at 21:51, Rafał Wądołowski <[email protected]>
wrote:

> We changed these settings. Our config now is:
>
> bluestore_rocksdb_options =
> "compression=kSnappyCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=3,recycle_log_file_num=16,compaction_style=kCompactionStyleLevel,write_buffer_size=50331648,target_file_size_base=50331648,max_background_compactions=31,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,num_levels=5,max_bytes_for_level_base=603979776,max_bytes_for_level_multiplier=10,compaction_threads=32,flusher_threads=8"
>
> It can be changed without redeploying the OSDs; the new settings are
> applied to the SST files as compaction is triggered. The additional
> improvement is Snappy compression - we rebuilt ceph with support for it.
> I can create a PR for it, if you want :)
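>
> For reference, roughly how the change can be applied (just a sketch;
> assumes a mimic or later cluster with the config database - on older
> releases the same string goes into ceph.conf under [osd]):
>
>   # store the new option string centrally
>   ceph config set osd bluestore_rocksdb_options "<the string above>"
>
>   # the string is only read when the OSD opens rocksdb, so restart the
>   # OSDs; existing SSTs pick up the new settings as they are compacted
>   systemctl restart ceph-osd@<id>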
>
>
> Best Regards,
>
> Rafał Wądołowski
> Cloud & Security Engineer
>
> On 25.06.2019 22:16, Christian Wuerdig wrote:
>
> The sizes are determined by rocksdb settings - some details can be found
> here: https://tracker.ceph.com/issues/24361
> One thing to note: in this thread
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030775.html
> it's noted that rocksdb can use up to 100% extra space during compaction,
> so if you want to avoid spillover during compaction, safer values
> would be 6/60/600 GB.
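>
> Rough arithmetic behind those figures (assuming the default
> max_bytes_for_level_base = 256 MB and max_bytes_for_level_multiplier = 10;
> exact defaults may differ slightly between releases):
>
>   L1 ~ 256 MB
>   L2 ~ 10 x L1 = ~2.5 GB   -> L1+L2     fits in ~3 GB
>   L3 ~ 10 x L2 = ~25 GB    -> L1+L2+L3  fits in ~30 GB
>   L4 ~ 10 x L3 = ~250 GB   -> L1..L4    fits in ~300 GB
>
> As I understand it, a level only stays on the fast device if it fits there
> in full, hence the 3/30/300 GB figures; doubling to 6/60/600 GB leaves room
> for the temporary extra space used during compaction.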
>
> You can change max_bytes_for_level_base and max_bytes_for_level_multiplier
> to suit your needs better, but I'm not sure whether that can be changed on
> the fly or whether you have to re-create the OSDs to make it apply.
>
> On Tue, 25 Jun 2019 at 18:06, Rafał Wądołowski <[email protected]>
> wrote:
>
>> Why did you select these specific sizes? Are there any tests/research on
>> them?
>>
>>
>> Best Regards,
>>
>> Rafał Wądołowski
>>
>> On 24.06.2019 13:05, Konstantin Shalygin wrote:
>>
>> Hi
>>
>> Have been thinking a bit about rocksdb and EC pools:
>>
>> Since a RADOS object written to an EC(k+m) pool is split into several
>> smaller pieces, each OSD will receive many more, smaller objects than
>> it would in a replicated setup.
>>
>> This must mean that rocksdb will also need to handle more entries, and
>> will grow faster. This will have an impact when using bluestore on slow
>> HDDs with the DB on SSD drives, where the faster-growing rocksdb might
>> result in spillover to the slow store - if not taken into consideration
>> when designing the disk layout.
>>
>> Are my thoughts on the right track or am I missing something?
>>
>> Has somebody done any measurements of rocksdb growth, comparing replicated
>> vs EC pools?
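>>
>> One way to get a per-OSD number to compare would be the bluefs counters
>> (assuming bluestore OSDs with a separate DB device; counter names may
>> vary a bit between releases):
>>
>>   ceph daemon osd.0 perf dump | grep -E 'db_used_bytes|slow_used_bytes'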
>>
>> If you don't want to be affected by block.db spillover, use a 3/30/300 GB
>> partition for your block.db.
>>
>>
>>
>> k
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
