Hi,

I'm using ceph primarily for block storage (which works quite well) and as
an object gateway using the S3 API.

Here is some info about my system:
Ceph: 12.2.4, OS: Ubuntu 18.04
OSD: Bluestore
6 servers in total, about 60 OSDs, 2TB SSDs each, no HDDs, CFQ scheduler
20 GBit private network
20 GBit public network
Block storage and object storage run on separate disks

Main use case:
Saving small (30KB - 2MB) objects in rgw buckets.
- dynamic bucket index resharding is disabled for now, but I keep the number
of index objects per shard at about 100k.
- data pool: EC4+2
- index pool: replicated (3)
- atm around 500k objects in each bucket
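For reference, this is roughly how the pools above were set up (the profile
name, pool names and PG counts here are placeholders, not my exact values):

```shell
# erasure-code profile for the data pool (k=4 data + m=2 coding chunks)
ceph osd erasure-code-profile set rgw-ec42 k=4 m=2
# data pool as EC4+2
ceph osd pool create default.rgw.buckets.data 256 256 erasure rgw-ec42
# index pool replicated with size 3
ceph osd pool create default.rgw.buckets.index 64 64 replicated
ceph osd pool set default.rgw.buckets.index size 3
```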

My problem:
Sometimes, I get "slow request" warnings like so:
"[WRN] Health check update: 7 slow requests are blocked > 32 sec
(REQUEST_SLOW)"

It turned out that these warnings appear whenever specific PGs are being
deep scrubbed.
After further investigation, I figured out that these PGs hold the bucket
index of the RADOS gateway.
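In case it's useful, this is roughly how I correlated the slow requests with
the index PGs (osd.12 below is just an example OSD id):

```shell
# which PGs are (deep-)scrubbing right now; the pool id is the number
# before the dot in the PG id
ceph pg dump pgs_brief | grep scrub
# match the pool id against the pool list
ceph osd pool ls detail
# ops currently blocked on a suspect OSD (run on that OSD's host)
ceph daemon osd.12 dump_blocked_ops
```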

I already tried some configuration changes like:
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_priority 0'
ceph tell osd.* injectargs '--osd_disk_thread_ioprio_class idle'
ceph tell osd.* injectargs '--osd_scrub_sleep 1'
ceph tell osd.* injectargs '--osd_deep_scrub_stride 1048576'
ceph tell osd.* injectargs '--osd_scrub_chunk_max 1'
ceph tell osd.* injectargs '--osd_scrub_chunk_min 1'
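I'm also considering restricting scrubs to a low-traffic window (assuming
osd_scrub_begin_hour/osd_scrub_end_hour behave in 12.2.4 as documented; the
hours below are just an example). Note that injectargs is not persistent, so
the same values would also have to go into ceph.conf:

```shell
# only start new (deep-)scrubs between 01:00 and 05:00 local time
ceph tell osd.* injectargs '--osd_scrub_begin_hour 1'
ceph tell osd.* injectargs '--osd_scrub_end_hour 5'
```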

This helped a lot to mitigate the effects, but the problem is still there.

Does anybody else have this issue?

I have a few questions to better understand what's going on:

As far as I know, the bucket index is stored in rocksdb and the (empty)
objects in the index pool are just references to the data in rocksdb. Is
that correct?
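That's at least how I understand it from poking at the index pool directly:
the shard objects carry no byte data, only omap entries (bucket id and shard
number below are placeholders):

```shell
# index shard objects are named .dir.<bucket-id>.<shard>
rados -p default.rgw.buckets.index ls | head
# no data payload, but one omap key per object in the bucket shard
rados -p default.rgw.buckets.index listomapkeys '.dir.<bucket-id>.0' | wc -l
```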

How does a deep scrub affect rocksdb?
Does the index pool even need deep scrubbing or could I just disable it?
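If disabling it turns out to be safe, I'd probably use the per-pool flag
rather than a cluster-wide one (assuming that flag is honored in 12.2.4):

```shell
# disable deep scrubs for the index pool only
ceph osd pool set default.rgw.buckets.index nodeep-scrub true
```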

Also:

Does it make sense to create more index shards to get the objects per shard
down to, say, 50k or 20k?

Right now, I have about 500k objects per bucket. I want to increase that
number to a couple of hundred million objects. Do you see any problems with
that, provided that the bucket index is sharded appropriately?
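For scale, this is the shard math I have in mind (a quick sketch; the 100k
per shard is my own rule of thumb from above, and I've read that a prime
shard count is recommended for even distribution, though I haven't verified
that):

```shell
# hypothetical target: 200M objects, ~100k index entries per shard
objects=200000000
per_shard=100000
# round up
shards=$(( (objects + per_shard - 1) / per_shard ))
echo "$shards shards"   # 2000 shards
# then e.g. (2003 being the next prime):
# radosgw-admin bucket reshard --bucket=<bucket> --num-shards=2003
```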

Any help is appreciated. Let me know if you need anything like logs,
configs, etc.

Thanks!

Christian
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
