Re: [ceph-users] How to avoid deep-scrubbing performance hit?

Dan Van Der Ster Tue, 10 Jun 2014 02:59:25 -0700

Hi,
I’m just starting to get interested in this topic, since today we’ve found that 
a weekly peak in latency correlates with a burst of about 30 PGs deep scrubbing 
at the same time.


One idea I had was to check the behaviour under different disk io schedulers, 
trying to exploit per-thread io priorities with cfq. So I have a question for the 
developers about using ionice or ioprio_set to lower the IO priorities of the 
threads responsible for scrubbing:
  - Are there dedicated threads always used for scrubbing only, and never for 
client IOs? If so, can an admin identify the thread IDs so he can ionice those?
  - If OTOH a disk/op thread switches between scrubbing and client IO 
responsibilities, could Ceph use ioprio_set to change the io priorities on the 
fly?
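
To make the first question concrete, here is a minimal sketch of how an admin might list a process's threads and drop one of them into the idle IO class. It assumes Linux with /proc, util-linux ionice, and the cfq scheduler (other schedulers may ignore IO classes). The function names list_threads and idle_thread are illustrative, and whether ceph-osd has any thread dedicated to scrubbing is exactly what is being asked, so the TID in the example is hypothetical.

```shell
#!/usr/bin/env bash
# Print "TID NAME" for every thread of the given PID (Linux /proc only).
list_threads() {
    local pid=$1 task
    for task in /proc/"$pid"/task/*; do
        # Skip threads that exited (or a glob that matched nothing)
        [ -r "$task/comm" ] || continue
        printf '%s %s\n' "${task##*/}" "$(cat "$task/comm")"
    done
}

# Move a single thread into the idle IO class (class 3).
# ioprio_set(2) accepts a thread ID, so a TID can be passed to ionice -p.
idle_thread() {
    ionice -c 3 -p "$1"
}

# Example with a hypothetical OSD PID and TID:
#   list_threads 4242
#   idle_thread 4250
```

This only helps if a given thread does nothing but scrubbing; if threads are shared with client IO, lowering their priority would slow client requests too.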

Cheers, Dan

-- Dan van der Ster || Data & Storage Services || CERN IT Department --


On 10 Jun 2014, at 00:22, Craig Lewis wrote:

I've correlated a large deep scrubbing operation to cluster stability problems.

My primary cluster does a small number of deep scrubs all the time, spread out 
over the whole week.  It has no stability problems.

My secondary cluster doesn't spread them out.  It saves them up, and tries to 
do all of the deep scrubs over the weekend.  The secondary starts losing OSDs 
about an hour after these deep scrubs start.

To avoid this, I'm thinking of writing a script that continuously scrubs the 
oldest outstanding PG.  In pseudo-bash:
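# Sort by the deep-scrub timestamp, taking the single oldest PG.
# Columns $20/$21 (deep-scrub date and time) depend on the `ceph pg dump` layout.
# Process substitution keeps `read` in the current shell, so $pg is set in the
# loop body (piping into `read` would set it in a subshell only).
while read -r date time pg < <(ceph pg dump |
    awk '$1 ~ /^[0-9a-f]+\.[0-9a-f]+$/ {print $20, $21, $1}' | sort | head -1)
do
  ceph pg deep-scrub "${pg}"
  # Wait until no PG reports scrubbing+deep before picking the next one
  while ceph status | grep -q 'scrubbing+deep'
  do
    sleep 5
  done
  sleep 30
done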


Does anybody think this will solve my problem?

I'm also considering disabling deep-scrubbing until the secondary finishes 
replicating from the primary.  Once it's caught up, the write load should drop 
enough that opportunistic deep scrubs should have a chance to run.  It should 
only take another week or two to catch up.
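
One way to pause deep scrubs for a while is the cluster-wide nodeep-scrub flag, assuming a Ceph release that supports it. A minimal sketch: the function names are illustrative, and the CEPH variable is an assumption added only so the commands can be dry-run with echo.

```shell
#!/usr/bin/env bash
# Set the cluster-wide flag that stops new deep scrubs from being scheduled.
pause_deep_scrub() {
    "${CEPH:-ceph}" osd set nodeep-scrub
}

# Clear the flag so deep scrubs are scheduled again.
resume_deep_scrub() {
    "${CEPH:-ceph}" osd unset nodeep-scrub
}

# Dry run: CEPH=echo pause_deep_scrub   prints the arguments instead of running ceph
```

While the flag is set, PGs keep ageing past their deep-scrub interval, so a large backlog may start scrubbing as soon as it is cleared.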
_______________________________________________
ceph-users mailing list
[email protected]<mailto:[email protected]>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

