+ceph-users. Does anybody have a similar experience with scrubbing / deep-scrubbing?
Thanks,
Guang

On Jan 29, 2014, at 10:35 AM, Guang <[email protected]> wrote:

> Glad to see there is some discussion around scrubbing / deep-scrubbing.
>
> We are experiencing the same thing: scrubbing can affect latency quite a bit. So
> far I have found two slow patterns (via dump_historic_ops): 1) waiting to be
> dispatched, and 2) waiting in the op working queue to be fetched by an available
> op thread. For the first slow pattern, it looks like there is a lock involved
> (the dispatcher stops working for 2 seconds and then resumes, and the same
> happens for the scrubber thread); that needs further investigation. For the
> second slow pattern, scrubbing brings in extra ops (for the scrub checks), which
> increases the op threads' workload (client ops have a lower priority). I think
> that could be improved by increasing the number of op threads; I will confirm
> this analysis by adding more op threads and turning on scrubbing on a per-OSD
> basis.
>
> Does the above observation and analysis make sense?
>
> Thanks,
> Guang
>
> On Jan 29, 2014, at 2:13 AM, Filippos Giannakos <[email protected]> wrote:
>
>> On Mon, Jan 27, 2014 at 10:45:48AM -0800, Sage Weil wrote:
>>> There is also
>>>
>>> ceph osd set noscrub
>>>
>>> and then later
>>>
>>> ceph osd unset noscrub
>>>
>>> I forget whether this pauses an in-progress PG scrub or just makes it stop
>>> when it gets to the next PG boundary.
>>>
>>> sage
>>
>> I bumped into those settings but I couldn't find any documentation about them.
>> When I first tried them, they didn't do anything immediately, so I thought
>> they weren't the answer. After your mention, I tried them again, and after a
>> while the deep-scrubbing stopped. So I'm guessing they stop scrubbing at the
>> next PG boundary.
>>
>> I see from this thread and earlier ones that some people think it is a spindle
>> issue. I'm not sure it is just that. Reproducing it on an idle cluster that
>> can do more than 250 MiB/s, with a single request pausing for 4-5 seconds,
>> sounds like an issue by itself. Maybe there is too much locking, or not enough
>> priority given to the actual I/O? Plus, the idea of throttling deep scrubbing
>> based on iops sounds appealing.
>>
>> Kind Regards,
>> --
>> Filippos
>> <[email protected]>
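
(For reference, a rough sketch of the commands behind the analysis above. The
OSD ids and numeric values are placeholders rather than recommendations, and
the option names are the ones from this era of Ceph, so check your version's
defaults before applying anything.)

    # Inspect the slowest recent ops on one OSD via the admin socket and see
    # which stage (dispatch vs. waiting for an op thread) is taking the time.
    ceph daemon osd.0 dump_historic_ops

    # Example only: raise the op thread count at runtime to test whether the
    # "waiting in the op queue" pattern improves (osd op threads defaulted to 2).
    ceph tell osd.* injectargs '--osd-op-threads 4'

    # Cluster-wide: stop scheduling new scrubs / deep-scrubs; PGs already being
    # scrubbed finish first.
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    # ...and re-enable later:
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub

    # Per-OSD example: keep at most one concurrent scrub on a single OSD.
    ceph tell osd.3 injectargs '--osd-max-scrubs 1'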
