Re: Random blocks when accessing rbd images

Stratos Psomadakis Thu, 15 Dec 2011 09:25:26 -0800

On 12/15/2011 06:44 PM, Guido Winkelmann wrote:
> On Thursday, 15 December 2011, 08:30:26, you wrote:
>> 'ceph pg dump' will tell you the status (active/clean/scrubbing/etc)
>> for each pg.  Does the same pg remain in state active+clean+scrubbing
>> for more than 10 minutes?
> Well, I used ceph -s, which only gave me a summary, but there definitely was
> a PG that was in active+clean+scrubbing for a long time (a lot longer than 10
> minutes), and remained so until I restarted one of the osds.
>
> Unfortunately I don't know how to reliably reproduce the problem, so I can't 
> check now...
When I hit that bug, I was able to trigger it (more easily) by setting:
    osd scrub max interval = 120
in the [osd] section of ceph.conf, which forces the cluster to schedule
pg scrubs more often.


Now, if you stress the cluster a bit (some heavy I/O), coupled with
single OSD restarts, I think you should be able to trigger it.

Btw, I was using the rbd in-kernel driver.

Some info from the debugging I did: I think that at some point after
finalizing_scrub is set to true, it turns out that last_update_applied
!= info.last_update, but for some reason the scrub operation is never
requeued by op_applied, and so the PG stays stuck in the scrubbing state.
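
To notice a PG stuck like this while trying to reproduce it, something
along the following lines might help. It is only a rough sketch: it
assumes you already sample PG states periodically (e.g. by parsing
'ceph pg dump' yourself, which is not shown here because the output
format differs between versions). The Sample record, the
stuck_scrubbing() helper, and the PG ids in the example are all made up
for illustration; the 600 second default is the 10 minutes mentioned
above.

```python
from collections import namedtuple

# Hypothetical record: one observation of a PG's state at a point in time
# (time in seconds, pgid and state as strings).
Sample = namedtuple("Sample", ["time", "pgid", "state"])


def stuck_scrubbing(samples, threshold=600):
    """Return {pgid: seconds} for PGs seen scrubbing without interruption
    for longer than `threshold` seconds, up to their latest sample.

    `samples` must be sorted by time.
    """
    since = {}   # pgid -> time the current uninterrupted scrubbing run began
    latest = {}  # pgid -> time of the most recent sample for that PG
    for s in samples:
        latest[s.pgid] = s.time
        if "scrubbing" in s.state.split("+"):
            # Keep the start of the run; do not reset it on later samples.
            since.setdefault(s.pgid, s.time)
        else:
            # The PG left the scrubbing state, so the run is over.
            since.pop(s.pgid, None)
    return {
        pg: latest[pg] - start
        for pg, start in since.items()
        if latest[pg] - start > threshold
    }


if __name__ == "__main__":
    # Made-up observations: PG 0.1 never leaves scrubbing, PG 0.2 does.
    samples = [
        Sample(0, "0.1", "active+clean+scrubbing"),
        Sample(0, "0.2", "active+clean"),
        Sample(300, "0.1", "active+clean+scrubbing"),
        Sample(300, "0.2", "active+clean+scrubbing"),
        Sample(900, "0.1", "active+clean+scrubbing"),
        Sample(900, "0.2", "active+clean"),
    ]
    print(stuck_scrubbing(samples))  # {'0.1': 900}
```

A PG that shows up here is only a candidate; a long scrub on a big or
busy PG would look the same, so the threshold may need adjusting.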

>       Guido
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
-- 
Stratos Psomadakis
<[email protected]>
