For what it's worth... this sounds like the condition we hit when we re-enabled 
scrub on our 16 OSDs (after 6 to 8 weeks of noscrub).  They flapped for about 
30 minutes, with most of the OSDs randomly hitting suicide timeouts here and there.

Things settled down after about an hour and the OSDs stopped dying.  We have 
left scrub enabled for about 4 days since then and have only seen three small 
spurts of OSD flapping, which quickly resolved themselves.
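
In case it helps anyone hitting the same thing, here is a rough sketch (Python 
driving the ceph CLI) of how scrub could be re-enabled more gently: temporarily 
give the scrub threads more headroom and keep concurrency at its default before 
unsetting noscrub. The 300-second value and the reliance on 
osd_scrub_thread_suicide_timeout are illustrative assumptions on my part, not a 
tested recommendation; check the defaults on your release first.

    #!/usr/bin/env python3
    """Rough sketch: re-enable scrubbing gently after a long noscrub period.

    Assumes the 'ceph' CLI and an admin keyring on this host. The timeout
    value below is illustrative, not a recommendation.
    """
    import subprocess

    def ceph(*args):
        """Run a ceph CLI command and return its stdout."""
        return subprocess.run(["ceph", *args], check=True,
                              capture_output=True, text=True).stdout.strip()

    # Give scrub threads more headroom so the first scrubs of oversized PGs
    # don't trip the suicide timeout (60s by default on Jewel, if I recall).
    ceph("tell", "osd.*", "injectargs", "--osd_scrub_thread_suicide_timeout 300")

    # Keep scrub concurrency low while the backlog drains (1 is the default).
    ceph("tell", "osd.*", "injectargs", "--osd_max_scrubs 1")

    # Re-enable plain scrub first; deep scrub can follow once things are stable.
    ceph("osd", "unset", "noscrub")
    # ceph("osd", "unset", "nodeep-scrub")

    print(ceph("osd", "stat"))  # quick sanity check: up/in counts and flags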

-- Dan

> On Dec 1, 2016, at 14:38, Frédéric Nass <frederic.n...@univ-lorraine.fr> 
> wrote:
> 
> Hi Yoann,
> 
> Thank you for your input. I was just told by RH support that the fix is 
> going to make it into RHCS 2.0 (10.2.3). Thank you guys for the fix!
> 
> We thought about increasing the number of PGs right after changing the 
> merge/split threshold values, but that would have meant a _lot_ of data 
> movement (1.2 billion XFS files) over several weeks, with no way to 
> scrub / deep-scrub to ensure data consistency in the meantime. Still, as 
> soon as we get the fix, we will increase the number of PGs.
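
As a side note on the merge/split thresholds: with filestore, as I understand 
it, a PG's subdirectory is split once it holds roughly 
16 * filestore_split_multiple * abs(filestore_merge_threshold) files, so the 
average object count per PG is what drives the splitting. The sketch below just 
does that arithmetic with placeholder numbers; it is not the actual pool layout 
discussed above.

    #!/usr/bin/env python3
    """Back-of-the-envelope sketch: why raising pg_num reduces filestore splitting.

    All numbers are illustrative placeholders. The split threshold formula is
    my reading of the filestore behaviour, so double-check it for your release.
    """

    total_objects = 1_200_000_000       # ~1.2 billion files, per the thread

    # Example (raised) threshold values, not the shipped defaults.
    filestore_merge_threshold = 40
    filestore_split_multiple = 8
    split_threshold = 16 * filestore_split_multiple * abs(filestore_merge_threshold)

    for pg_num in (2048, 4096, 8192, 16384):
        objs_per_pg = total_objects / pg_num   # average objects per PG collection
        print(f"pg_num={pg_num:>6}: ~{objs_per_pg:>12,.0f} objects/PG "
              f"(subdir split threshold ~{split_threshold:,} files)")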
> 
> Regards,
> 
> Frederic.
> 
> 
> 
>> On Dec 1, 2016, at 16:47, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> 
>> Hello,
>> 
>>> We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0 
>>> and is no longer able to scrub or deep-scrub.
>>> 
>>> [1] http://tracker.ceph.com/issues/17859
>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
>>> [3] https://github.com/ceph/ceph/pull/11898
>>> 
>>> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub 
>>> until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
>>> 
>>> Can we have this fix any sooner?
>> 
>> As far as I know, that bug shows up when you have very large PGs, so a 
>> workaround could be to increase the pg_num of the pool that has the 
>> biggest PGs.
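
If it helps, here is a minimal sketch (Python around the ceph CLI) of stepping 
pg_num up in small increments on that pool. The pool name "data", the target 
and the step size are placeholders, not recommendations; each increase triggers 
backfill, so pgp_num only follows once the new PGs exist and the cluster has 
settled.

    #!/usr/bin/env python3
    """Minimal sketch: raise pg_num in small steps on the pool with the biggest PGs.

    Assumes the 'ceph' CLI with admin rights. The pool name, target and step
    size are placeholders to adapt, not recommendations.
    """
    import subprocess
    import time

    POOL = "data"           # placeholder: the pool whose PGs are largest
    TARGET_PG_NUM = 4096    # placeholder target (a power of two)
    STEP = 256              # conservative step size

    def ceph(*args, check=True):
        return subprocess.run(["ceph", *args], check=check,
                              capture_output=True, text=True).stdout.strip()

    def wait_until_settled(poll=60):
        # 'ceph health' may exit non-zero when not OK, hence check=False
        while not ceph("health", check=False).startswith("HEALTH_OK"):
            time.sleep(poll)

    current = int(ceph("osd", "pool", "get", POOL, "pg_num").split()[-1])
    print(f"{POOL}: pg_num {current} -> {TARGET_PG_NUM}")

    while current < TARGET_PG_NUM:
        current = min(current + STEP, TARGET_PG_NUM)
        ceph("osd", "pool", "set", POOL, "pg_num", str(current))
        wait_until_settled()   # new PGs must be created before pgp_num can follow
        ceph("osd", "pool", "set", POOL, "pgp_num", str(current))
        wait_until_settled()   # let backfill finish before the next step
        print(f"pg_num/pgp_num now {current}")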
>> 
>> -- 
>> Yoann Moulin
>> EPFL IC-IT
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
