Re: OSD suicide after being down/in for one day as it needs to search large amount of objects

Sage Weil Wed, 20 Aug 2014 08:20:58 -0700

On Wed, 20 Aug 2014, Guang Yang wrote:
> Thanks Greg.
> On Aug 20, 2014, at 6:09 AM, Gregory Farnum <[email protected]> wrote:
> 
> > On Mon, Aug 18, 2014 at 11:30 PM, Guang Yang <[email protected]> wrote:
> >> Hi ceph-devel,
> >> David (cc'ed) reported a bug (http://tracker.ceph.com/issues/9128) which 
> >> we came across in our test cluster during our failure testing, basically 
> >> the way to reproduce it was to leave one OSD daemon down and in for a day 
> >> while continuing to send write traffic. When the OSD daemon was 
> >> started again, it hit the suicide timeout and killed itself.
> >> 
> >> After some analysis (details in the bug), David found that the op thread 
> >> was busy searching for missing objects, and as the volume to search 
> >> increases, the thread is expected to stay busy for that long. Please refer to 
> >> the bug for detailed logs.
> > 
> > Can you talk a little more about what's going on here? At a quick
> > naive glance, I'm not seeing why leaving an OSD down and in should
> > require work based on the amount of write traffic. Perhaps if the rest
> > of the cluster was changing mappings??
> We increased the down-to-out interval from 5 minutes to 2 days to 
> avoid migrating data back and forth, which could increase latency; the 
> plan is to mark OSDs out manually. To that end, we are testing 
> boundary cases such as leaving an OSD down and in for about 1 day. 
> However, when we try to bring it up again, it always fails because it hits 
> the suicide timeout.


Looking at the log snippet I see the PG had log range

        5481'28667,5646'34066

Which is ~5500 log events.  The default max is 10k.  search_for_missing is 
basically going to iterate over this list and check if the object is 
present locally.

If that's slow enough to trigger a suicide (which it seems to be), the 
fix is simple: as Greg says we just need to make it probe the internal 
heartbeat code to indicate progress.  In most contexts this is done by 
passing a ThreadPool::TPHandle &handle into each method and then 
calling handle.reset_tp_timeout() on each iteration.  The same needs to be 
done for search_for_missing...
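
Roughly, the pattern looks like the sketch below.  This is not the actual 
Ceph code: TPHandle here is a hypothetical stand-in for 
ThreadPool::TPHandle that only counts resets, LogEntry is a simplified 
placeholder for a PG log entry, and the search_for_missing signature is 
assumed for illustration.  It only shows the loop touching the handle once 
per log entry so the heartbeat sees progress.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for ThreadPool::TPHandle. The real handle resets
// the worker thread's heartbeat timeout; this one just counts the calls.
struct TPHandle {
  std::size_t resets = 0;
  void reset_tp_timeout() { ++resets; }
};

// Simplified placeholder for a PG log entry: only the object name.
struct LogEntry {
  std::string oid;
};

// Iterate over the log and collect every object that is not present
// locally, signalling progress to the heartbeat on each iteration.
std::vector<std::string> search_for_missing(
    const std::vector<LogEntry>& log,
    const std::set<std::string>& local_objects,
    TPHandle& handle) {
  std::vector<std::string> missing;
  for (const LogEntry& e : log) {
    handle.reset_tp_timeout();  // indicate progress before each lookup
    if (local_objects.count(e.oid) == 0) {
      missing.push_back(e.oid);
    }
  }
  return missing;
}
```

With a reset per entry, the timeout only has to cover a single lookup 
rather than the whole scan, so a long log no longer looks like a hung 
thread.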

sage
