> -----Original Message-----
> From: Mark Nelson [mailto:[email protected]]
> Sent: 11 May 2016 13:16
> To: Somnath Roy <[email protected]>; Nick Fisk
> <[email protected]>; Ben England <[email protected]>; Kyle Bader
> <[email protected]>
> Cc: Sage Weil <[email protected]>; Samuel Just <[email protected]>; ceph-
> [email protected]
> Subject: Re: Weighted Priority Queue testing
> 
> > 1. First scenario: only 4 nodes, and since replication is at the chassis
> > level, the single remaining node on the chassis takes all the traffic.
> > That appears to be a bottleneck, as with host-level replication on a
> > similar setup the recovery time is much lower (data not in this table).
> >
> >
> >
> > 2. In the second scenario, I kept everything else the same but doubled
> > the number of nodes/chassis. Recovery time was also roughly halved.
> >
> >
> >
> > 3. For the third scenario, I increased the cluster data and also doubled
> > the number of OSDs in the cluster (since each drive is now 4 TB).
> > Recovery time came down further.
> >
> >
> >
> > 4. Moved to Jewel, keeping everything else the same, and got a further
> > improvement, mostly because of the improved write performance in Jewel (?).
> >
> >
> >
> > 5. The last scenario is interesting. With this WPQ I got better recovery
> > speed than in any other scenario. The degraded PG % came down to 2% within
> > 3 hours and to ~0.6% within 4 hours 15 min, but the *last 0.6% took
> > ~4 hours*, hurting the overall recovery time.
> >
> > 6. In fact, this long-tail latency hurts the overall recovery time in
> > every other scenario as well. A related tracker I found is
> > http://tracker.ceph.com/issues/15763
> >
> >
> >
> > Any feedback is much appreciated. We can discuss this in tomorrow's
> > performance call if needed.
> 
> Hi Somnath,
> 
> Thanks for these!  Interesting results.  Did you have a client load running
> at the same time as recovery?  It would be interesting to know how client IO
> performance was affected in each case.  Too bad about the long tail on WPQ.
> I wonder whether the long tail is consistently higher with WPQ or whether it
> just happened to be higher in that test.
> 
> Anyway, thanks for the results!  Glad to see the recovery time in general is
> lower than in hammer.

I've also been running with the weighted queue for a week, but testing from 
more of a stability point of view than performance. I've taken a few OSDs out 
and let the cluster recover, and I haven't seen any negative effects on our 
normal workloads.
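For anyone following along who hasn't looked at the queue code: the basic idea
behind a weighted priority queue can be sketched roughly as below. This is a
toy Python illustration, not Ceph's actual C++ implementation (the class name
and structure here are made up for the example); the point is that the dequeue
picks a priority class with probability proportional to its weight, so
low-priority recovery ops keep making progress instead of being starved behind
client IO, which is why client workloads stay unaffected during recovery.

```python
import random
from collections import deque

class ToyWeightedPriorityQueue:
    """Toy sketch (hypothetical, not Ceph's implementation).

    Ops are queued per priority class; dequeue() selects a class at
    random with probability proportional to its weight, then pops the
    oldest op in that class (FIFO within a class)."""

    def __init__(self):
        self.queues = {}  # weight -> deque of ops

    def enqueue(self, weight, op):
        self.queues.setdefault(weight, deque()).append(op)

    def dequeue(self):
        # Consider only classes that currently have queued ops.
        live = {w: q for w, q in self.queues.items() if q}
        if not live:
            return None
        # Pick a class with probability weight / sum(weights).
        r = random.uniform(0, sum(live))
        for w in sorted(live):
            r -= w
            if r <= 0:
                return live[w].popleft()
        # Floating-point fallback: take the heaviest class.
        return live[max(live)].popleft()

q = ToyWeightedPriorityQueue()
q.enqueue(63, "client-op")    # high weight: dequeued more often
q.enqueue(10, "recovery-op")  # low weight: still gets its share
```

With a strict priority queue, recovery ops would only run when no client ops
are queued; with weights, recovery gets a proportional share of dequeues even
under sustained client load, which matches the behaviour we're seeing.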

> 
> Mark

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com