RE: Cache pool latency impact

Sage Weil Wed, 14 Jan 2015 09:14:40 -0800

On Wed, 14 Jan 2015, Pavan Rallabhandi wrote:
> Thanks for the reply Sage; please ignore the same subject mails on 
> ceph-users, they seem to have got delivered today.
> 
> > Hmm, we could have a 'noagent' option (similar to noout, nobackfill, 
> > noscrub, etc.) that lets the admin tell the system to stop tiering 
> > movements, but I'm not sure that's what you're asking for...
> 
> I was not aware of the 'notieragent' flag, but I was hinting at a flow-control 
> type of mechanism that would help throttle the client IOs against the 
> service time the tiering agent needs to flush/evict.


There is also

        osd_agent_max_ops = 4

which is a coarse control but may be sufficient for you?
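
As a minimal sketch (not taken from a running cluster), the option could be
lowered in the [osd] section of ceph.conf. This assumes osd_agent_max_ops caps
the number of concurrent tiering-agent flush/evict operations per OSD; the
value 2 is purely illustrative, not a recommendation:

```ini
[osd]
# Assumption: a lower value allows fewer concurrent agent (flush/evict)
# operations per OSD, leaving more headroom for client IO.
# 4 is the value quoted in this mail; 2 is only an example.
osd_agent_max_ops = 2
```

Since this is a coarse control, it is probably worth watching client latency
after changing it rather than assuming any particular value will help.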

sage



> Thanks,
> -Pavan.
> 
> -----Original Message-----
> From: Sage Weil [mailto:[email protected]]
> Sent: Tuesday, January 13, 2015 7:31 PM
> To: Pavan Rallabhandi
> Cc: Ceph Development
> Subject: Re: Cache pool latency impact
> 
> On Tue, 13 Jan 2015, Pavan Rallabhandi wrote:
> > Hi,
> >
> > This is regarding cache pools and the impact of the flush/evict on the
> > client IO latencies.
> >
> > Am seeing a direct impact on the client IO latencies (making them
> > worse) when flush/evict is triggered on the cache pool. In a constant
> > ingress of IOs on the cache pool, the write performance is no better
> > than without cache pool, because it is limited to the speed at which
> > objects can be flushed/evicted to the backend pool.
> 
> Yeah, this is always going to be true in general.  It is a lot more work to 
> write into the cache, read it back, write it again into the base pool, and 
> then delete it from the cache than it is to write directly to the base pool.
> 
> > The questions I have are:
> >
> > 1) When the flush/evict is in progress, are the writes on the cache
> > pool blocked, either at the PG or at object granularity? I do see
> > a blocking flag honored per object context in
> > ReplicatedPG::start_flush(), but most of the callers seem to set the 
> > flag to false.
> 
> Normally they are not blocked.  The agent starts working (finding objects to 
> flush or evict) long before we hit the cutoff where it starts blocking.  
> Once it does hit that threshold, though, things can get slow, because new 
> cache creates aren't allowed until some eviction completes.  You don't want 
> to be in this situation.  :)
> 
> In general, if you have a lot of data inject, caching (at least in
> firefly) isn't a terribly good idea.  The exception would probably be when 
> you have a high skew toward recent data (say you are injecting market data, 
> and do tons of analytics on the last 24 hours, but then the data gets colder).
> 
> I can't tell if you're in the situation where the cache pool is full and the 
> agent is flushing/evicting anything and everything and writes are crawling 
> (you should see a message in 'ceph health' when this happens) or that the 
> agent is alive but working with low effort and the impact is still high.  If 
> it's the latter I'm not sure yet what is going wrong... perhaps you can 
> capture a few minutes of log from one of your OSDs (debug ms = 1, 
> debug osd = 20)?
> 
> > 2) Is there any mechanism (that I might have overlooked) to avoid this
> > situation, by throttling the flush/evict operations on the fly? If
> > not, shouldn't there be one?
> 
> Hmm, we could have a 'noagent' option (similar to noout, nobackfill, noscrub, 
> etc.) that lets the admin tell the system to stop tiering movements, but I'm 
> not sure that's what you're asking for...
> 
> sage
> 