Sage Weil wrote:
> [Adding ceph-devel]
>
> On Mon, 21 Jul 2014, Wang, Zhiqiang wrote:
>> Sage,
>>
>> I agree with you that promotion on the 2nd read could improve cache
>> tiering's performance for some kinds of workloads. The general idea here
>> is to implement some kind of policy in the cache tier to measure the
>> warmness of the data. If the cache tier is aware of the data's warmness,
>> it could even initiate data movement between the cache tier and the base
>> tier. This means data could be prefetched into the cache tier before it
>> is read or written. But I think this is something we could do in the
>> future.
>
> Yeah. I suspect it will be challenging to put this sort of prefetching
> intelligence directly into the OSDs, though. It could possibly be done by
> an external agent, or it could be driven by explicit hints from
> clients ("I will probably access this data soon").
>
>> The 'promotion on 2nd read' policy is straightforward. Sure, it will
>> benefit some kinds of workloads, but not all. If it is implemented as a
>> cache tier option, the user needs to decide whether to turn it on or not.
>> But I'm afraid most users won't have a good basis for making that
>> decision, which makes cache tiering harder to use.
>
> I suspect the 2nd read behavior will be something we'll want to do by
> default... but yeah, there will be a new pool option (or options) that
> controls the behavior.
>
>> One question for the implementation of 'promotion on 2nd read': what do
>> we do for the 1st read? Does the cache tier read the object from the base
>> tier without replicating it into the cache, or does it just redirect the
>> client?
>
> For the first read, we just redirect the client. Then on the second read,
> we call promote_object(). See maybe_handle_cache() in ReplicatedPG.cc.
> We can pretty easily tell the difference by checking the in-memory HitSet
> for a match.
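To make that concrete, the decision on a cache-tier read miss boils down to
roughly the following. This is a toy sketch only, not the actual
maybe_handle_cache()/ReplicatedPG code; ToyHitSet here is just a stand-in for
the real in-memory HitSet.

#include <string>
#include <unordered_set>

// Toy stand-in for the in-memory HitSet: just a set of object ids.
struct ToyHitSet {
  std::unordered_set<std::string> objs;
  bool contains(const std::string &oid) const { return objs.count(oid) > 0; }
  void insert(const std::string &oid) { objs.insert(oid); }
};

enum class CacheAction { REDIRECT_TO_BASE, PROMOTE };

// What to do with a read that missed in the cache tier.
CacheAction on_read_miss(ToyHitSet &hit_set, const std::string &oid) {
  if (hit_set.contains(oid)) {
    // Second (or later) read within the current HitSet interval:
    // copy the object up into the cache tier.
    return CacheAction::PROMOTE;
  }
  // First read: just record the access and send the client to the base tier.
  hit_set.insert(oid);
  return CacheAction::REDIRECT_TO_BASE;
}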
>
> Perhaps the option in the pool would be something like
> min_read_recency_for_promote? If we measure "recency" as "(avg) seconds
> since last access" (loosely), 0 would mean promote on the first read,
> anything <= the HitSet interval would mean promote if the object is in
> the current HitSet, and anything greater than that would mean we'd need
> to keep additional previous HitSets in RAM.
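If the option ends up counting HitSet intervals rather than raw seconds, the
check might look something like the following. Again a sketch on top of the
toy HitSet above; it assumes the most recent HitSets are kept in RAM as a
vector ordered newest first, which is not how the code is laid out today.

#include <algorithm>
#include <string>
#include <vector>

// hit_sets[0] covers the current interval, hit_sets[1] the previous one, etc.
// recency 0: promote on the first read.
// recency k: promote only if the object appears in one of the k most
// recent HitSets.
bool should_promote(const std::vector<ToyHitSet> &hit_sets,
                    const std::string &oid,
                    unsigned min_read_recency_for_promote) {
  if (min_read_recency_for_promote == 0)
    return true;
  unsigned n = std::min(min_read_recency_for_promote,
                        static_cast<unsigned>(hit_sets.size()));
  for (unsigned i = 0; i < n; ++i) {
    if (hit_sets[i].contains(oid))
      return true;
  }
  return false;
}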
>
> ...which leads us to a separate question of how to describe access
> frequency vs recency. We keep N HitSets, each covering a time period of T
> seconds. Normally we only keep the most recent HitSet in memory, unless
> the agent is active (flushing data). So what I described above is
> checking how recently the last access was (within how many multiples of T
> seconds). Additionally, though, we could describe the frequency of
> access: was the object accessed at least once in each of the N intervals
> of T seconds? Or in some fraction of them? That is probably best
> described as "temperature"? I'm not too fond of the term "recency,"
> though I can't think of anything better right now.
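Just to put toy numbers on the recency/frequency distinction, under the same
assumptions as the sketches above (N retained HitSets, newest first):

#include <string>
#include <vector>

// "Recency": how many intervals ago was the last access?
// 0 = seen in the current interval, hit_sets.size() = not seen at all.
unsigned last_access_age(const std::vector<ToyHitSet> &hit_sets,
                         const std::string &oid) {
  for (unsigned i = 0; i < hit_sets.size(); ++i) {
    if (hit_sets[i].contains(oid))
      return i;
  }
  return static_cast<unsigned>(hit_sets.size());
}

// "Temperature": in what fraction of the retained intervals was the object
// accessed at least once?  1.0 = every interval, 0.0 = never.
double temperature(const std::vector<ToyHitSet> &hit_sets,
                   const std::string &oid) {
  if (hit_sets.empty())
    return 0.0;
  unsigned hits = 0;
  for (const ToyHitSet &hs : hit_sets) {
    if (hs.contains(oid))
      ++hits;
  }
  return static_cast<double>(hits) / hit_sets.size();
}

The read-promote path would only need something like last_access_age(), while
the flush/evict agent could rank objects by temperature.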
>
> Anyway, for the read promote behavior, recency is probably sufficient, but
> for the tiering agent flush/evict behavior temperature might be a good
> thing to consider...
>
> sage
It might be worth looking at the MQ (Multi-Queue) caching policy[1], which
was explicitly designed for second-level caches, which is what we have here:
the client is very likely to be doing its own caching, whether via CephFS
(FSCache), RBD (client-side caching), or RADOS (application-level caching),
and that changes the statistical behavior of the accesses the second-level
cache sees in interesting ways.
[1]
https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou_html/node9.html
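For anyone who hasn't read the paper: as I understand it, the core of MQ is
m LRU queues indexed (roughly) by log2 of a block's access count, plus a
history of recently evicted blocks so a block that comes back can pick up
its old count. A very stripped-down sketch of that idea follows; it leaves
out the lifetime-based demotion and the fixed-size Qout from the paper, so
don't treat it as the real algorithm.

#include <cstdint>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

class MQCache {
public:
  MQCache(unsigned num_queues, size_t capacity)
    : queues_(num_queues), capacity_(capacity) {}

  // Record an access; returns true if the block was already cached.
  bool access(const std::string &id) {
    auto it = blocks_.find(id);
    if (it != blocks_.end()) {
      // Hit: bump the count and move the block to the tail of the queue
      // that matches its new count.
      Block &b = it->second;
      queues_[b.queue].erase(b.pos);
      ++b.count;
      b.queue = queue_for(b.count);
      queues_[b.queue].push_back(id);
      b.pos = std::prev(queues_[b.queue].end());
      return true;
    }
    // Miss: make room, then insert, recovering any remembered count.
    if (blocks_.size() >= capacity_)
      evict_one();
    Block b;
    b.count = history_.count(id) ? history_[id] + 1 : 1;
    b.queue = queue_for(b.count);
    queues_[b.queue].push_back(id);
    b.pos = std::prev(queues_[b.queue].end());
    blocks_[id] = b;
    return false;
  }

private:
  struct Block {
    uint64_t count = 0;
    unsigned queue = 0;
    std::list<std::string>::iterator pos;
  };

  // A block with access count c lives in queue min(floor(log2(c)), m-1).
  unsigned queue_for(uint64_t count) const {
    unsigned q = 0;
    while ((1ULL << (q + 1)) <= count && q + 1 < queues_.size())
      ++q;
    return q;
  }

  void evict_one() {
    // Evict the LRU block of the lowest non-empty queue and remember its
    // access count (a crude stand-in for the paper's Qout history).
    for (auto &q : queues_) {
      if (!q.empty()) {
        const std::string victim = q.front();
        q.pop_front();
        history_[victim] = blocks_[victim].count;
        blocks_.erase(victim);
        return;
      }
    }
  }

  std::vector<std::list<std::string>> queues_;
  std::unordered_map<std::string, Block> blocks_;
  std::unordered_map<std::string, uint64_t> history_;  // evicted id -> count
  size_t capacity_;
};

The part that seems most relevant to the cache tier is the history: a block
evicted from the second-level cache keeps its frequency, so a later access
can put it straight back into a high queue instead of starting cold, which
is what makes MQ behave sensibly when the client's own cache has already
absorbed most of the short-term reuse.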