[
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832901#comment-13832901
]
Rick Branson commented on CASSANDRA-5357:
-----------------------------------------
Perhaps an anecdote from a production system might help find a simple, yet
useful improvement to the row cache. Facebook's TAO distributed storage system
supports a data model called "assocs" which are basically just graph edges, and
nodes assigned to a given assoc ID hold a write-through cache of the state. The
assoc storage can be roughly considered a more use-case specific CF. For large
assocs with many thousands of edges, TAO only maintains the tail of the assoc
in memory, as those tend to be the most "interesting" portions of data. More of
the details are discussed in the linked paper[1].
Perhaps instead of a total overhaul, what's really needed is to evolve the row
cache by modifying it to cache only the head of the row and its bounds. In
contrast to the complexity of trying to match queries & mutations to a set of
serialized query filter objects, the cache would need to maintain at most one
interval for each row. This would provide a very simple write-through story.
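
To make the "one interval per row" idea concrete, here is a minimal sketch in
Python. It is not Cassandra code and not how TAO is implemented: the
HeadRowCache class, the dict-backed store, and the capacity parameter are all
hypothetical names chosen for illustration, and the "head" is assumed to mean
the first cells in column sort order. Each cached row keeps its first
`capacity` cells plus a flag saying whether that head covers the whole row. A
write inside the cached interval is applied to the cache directly, and a write
beyond the last cached cell leaves the cache untouched.

```python
import bisect


class HeadRowCache:
    """Caches only the first `capacity` cells of each row, in column order."""

    def __init__(self, store, capacity):
        self.store = store        # hypothetical backing store: {row_key: {column: value}}
        self.capacity = capacity  # maximum number of cells cached per row
        self.entries = {}         # row_key -> [sorted column names, values, complete flag]
        self.hits = 0
        self.misses = 0

    def _load(self, key):
        """Populate the cache entry for `key` from the backing store."""
        row = self.store.get(key, {})
        names = sorted(row)[: self.capacity]
        values = {n: row[n] for n in names}
        complete = len(row) <= self.capacity  # True if the head is the whole row
        self.entries[key] = [names, values, complete]

    def read_head(self, key, limit):
        """Return the first `limit` cells of the row as (column, value) pairs."""
        entry = self.entries.get(key)
        if entry is not None and (entry[2] or len(entry[0]) >= limit):
            self.hits += 1
            names, values, _ = entry
            return [(n, values[n]) for n in names[:limit]]
        # Miss: (re)populate the cached head, then answer from the store.
        self.misses += 1
        self._load(key)
        row = self.store.get(key, {})
        return [(n, row[n]) for n in sorted(row)[:limit]]

    def write(self, key, name, value):
        """Write through: update the store, then the cached head if affected."""
        self.store.setdefault(key, {})[name] = value
        entry = self.entries.get(key)
        if entry is None:
            return
        names, values, complete = entry
        inside = complete or (bool(names) and name <= names[-1])
        if not inside:
            return  # beyond the cached interval, so the cached head is still valid
        if name not in values:
            bisect.insort(names, name)
        values[name] = value
        if len(names) > self.capacity:
            # The head overflowed: evict the last cell and note that the
            # cache no longer covers the whole row.
            dropped = names.pop()
            del values[dropped]
            entry[2] = False
```

Deletes and eviction of whole rows are left out to keep the sketch short. The
point is only that a write needs a single comparison against the row's cached
bound, rather than a match against a set of serialized query filters.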
After reviewing our production wide-row use cases, they seem to fall into two
camps. The first, and most read-performance sensitive, is vastly skewed towards
reads on the head of the row (>90% of the time) with a fixed limit. The second
is randomly distributed slice queries, which seem unlikely to get a very good
cache hit rate either way.
[1] https://www.usenix.org/conference/atc13/technical-sessions/papers/bronson
> Query cache
> -----------
>
> Key: CASSANDRA-5357
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Vijay
>
> I think that most people expect the row cache to act like a query cache,
> because that's a reasonable model. Caching the entire partition is, in
> retrospect, not really reasonable, so it's not surprising that it catches
> people off guard, especially given the confusion we've inflicted on ourselves
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.