[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832901#comment-13832901
 ] 

Rick Branson commented on CASSANDRA-5357:
-----------------------------------------

Perhaps an anecdote from a production system might help find a simple, yet 
useful improvement to the row cache. Facebook's TAO distributed storage system 
supports a data model called "assocs" which are basically just graph edges, and 
nodes assigned to a given assoc ID hold a write-through cache of the state. The 
assoc storage can be roughly considered a more use-case specific CF. For large 
assocs with many thousands of edges, TAO only maintains the tail of the assoc 
in memory, as that tends to be the most "interesting" portion of the data. More of 
the details are discussed in the linked paper[1].

Perhaps instead of a total overhaul, what's really needed is to evolve the row 
cache by modifying it to cache only the head of the row and its bounds. In 
contrast to the complexity of trying to match queries & mutations to a set of 
serialized query filter objects, the cache only needs to maintain one interval 
for each row at most. This would provide a very simple write-through story. 
After reviewing our production wide row use cases, they seem to fall into two 
camps. The first and most read-performance sensitive is vastly skewed towards 
reads on the head of the row (>90% of the time) with a fixed limit. The second 
is randomly distributed slice queries which would not seem to provide a very 
good cache hit rate either way.
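
As a minimal sketch of what a head-only cache with write-through might look 
like (Python for brevity, not Cassandra code): the class HeadOfRowCache and 
its methods are hypothetical names, columns are assumed to sort by name, and 
the "one interval per row" is simply everything from the start of the row up 
to the last cached column.

```python
import bisect


class _Head:
    """Cached head of one row: parallel sorted names/values plus a flag."""
    __slots__ = ("names", "values", "complete")

    def __init__(self, names, values, complete):
        self.names = names        # column names, sorted ascending
        self.values = values      # values, parallel to names
        self.complete = complete  # True if the whole row is cached


class HeadOfRowCache:
    """Hypothetical cache keeping at most `limit` leading columns per row."""

    def __init__(self, limit):
        self.limit = limit
        self._rows = {}

    def populate(self, key, columns):
        """Store the result of reading the first `limit` columns from storage.

        `columns` is a name-sorted list of (name, value) pairs. If storage
        returned fewer than `limit` columns, the row is known to be complete.
        """
        head = columns[: self.limit]
        self._rows[key] = _Head([n for n, _ in head],
                                [v for _, v in head],
                                len(columns) < self.limit)

    def read_head(self, key, n):
        """Return the first n columns, or None on a cache miss."""
        entry = self._rows.get(key)
        if entry is None:
            return None
        if n <= len(entry.names) or entry.complete:
            return list(zip(entry.names[:n], entry.values[:n]))
        return None  # request reaches past the cached interval

    def write(self, key, name, value):
        """Write-through: apply a column write to the cached head, if any."""
        entry = self._rows.get(key)
        if entry is None:
            return  # row not cached, nothing to maintain
        i = bisect.bisect_left(entry.names, name)
        if i < len(entry.names) and entry.names[i] == name:
            entry.values[i] = value  # overwrite inside the interval
            return
        if i == len(entry.names) and not entry.complete:
            return  # falls after the cached interval, ignore
        entry.names.insert(i, name)
        entry.values.insert(i, value)
        if len(entry.names) > self.limit:
            # Trim the tail; the cache no longer covers the whole row.
            entry.names.pop()
            entry.values.pop()
            entry.complete = False

    def delete(self, key, name):
        """Write-through: remove a column from the cached head, if present."""
        entry = self._rows.get(key)
        if entry is None:
            return
        i = bisect.bisect_left(entry.names, name)
        if i < len(entry.names) and entry.names[i] == name:
            del entry.names[i]
            del entry.values[i]
        if not entry.names and not entry.complete:
            del self._rows[key]  # nothing useful left, invalidate
```

A write only has to be compared against the single cached interval: it 
either lands inside it (update, possibly trimming the tail) or after it 
(ignored). A head read with a fixed limit is a hit whenever the limit fits 
within the cached columns; anything else falls through to storage.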

[1] https://www.usenix.org/conference/atc13/technical-sessions/papers/bronson

> Query cache
> -----------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>
> I think that most people expect the row cache to act like a query cache, 
> because that's a reasonable model.  Caching the entire partition is, in 
> retrospect, not really reasonable, so it's not surprising that it catches 
> people off guard, especially given the confusion we've inflicted on ourselves 
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
