[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Sylvain Lebresne (Commented) (JIRA) Fri, 10 Feb 2012 10:41:24 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205619#comment-13205619 ]


Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq. So for instance, you couldn't cache both oldest entries and newest, from 
the same row in a CF.

Well, as I said earlier, that would just require having a handful of queries 
per CF rather than just one. Granted, this may blur the actual complexity 
difference with respect to a query cache a little, but it's still likely 
simpler and has less overhead.

What bothers me most about a pure query cache is that picking what to cache is 
difficult to do automatically, and looking at each query in isolation (which is 
what a query cache does) is not necessarily the right thing. The typical 
example is when the right thing to do is to cache the whole row even though 
you never query the full row (maybe one part of the code queries the firstname 
and lastname, another queries the email, another the phone). We've mentioned 
the idea of having 'cache the full row' as a special case, but that doesn't 
sound very convenient. And it makes me wonder whether we won't have the same 
problem in other situations, where the query cache actually works against you 
because it just doesn't see the big picture, while the user usually does know 
that big picture. In any case, what I meant by my previous comment is that 
pinning a full row into the cache is imho something we should keep (without 
forcing the user to always query the full row to get that), and that's not 
handled by a true query cache.
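
To make the firstname/lastname, email, phone example concrete, here is a minimal sketch in Python (chosen only for brevity; it is not Cassandra code). The class names `PinnedRowCache` and `PerQueryCache`, the dict-backed store, and the `disk_reads` counter are all hypothetical and exist purely for illustration.

```python
class PinnedRowCache:
    """Caches whole rows: any named-column read on a cached row is a hit."""

    def __init__(self, backing_store):
        self.store = backing_store   # assumed shape: row_key -> {column: value}
        self.rows = {}               # row_key -> full cached row
        self.disk_reads = 0          # counts simulated reads from the store

    def get(self, row_key, columns):
        if row_key not in self.rows:
            self.disk_reads += 1
            self.rows[row_key] = dict(self.store[row_key])
        row = self.rows[row_key]
        return {c: row[c] for c in columns if c in row}


class PerQueryCache:
    """Caches each (row, column set) result separately, one query at a time."""

    def __init__(self, backing_store):
        self.store = backing_store
        self.results = {}            # (row_key, sorted columns) -> result
        self.disk_reads = 0

    def get(self, row_key, columns):
        key = (row_key, tuple(sorted(columns)))
        if key not in self.results:
            self.disk_reads += 1
            row = self.store[row_key]
            self.results[key] = {c: row[c] for c in columns if c in row}
        return self.results[key]


store = {"user1": {"firstname": "Ann", "lastname": "Lee",
                   "email": "ann@example.com", "phone": "555-0100"}}
queries = [["firstname", "lastname"], ["email"], ["phone"]]

pinned, per_query = PinnedRowCache(store), PerQueryCache(store)
for cols in queries:
    pinned.get("user1", cols)
    per_query.get("user1", cols)
```

Under these assumptions the pinned row is read from the store once and then serves all three access patterns, while the per-query cache misses once per distinct query and keeps three separate entries for the same row.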

{quote}
What I mean by this is that select * from users where birth_date = 1980 is a 
query that people could reasonably want to cache, that we can't fit into your 3 
categories of "full row, head/tail, handful of named columns."

At a more sophisticated stage from that, a "true" query cache could update the 
cached resultsets whenever someone updates the birth_date value to or from 
1980, so the query stays fast without having to be recalculated. (We already 
have the perfect place in the code for this where index maintenance happens in 
Table.apply.)
{quote}

That's a good point. I agree that in that case a query cache is what we want, 
because the query "spans" multiple CFs (or in other words, the query doesn't 
map directly to what's on disk but computes the result). But I'm still not sold 
on the query cache for direct queries of rows, for the reasons above. 
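
For reference, the write-maintained resultset described in the quoted text could look roughly like the sketch below. It is written in Python for brevity and does not reflect how Table.apply or index maintenance is actually implemented; `MaintainedQueryCache`, `on_write`, and `apply_write` are hypothetical names, and rows are modeled as plain dicts.

```python
class MaintainedQueryCache:
    """Caches the row keys matching `column == value`, updated on each write."""

    def __init__(self, column, value, rows):
        self.column = column
        self.value = value
        # Computed once up front; afterwards only writes change it.
        self.matching = {k for k, r in rows.items() if r.get(column) == value}

    def on_write(self, row_key, old_row, new_row):
        was = old_row is not None and old_row.get(self.column) == self.value
        now = new_row.get(self.column) == self.value
        if was and not now:
            self.matching.discard(row_key)   # value changed *from* the target
        elif now and not was:
            self.matching.add(row_key)       # value changed *to* the target


def apply_write(rows, caches, row_key, updates):
    """Hypothetical write path: apply the update, then notify each cache."""
    old_row = rows.get(row_key)
    new_row = dict(old_row or {})
    new_row.update(updates)
    rows[row_key] = new_row
    for cache in caches:
        cache.on_write(row_key, old_row, new_row)


rows = {"alice": {"birth_date": 1980}, "bob": {"birth_date": 1975}}
born_1980 = MaintainedQueryCache("birth_date", 1980, rows)

apply_write(rows, [born_1980], "bob", {"birth_date": 1980})    # to 1980
apply_write(rows, [born_1980], "alice", {"birth_date": 1981})  # from 1980
```

In this sketch the cached resultset is never recalculated from scratch after construction; each write only adds or removes the affected row key.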
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 
> 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
> 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
> 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. 
> We currently have to warn against using the row cache with wide rows, where 
> the read pattern is typically a peek at the head, but this use case would be 
> perfectly supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is 
> likely to have some gotchas for weird usage patterns, and it requires the 
> list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a 
> secondary index to look up cache entries by rowkey so that you can keep them 
> in sync with the memtable
> * others?
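
The "rowkey+filterid" option above can be sketched as follows. This is an illustrative Python model, not Cassandra code: `RowFilterCache` and its methods are hypothetical, and it assumes that staying in sync with the memtable simply means dropping every cached entry for a row when that row is written (updating the entries in place would be an alternative).

```python
from collections import defaultdict


class RowFilterCache:
    """Cache keyed by (row_key, filter_id), with a secondary index by row_key."""

    def __init__(self):
        self.entries = {}                 # (row_key, filter_id) -> cached columns
        self.by_row = defaultdict(set)    # row_key -> filter_ids cached for it

    def put(self, row_key, filter_id, columns):
        self.entries[(row_key, filter_id)] = columns
        self.by_row[row_key].add(filter_id)

    def get(self, row_key, filter_id):
        return self.entries.get((row_key, filter_id))

    def invalidate_row(self, row_key):
        # The secondary index finds every entry for the row without a scan.
        for filter_id in self.by_row.pop(row_key, ()):
            self.entries.pop((row_key, filter_id), None)


cache = RowFilterCache()
cache.put("row1", "head:10", ["c0", "c1"])
cache.put("row1", "names:email", ["email"])
cache.put("row2", "head:10", ["c0"])

cache.invalidate_row("row1")   # e.g. triggered by a write to row1
```

Without the `by_row` index, invalidating a row would require scanning all cache keys for a matching row key, which is the cost the secondary index is meant to avoid.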

