[
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181200#comment-13181200
]
Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------
bq. That, and "I want to cache a specific set of known-ahead-of-time columns
[maybe the entire row]," which is what today's row cache is mostly used for.
That is trivially handled by the filter-per-cf approach I'm advocating,
contrary to the query cache solution.
bq. I think it's a huge, huge win for a design to be able to handle both of
these, without requiring it to be specified in the schema.
Again, I really don't think specifying it in the schema is such a big deal in
that case (I insist on the "in that case"; I'm *not* pretending hand-tuning is
never a big deal), nor does it feel like a hard thing to get right.
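To make concrete what "specifying it in the schema" could look like, here is a
rough, purely illustrative sketch of a per-CF cache filter; the RowCacheFilter
name and everything in it are hypothetical, not existing code:
{code:java}
import java.nio.ByteBuffer;

// Hypothetical per-column-family cache filter: declared once in the schema,
// it describes the only slice of each row that the row cache will keep.
public final class RowCacheFilter
{
    public final ByteBuffer start;    // empty buffer = from the first column
    public final ByteBuffer finish;   // empty buffer = up to the last column
    public final boolean reversed;    // false = head of the row, true = tail
    public final int count;           // max number of columns kept per row

    public RowCacheFilter(ByteBuffer start, ByteBuffer finish, boolean reversed, int count)
    {
        this.start = start;
        this.finish = finish;
        this.reversed = reversed;
        this.count = count;
    }

    // e.g. "cache the first 100 columns of every row" for a wide-row CF
    public static RowCacheFilter head(int count)
    {
        ByteBuffer empty = ByteBuffer.allocate(0);
        return new RowCacheFilter(empty, empty, false, count);
    }
}
{code}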
Now don't get me wrong, I agree that self-tuning is great, but only if we know
how to do it correctly. In particular, to refer to some of the ideas above, I
think that if users have to think about which query to issue to get good
caching (like using select * when they really want select x, y but want to
keep the full row in cache, or being careful not to use too many different
queries for a given row because that won't play well with the cache), then 1)
it's still hand-tuning and 2) one that is imo far less convenient/intuitive.
Basically what I'm saying is that with a query cache, I see a number of
unknowns and added difficulties (what about the space taken by all those
per-query filters? how do we make sure to cache the full row when that's the
right thing to do, without any user intervention? etc.), and a number of cases
where it will be less efficient than the filter-per-cf alternative unless the
user is super careful (will that be a problem in real life? maybe not, but
maybe). On the other hand, adding a simple per-cf filter is a nice, simple
increment over what we have, and we stay in known territory while solving the
problem we want to solve.
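To illustrate the space point with a deliberately toy sketch (none of these
classes are the real cache code, and all names are made up): a query cache has
to key entries on row plus filter, so every distinct slice against the same
row becomes a separate entry, whereas a per-CF filter cache keys on the row
alone:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy query cache: one entry per (row key, filter) pair, so ten different
// slices against the same wide row mean ten cached, overlapping copies.
class QueryCache<K, F, V>
{
    private final Map<Map.Entry<K, F>, V> cache = new ConcurrentHashMap<>();

    void put(K rowKey, F filter, V columns)
    {
        cache.put(Map.entry(rowKey, filter), columns);
    }

    V get(K rowKey, F filter)
    {
        return cache.get(Map.entry(rowKey, filter));
    }
}

// Toy per-CF filter cache: the filter is fixed for the whole column family,
// so the key is just the row key and there is exactly one entry per row.
class FilteredRowCache<K, V>
{
    private final Map<K, V> cache = new ConcurrentHashMap<>();

    void put(K rowKey, V filteredColumns) { cache.put(rowKey, filteredColumns); }
    V get(K rowKey)                       { return cache.get(rowKey); }
}
{code}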
Besides, if specifying a filter with the schema is that much of a problem,
maybe we can make that choice automatically. We have stats on the average and
max row size, and we can easily start gathering some simple stats on queries,
at least enough to tell whether it's the head or the tail that we need to keep
in cache for wide rows. Though honestly, even if we do that, my preference
would still largely be to let the user override whatever automatic choice we
came up with if they wish.
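As a purely hypothetical sketch of what that automatic choice could look like
(nothing below exists in the codebase): count forward versus reversed slice
reads per CF, cache whichever end of the row is read the most, and let an
explicit schema-level filter win over the heuristic:
{code:java}
// Hypothetical stats-driven default: an explicit user setting always wins.
final class CacheFilterChooser
{
    static final class CachedSlice
    {
        final boolean reversed; // false = head of the row, true = tail
        final int count;        // columns kept per row, e.g. from avg row size

        CachedSlice(boolean reversed, int count)
        {
            this.reversed = reversed;
            this.count = count;
        }
    }

    private long headSlices; // forward slice queries observed
    private long tailSlices; // reversed slice queries observed

    void recordSlice(boolean reversed)
    {
        if (reversed) tailSlices++; else headSlices++;
    }

    CachedSlice choose(CachedSlice userOverride, int columnsToCache)
    {
        if (userOverride != null)
            return userOverride;                     // the user's choice wins
        return new CachedSlice(tailSlices > headSlices, columnsToCache);
    }
}
{code}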
> Convert row cache to row+filter cache
> -------------------------------------
>
> Key: CASSANDRA-1956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Stu Hood
> Assignee: Vijay
> Priority: Minor
> Fix For: 1.2
>
> Attachments: 0001-1956-cache-updates-v0.patch,
> 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch,
> 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful.
> We currently have to warn against using the row cache with wide rows, where
> the read pattern is typically a peek at the head, but this use case would be
> perfectly supported by a cache that stored only the columns matching the filter.
> Possible implementations:
> * (cop-out) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is
> likely to have some gotchas for weird usage patterns, and it requires the
> list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a
> secondary index to look up cache entries by rowkey so that you can keep them
> in sync with the memtable (see the sketch after this list)
> * others?
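For illustration only, and not part of the ticket text above: the secondary
index the third option needs is essentially a reverse map from row key to the
cache keys stored for that row, so a memtable update can find and drop every
cached variant. A toy sketch with made-up names:
{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of the "rowkey+filterid" option: entries are keyed on both the
// row key and a filter id, and a side index tracks which cache keys exist
// per row so a write to that row can invalidate all of them.
final class RowFilterCache<K, V>
{
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Map<K, Set<String>> keysByRow = new ConcurrentHashMap<>();

    private String key(K rowKey, String filterId)
    {
        return rowKey + "#" + filterId;
    }

    void put(K rowKey, String filterId, V columns)
    {
        String k = key(rowKey, filterId);
        cache.put(k, columns);
        keysByRow.computeIfAbsent(rowKey, r -> ConcurrentHashMap.newKeySet()).add(k);
    }

    V get(K rowKey, String filterId)
    {
        return cache.get(key(rowKey, filterId));
    }

    // Called on memtable updates: drop every cached filter variant of the row.
    void invalidate(K rowKey)
    {
        Set<String> keys = keysByRow.remove(rowKey);
        if (keys != null)
            for (String k : keys)
                cache.remove(k);
    }
}
{code}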