[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache

Sylvain Lebresne (Commented) (JIRA) Thu, 05 Jan 2012 03:42:35 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180297#comment-13180297
 ]


Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

Thinking about this a bit more, I'm not sure I'm convinced by a query cache. Or 
rather, I think that a cache + filter defined with the schema could be much 
simpler and imo likely good enough.

More precisely, with a query cache, when you update a (cached) row, you have to 
check every cached query for that row to see if it should be updated. I'm afraid 
there will be cases where this will be inefficient, and this will put the burden 
on users to make sure they don't make queries that hit those inefficiencies. For 
equivalent reasons, I'm also really not a fan of putting the burden on users to 
query the full row if they want it cached fully.
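
To make the update cost concrete, here is a rough sketch of what a write has to do against a query cache. It is an illustration only, not Cassandra code: `SliceQuery`, `QueryCache` and `on_write` are hypothetical names, queries are reduced to simple name ranges, and overlapping entries are dropped rather than updated in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceQuery:
    """Hypothetical cached query: all columns with start <= name <= finish."""
    start: str
    finish: str

    def covers(self, column_name):
        return self.start <= column_name <= self.finish


class QueryCache:
    """Toy query cache: row key -> {query -> cached columns}."""

    def __init__(self):
        self.entries = {}

    def put(self, row_key, query, columns):
        self.entries.setdefault(row_key, {})[query] = dict(columns)

    def on_write(self, row_key, column_name):
        """Drop every cached query of the row that covers the written column.

        Returns (queries inspected, queries dropped). The first number is the
        cost paid on each write: it grows with the number of distinct queries
        cached for the row, whether or not they overlap the write.
        """
        queries = self.entries.get(row_key, {})
        inspected = len(queries)
        stale = [q for q in queries if q.covers(column_name)]
        for q in stale:
            del queries[q]
        return inspected, len(stale)
```

With three different slices cached for one row, a single column write inspects all three, even if only some of them overlap the written column.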

It seems to me that what we want to handle here is exactly the 'cache head or 
tail of row' problem. If so, it seems to me that simply adding a per-cf 
(optional) filter to the cache has the following advantages:
- It handles the head/tail use case, as well as the current cache-the-whole-row 
case.
- You don't have to care about the problems I mention above.
- There is no problem of in-memory overhead for filters: we just keep one filter 
per cf. We can allow more than one filter per cf in the future *if* that proves 
useful, which I'm not even too sure it will.
- There is no upgrade headache at all: no new cache that users will have to 
switch to, nothing we'd have to deprecate. No new mental model for the user of 
how things are cached, just an imho very natural new option of being able to 
select what part of the row is cached.
- No question of having two solutions. The current cache will just be the case 
where there is no filter configured (or the filter is the identity filter; 
whether we "optimize" the no-filter case or use the identity filter is really 
an implementation detail).
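
As a rough sketch of the idea in the list above (not a proposed implementation), a per-cf filter could look like the following. `HeadFilter`, `IdentityFilter` and `FilteredRowCache` are hypothetical names, columns are modelled as a plain dict sorted by name, and deletes are ignored (a delete inside the cached head would require refilling it from disk).

```python
class IdentityFilter:
    """No filter configured: cache the whole row, as the current row cache does."""

    def apply(self, columns):
        return dict(columns)


class HeadFilter:
    """Keep only the first `count` columns in name order (the 'head' of the row)."""

    def __init__(self, count):
        self.count = count

    def apply(self, columns):
        return dict(sorted(columns.items())[:self.count])


class FilteredRowCache:
    """Toy row cache with exactly one filter for the whole column family."""

    def __init__(self, row_filter=None):
        self.row_filter = row_filter or IdentityFilter()
        self.rows = {}

    def put(self, row_key, columns):
        # Only the part of the row selected by the filter is stored.
        self.rows[row_key] = self.row_filter.apply(columns)

    def on_write(self, row_key, name, value):
        cached = self.rows.get(row_key)
        if cached is None:
            return  # row not cached, nothing to keep in sync
        # One filter to re-apply per write, independent of how the row is queried.
        cached[name] = value
        self.rows[row_key] = self.row_filter.apply(cached)
```

The point of the sketch is that a write touches a single cache entry and re-applies a single filter, and that the unfiltered cache is just the `IdentityFilter` case.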

Now the only downside I can see to that compared to a query cache is the fact 
that you have to define the filter with the schema. I really see this as a 
minor point. It doesn't seem very complicated (and is certainly simpler than 
having to change your queries to make sure what you want is cached) to write 
something along the lines of:
{noformat}
CREATE TABLE timeline (
    userid uuid,
    timestamp time,
    action text,
    PRIMARY KEY (userid, timestamp)
) WITH COMPACT STORAGE AND CACHING FIRST 100;
{noformat}
(note the use of our tentative syntax of CASSANDRA-2474 for wide rows) or even
{noformat}
CREATE TABLE users (
    userid uuid PRIMARY KEY,
    firstname text,
    lastname text,
    age int,
    email text,
    picture binary
) WITH CACHING (firstname, lastname, email);
{noformat}
if one is so inclined (because they don't want to cache profile pictures, for 
instance).

                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 
> 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 
> 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. 
> We currently have to warn against using the row cache with wide rows, where 
> the read pattern is typically a peek at the head, but this use case would be 
> perfectly supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is 
> likely to have some gotchas for weird usage patterns, and it requires the 
> list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a 
> secondary index to lookup cache entries by rowkey so that you can keep them 
> in sync with the memtable
> * others?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
