[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204371#comment-13204371 ]

Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------

bq.  it should also solve the problems which we are discussing in this ticket

What are those?

I'd like us to be a little scientific about this issue. What are we trying to 
do in the first place? My take (and please feel free to correct me if I'm 
missing something) is that the kinds of caching I can really see being useful 
in practice are:
# Caching a row entirely; that's what we do today, and I think we agree we 
should keep that feature because sometimes that's exactly what you want.
# Caching the head or the tail of a row, for wide rows.
# I could also imagine cases where you want to pin only a few columns (by 
name) into the cache without keeping the entire row.
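To make the three cases concrete, here is a minimal sketch of what a per-cf cache filter could select from a row. It is not Cassandra's API: `CacheFilter`, `WholeRow`, `HeadSlice`, `TailSlice` and `NamedColumns` are hypothetical names, and a row is assumed to be a sorted map from column name to value.

```java
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical per-cf cache filter: decides which columns of a row are kept
// in the row cache. A row is modelled as a sorted map: column name -> value.
interface CacheFilter {
    NavigableMap<String, String> apply(NavigableMap<String, String> row);
}

// Case 1: cache the entire row (what the row cache does today).
final class WholeRow implements CacheFilter {
    public NavigableMap<String, String> apply(NavigableMap<String, String> row) {
        return new TreeMap<>(row);
    }
}

// Case 2a: cache only the first `count` columns of a wide row.
final class HeadSlice implements CacheFilter {
    final int count;
    HeadSlice(int count) { this.count = count; }
    public NavigableMap<String, String> apply(NavigableMap<String, String> row) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (var e : row.entrySet()) {
            if (out.size() >= count) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}

// Case 2b: cache only the last `count` columns of a wide row.
final class TailSlice implements CacheFilter {
    final int count;
    TailSlice(int count) { this.count = count; }
    public NavigableMap<String, String> apply(NavigableMap<String, String> row) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (var e : row.descendingMap().entrySet()) {
            if (out.size() >= count) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}

// Case 3: pin a fixed set of columns by name; absent names are simply skipped.
final class NamedColumns implements CacheFilter {
    final Set<String> names;
    NamedColumns(Set<String> names) { this.names = Set.copyOf(names); }
    public NavigableMap<String, String> apply(NavigableMap<String, String> row) {
        NavigableMap<String, String> out = new TreeMap<>(row);
        out.keySet().retainAll(names);
        return out;
    }
}

class CacheFilterDemo {
    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String c : new String[] {"a", "b", "c", "d"}) row.put(c, "v" + c);
        System.out.println(new HeadSlice(2).apply(row).keySet());                   // [a, b]
        System.out.println(new TailSlice(2).apply(row).keySet());                   // [c, d]
        System.out.println(new NamedColumns(Set.of("b", "z")).apply(row).keySet()); // [b]
    }
}
```

Each filter is stateless and returns a fresh map, so one instance per cf could be shared by every cached row.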

And well, that's it. I have tried to think of other types of (hypothetical, but 
not far-fetched) workloads where caching could be a notable win but that are 
not handled by the 3 cases above, and I can't really find one. Now, apparently 
I am stupid and miss 90% of situations, since:

bq. but I see a true query cache as being better than the row cache in 90% of 
situations

because the 3 cases above are handled perfectly by the idea of just adding a 
per-cf filter to our current row cache (which, by the way, could easily be 
extended to 2-3 filters per-cf if that proves necessary). So please, let's 
share the cases that are not covered above and that we want to handle as part 
of this ticket.

But if what's above does sum up the problem we want to solve, then I continue 
to think that simply adding a per-cf filter alongside our current row cache is 
the best solution:
* there is *no* memory overhead (the filter is stored once per cf, not per 
cached row).
* all 3 caching use cases above are handled without any drawback that I can 
think of.
* it's an incremental change to the existing cache, not a completely new 
thing, which lowers the risk of introducing new bugs. For instance, I can 
easily see how CASSANDRA-3862 would translate to that solution, but I suspect 
things may get more complicated for, say, a query cache.
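A rough sketch of the per-cf filter idea, assuming a single head-of-row filter (keep the first N columns) configured for the whole column family. The cache key stays the plain row key and only the filtered columns are stored; a read is served from cache only when the requested slice is covered by the filter. `FilteredRowCache`, `populate`, `getHead`, `onWrite` and `onDelete` are hypothetical names, and invalidating the whole entry on delete is a simplifying assumption, not how Cassandra handles tombstones.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: row cache keyed by row key only, with ONE head-slice
// filter per column family. Only the first `headCount` columns are stored.
final class FilteredRowCache {
    private final int headCount; // the per-cf filter: keep the first N columns
    private final Map<String, NavigableMap<String, String>> rows = new HashMap<>();

    FilteredRowCache(int headCount) { this.headCount = headCount; }

    // Copy of the first n columns of a sorted row.
    private static NavigableMap<String, String> head(NavigableMap<String, String> row, int n) {
        NavigableMap<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            if (out.size() >= n) break;
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    // Fill the cache from a full row read (memtable + sstables in a real system).
    void populate(String rowKey, NavigableMap<String, String> fullRow) {
        rows.put(rowKey, head(fullRow, headCount));
    }

    // Returns null on a miss, or when the query asks for more columns than the
    // filter keeps (the caller then reads from disk).
    NavigableMap<String, String> getHead(String rowKey, int n) {
        if (n > headCount) return null;
        NavigableMap<String, String> cached = rows.get(rowKey);
        return cached == null ? null : head(cached, n);
    }

    // Write path: merge the new column, then re-apply the filter so the entry
    // still holds exactly the head of the row.
    void onWrite(String rowKey, String column, String value) {
        NavigableMap<String, String> cached = rows.get(rowKey);
        if (cached == null) return;
        cached.put(column, value);
        rows.put(rowKey, head(cached, headCount));
    }

    // Deleting a cached column would leave a gap only the on-disk row can fill,
    // so this sketch just invalidates the whole entry (column is unused).
    void onDelete(String rowKey, String column) {
        rows.remove(rowKey);
    }
}

class FilteredRowCacheDemo {
    public static void main(String[] args) {
        FilteredRowCache cache = new FilteredRowCache(2);
        cache.populate("k", new TreeMap<>(Map.of("c1", "a", "c2", "b", "c3", "c")));
        System.out.println(cache.getHead("k", 2)); // {c1=a, c2=b}
        System.out.println(cache.getHead("k", 3)); // null: not covered by the filter
        cache.onWrite("k", "c0", "z");
        System.out.println(cache.getHead("k", 2)); // {c0=z, c1=a}
    }
}
```

Re-applying the filter after a write keeps the entry correct because a new column either lands inside the first N columns or is cut off again; no per-entry bookkeeping beyond the filtered columns is needed.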

The only criticism I've seen so far of that solution is the question of user 
configuration of the cache, whereas the query cache would need no 
configuration (which remains to be proven, by the way, if we want to support 
the 'always keep an entire row in cache' case). If someone considers 
auto-configuration an absolute priority, then let's discuss that, because I 
disagree. To sum up: I think any auto-configuration of caches will have 
drawbacks, so users should be able to override the default. It is therefore 
saner to start with a cache that users can make do what they want, and then 
evaluate how to make that configuration mostly automatic, which I think can be 
done.

So before considering other solutions, I'd first like to understand more 
clearly why we're discarding the per-cf filter idea, because it currently 
seems to strike a pretty nice balance between fixing what seems to be the 
problem and the added complexity.
                
> Convert row cache to row+filter cache
> -------------------------------------
>
>                 Key: CASSANDRA-1956
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: 0001-1956-cache-updates-v0.patch, 
> 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 
> 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 
> 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful. 
> We currently have to warn against using the row cache with wide rows, where 
> the read pattern is typically a peek at the head, but this use case would be 
> perfectly supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is 
> likely to have some gotchas for weird usage patterns, and it requires the 
> list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a 
> secondary index to look up cache entries by rowkey so that you can keep them 
> in sync with the memtable
> * others?
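
For comparison, the "rowkey+filterid" option from the description could look roughly like the following sketch. `QueryCache`, `Key` and `invalidateRow` are hypothetical names, filter ids are assumed to be opaque strings, and keeping entries in sync with the memtable is approximated by invalidating every entry for a row on write.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: cache entries keyed by (row key, filter id), plus a
// secondary index from row key to the filter ids cached for that row.
final class QueryCache<V> {
    record Key(String rowKey, String filterId) {}

    private final Map<Key, V> entries = new HashMap<>();
    private final Map<String, Set<String>> filtersByRow = new HashMap<>(); // secondary index

    void put(String rowKey, String filterId, V value) {
        entries.put(new Key(rowKey, filterId), value);
        filtersByRow.computeIfAbsent(rowKey, k -> new HashSet<>()).add(filterId);
    }

    V get(String rowKey, String filterId) {
        return entries.get(new Key(rowKey, filterId));
    }

    // Called when a mutation for rowKey reaches the memtable: the index lets us
    // drop every cached result for that row, whichever filter produced it.
    void invalidateRow(String rowKey) {
        Set<String> ids = filtersByRow.remove(rowKey);
        if (ids == null) return;
        for (String id : ids) entries.remove(new Key(rowKey, id));
    }

    int size() { return entries.size(); }
}

class QueryCacheDemo {
    public static void main(String[] args) {
        QueryCache<String> cache = new QueryCache<>();
        cache.put("row1", "head:10", "first ten columns");
        cache.put("row1", "names:x,y", "columns x and y");
        cache.put("row2", "head:10", "first ten columns");
        cache.invalidateRow("row1");
        System.out.println(cache.size()); // 1
    }
}
```

The `filtersByRow` map is the secondary index the description mentions; it is extra bookkeeping that a cache keyed by row key alone does not need.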

--
This message is automatically generated by JIRA.

        
