[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204371#comment-13204371 ]
Sylvain Lebresne commented on CASSANDRA-1956:
---------------------------------------------
bq. it should also solve the problems which we are discussing in this ticket
What are those?
I'd like us to be a little scientific about this issue. What is it we are trying
to do in the first place? My take (and please feel free to correct me if I'm
missing something) is that the kinds of caching I can really see being useful
in practice are:
# Caching a row entirely; that's what we do today, and I think we agree we
should keep that feature because sometimes that's what you want.
# Caching the head or the tail of a row for wide rows.
# I could also imagine cases where you want to pin only a few columns (by name)
into the cache without keeping the entire row.
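Each of the three cases can be expressed as a single filter applied to a row. A minimal Python sketch, assuming a row is modeled as a dict of column name to value and that column names sort as plain strings; the class names `WholeRow`, `HeadOrTail` and `NamedColumns` are hypothetical and not part of Cassandra's (Java) code base:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

Row = Dict[str, str]  # column name -> value; names assumed to sort as plain strings


@dataclass(frozen=True)
class WholeRow:
    """Case 1: cache the row entirely."""

    def apply(self, row: Row) -> Row:
        return dict(row)


@dataclass(frozen=True)
class HeadOrTail:
    """Case 2: cache the first (or last) `count` columns of a wide row."""

    count: int
    from_tail: bool = False

    def apply(self, row: Row) -> Row:
        if self.count <= 0:
            return {}
        names = sorted(row)
        kept = names[-self.count:] if self.from_tail else names[:self.count]
        return {name: row[name] for name in kept}


@dataclass(frozen=True)
class NamedColumns:
    """Case 3: pin a few columns by name without keeping the rest of the row."""

    names: FrozenSet[str]

    def apply(self, row: Row) -> Row:
        return {name: value for name, value in row.items() if name in self.names}


row = {"c1": "a", "c2": "b", "c3": "c", "c4": "d"}
print(HeadOrTail(2).apply(row))                    # {'c1': 'a', 'c2': 'b'}
print(HeadOrTail(1, from_tail=True).apply(row))    # {'c4': 'd'}
print(NamedColumns(frozenset({"c2"})).apply(row))  # {'c2': 'b'}
```

Because one filter object describes each case, a per-cf setting only has to name one of these shapes and its parameters.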
And well, that's it. I have tried to think of other types of workload (not
far-fetched hypotheticals) where caching could be a notable win but that are
not handled by the 3 cases above, and I can't really find one. Now, apparently
I am stupid and miss 90% of situations, since:
bq. but I see a true query cache as being better than the row cache in 90% of
situations
because the 3 cases above are perfectly handled by simply adding a per-cf
filter to our current row cache (which, btw, could easily be extended to 2-3
filters per cf if that proves necessary). So please, let's share the cases not
covered above that we want to handle as part of this ticket.
But if what's above does sum up the problem we want to solve, then I continue
to think that simply adding a per-cf filter alongside our current row cache is
the best solution:
* there is *no* memory overhead.
* all 3 caching use cases above are handled without any drawback that I can
think of.
* it's an incremental change to the existing cache, not a completely new thing,
thus lowering the risk of introducing new bugs. Typically, I can easily see how
CASSANDRA-3862 will translate to that solution, but I suspect things may get
more complicated for, say, a query cache.
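A minimal Python sketch of the per-cf filter idea, assuming the configured filter is "keep the first n columns". `HeadRowCache`, `head`, `populate`, `get_head`, `on_write` and `on_delete` are hypothetical names, not Cassandra's API. It illustrates the "no memory overhead" point: the cache key stays the plain row key, and each entry holds only the filtered columns, so no extra index is needed.

```python
from typing import Dict, Optional

Row = Dict[str, str]  # column name -> value


def head(row: Row, n: int) -> Row:
    """Keep the first n columns of a row in sorted name order."""
    return {name: row[name] for name in sorted(row)[:max(n, 0)]}


class HeadRowCache:
    """Row cache keyed by row key alone; the per-cf filter keeps the first n columns."""

    def __init__(self, n: int):
        self.n = n
        self.entries: Dict[str, Row] = {}

    def populate(self, key: str, full_row: Row) -> None:
        # On a miss the row is read from disk, but only the filtered head is stored.
        self.entries[key] = head(full_row, self.n)

    def get_head(self, key: str, m: int) -> Optional[Row]:
        """Answer "first m columns" from the cache, or None if disk must be read."""
        cached = self.entries.get(key)
        if cached is None:
            return None
        # Covered if the query wants no more than the cached head, or if the
        # entry holds fewer than n columns (then it is the whole row).
        if m <= self.n or len(cached) < self.n:
            return head(cached, m)
        return None

    def on_write(self, key: str, columns: Row) -> None:
        # Inserts/updates: merge, then re-apply the filter. A new column can only
        # push cached columns out of the head, never pull an uncached one in.
        cached = self.entries.get(key)
        if cached is not None:
            self.entries[key] = head({**cached, **columns}, self.n)

    def on_delete(self, key: str, column: str) -> None:
        # Removing a cached column would need the next column from disk,
        # so this sketch simply invalidates the entry.
        cached = self.entries.get(key)
        if cached is not None and column in cached:
            del self.entries[key]
```

The design choice worth noting is in `on_write`: because the true head after an insert is the head of (old head + new columns), the cache can be updated in place without touching disk, while deletes fall back to invalidation.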
The only criticism of that solution that I've seen so far is the question of
how users configure the cache, whereas the query cache would need no
configuration (which, btw, remains to be proven if we want to support the
'stick a row entirely in cache always' case). If someone considers
auto-configuration an absolute priority, then let's discuss that, because I
disagree (to sum up: I think any auto-configuration of caches will have
drawbacks, so users should be able to override the default; it is therefore
saner to start with a cache that users can make do what they want, and then
evaluate how to make that configuration mostly automatic, which I think can be
done).
So before considering other solutions, I'd first like to understand more
clearly why we're discarding the per-cf filter idea, because it currently
seems to strike a pretty nice balance between fixing what seems to be the
problem and the added complexity.
> Convert row cache to row+filter cache
> -------------------------------------
>
> Key: CASSANDRA-1956
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1956
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Stu Hood
> Assignee: Vijay
> Priority: Minor
> Fix For: 1.2
>
> Attachments: 0001-1956-cache-updates-v0.patch,
> 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch,
> 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch,
> 0002-add-query-cache.patch
>
>
> Changing the row cache to a row+filter cache would make it much more useful.
> We currently have to warn against using the row cache with wide rows, where
> the read pattern is typically a peek at the head, but this use case would be
> perfectly supported by a cache that stored only columns matching the filter.
> Possible implementations:
> * (copout) Cache a single filter per row, and leave the cache key as is
> * Cache a list of filters per row, leaving the cache key as is: this is
> likely to have some gotchas for weird usage patterns, and it requires the
> list overhead
> * Change the cache key to "rowkey+filterid": basically ideal, but you need a
> secondary index to look up cache entries by rowkey so that you can keep them
> in sync with the memtable
> * others?
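The "rowkey+filterid" option above, sketched minimally in Python: entries are keyed by (row key, filter id), and a secondary index `by_row` maps each row key to the filter ids cached for it, so a write can find every affected entry. `QueryCache`, `put`, `get`, `invalidate_row` and the `filters` registry are hypothetical names. The sketch invalidates on write rather than updating entries in place, which is the simplest (assumed, not prescribed) way to stay in sync with the memtable.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Optional, Set, Tuple

Row = Dict[str, str]              # column name -> value
RowFilter = Callable[[Row], Row]  # hypothetical: a filter is a function on rows


class QueryCache:
    """Cache keyed by (row key, filter id), with a secondary index by row key."""

    def __init__(self, filters: Dict[str, RowFilter]):
        self.filters = filters  # filter id -> filter (hypothetical registry)
        self.entries: Dict[Tuple[str, str], Row] = {}
        # Secondary index: row key -> ids of the filters cached for that row.
        self.by_row: DefaultDict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, filter_id: str, full_row: Row) -> None:
        # Store only the columns that match this filter.
        self.entries[(key, filter_id)] = self.filters[filter_id](full_row)
        self.by_row[key].add(filter_id)

    def get(self, key: str, filter_id: str) -> Optional[Row]:
        return self.entries.get((key, filter_id))

    def invalidate_row(self, key: str) -> int:
        """Called on a write to `key`: drop every cached result for that row."""
        filter_ids = self.by_row.pop(key, set())
        for filter_id in filter_ids:
            del self.entries[(key, filter_id)]
        return len(filter_ids)


filters: Dict[str, RowFilter] = {
    "head1": lambda row: {name: row[name] for name in sorted(row)[:1]},
    "all": dict,
}
cache = QueryCache(filters)
cache.put("k", "head1", {"a": "1", "b": "2"})
cache.put("k", "all", {"a": "1", "b": "2"})
print(cache.invalidate_row("k"))  # 2
print(cache.get("k", "all"))      # None
```

The `by_row` index is the extra bookkeeping this option requires: one set entry per cached (row, filter) pair on top of the cached results themselves.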
--
This message is automatically generated by JIRA.