[
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884030#comment-13884030
]
Sylvain Lebresne commented on CASSANDRA-5357:
---------------------------------------------
A bunch of remarks on the attached patch:
* I don't think we can count CQL rows as the patch does. Namely, we can't rely
on 'the number of cells' * 'the number of declared CQL columns' as being the
number of CQL rows, since many rows may not have all columns set. Typically,
the cacheFilter in getThroughCache might finish in the middle of a CQL row
(since it blindly asks for cells, not CQL rows) which would be bad. Also, for
the same reason, cacheFilter might fetch fewer CQL rows than requested by the
user when we use it to query data. Lastly, 'the number of declared CQL columns'
can change at runtime, which would break the cache (for that last one, we could
invalidate the cache when new columns are added/removed, but it's better if we
don't have to).
So anyway, we really should always count CQL rows, which implies using
SliceQueryFilter.lastCounted() and ColumnCounter. This does mean that unless we
actually store the row count with each cached CF object, we might have to
re-count every time we have a cache hit, but this is probably ok for now (we
can optimize later).
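To make the sparse-row problem concrete, here is a minimal standalone sketch (plain Java with made-up types, not Cassandra's actual `ColumnCounter`/cell classes) contrasting the division-based count with a `ColumnCounter.GroupByPrefix`-style count that groups cells by their clustering prefix:

```java
import java.util.*;

// Hypothetical minimal model of a cell: it carries the clustering prefix of
// the CQL row it belongs to, plus the column name. Names are illustrative.
final class Cell
{
    final String clusteringPrefix; // identifies the CQL row
    final String columnName;
    Cell(String clusteringPrefix, String columnName)
    {
        this.clusteringPrefix = clusteringPrefix;
        this.columnName = columnName;
    }
}

final class RowCounting
{
    // The patch's approach: assumes every row has all declared columns set.
    static int countByDivision(List<Cell> cells, int declaredColumns)
    {
        return cells.size() / declaredColumns;
    }

    // ColumnCounter-style approach: count distinct clustering prefixes.
    static int countByPrefix(List<Cell> cells)
    {
        Set<String> prefixes = new HashSet<>();
        for (Cell c : cells)
            prefixes.add(c.clusteringPrefix);
        return prefixes.size();
    }

    public static void main(String[] args)
    {
        // Two CQL rows, 3 declared columns, but row "b" has only one cell set.
        List<Cell> cells = Arrays.asList(
            new Cell("a", "c1"), new Cell("a", "c2"), new Cell("a", "c3"),
            new Cell("b", "c1"));
        System.out.println(countByDivision(cells, 3)); // prints 1 -- under-counts
        System.out.println(countByPrefix(cells));      // prints 2 -- correct
    }
}
```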
* In CFS.getThroughCache, on a cache hit, I believe the condition to return the
cached value should be something like:
{noformat}
if (wholePartitionCached || (filter.isHeadFilter() && filter.filter.getCount()
* (metadata.regularColumns().size() + 1) <= cachedCf.getColumnCount()))
{noformat}
i.e. unless the whole partition is cached we should not return the cached value
for anything other than a head filter.
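Read as a standalone predicate, the suggested check could look like the sketch below (hypothetical field names standing in for the ones in the condition; the `+ 1` assumes each CQL row occupies at most one cell per regular column plus the row marker cell):

```java
// Hypothetical standalone rendering of the suggested cache-hit check.
// wholePartitionCached, requestedRows, regularColumns and cachedCellCount
// stand in for filter/metadata/cachedCf accesses in the condition above.
final class CacheHitCheck
{
    static boolean canUseCachedValue(boolean wholePartitionCached,
                                     boolean isHeadFilter,
                                     int requestedRows,
                                     int regularColumns,
                                     int cachedCellCount)
    {
        // A CQL row spans at most regularColumns + 1 cells (the +1 being the
        // row marker), so a cache holding at least requestedRows * (cols + 1)
        // cells is guaranteed to cover the requested head of the partition.
        return wholePartitionCached
            || (isHeadFilter && requestedRows * (regularColumns + 1) <= cachedCellCount);
    }
}
```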
* In CFS.getThroughCache, on a cache miss and in the case where
{{filter.filter.getCount() >
metadata.getRowsPerPartitionToCache().rowsToCache}}, we shouldn't cache the
result if the filter 'finish' is not empty and the query has yielded fewer rows
than the cache capacity.
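As a sketch, the cache-on-miss decision described here might be written as follows (names are illustrative, not the patch's actual code):

```java
// Hypothetical sketch of the cache-on-miss decision described above.
final class CacheMissCheck
{
    static boolean shouldCacheResult(boolean filterFinishIsEmpty,
                                     int rowsReturned,
                                     int rowsToCache)
    {
        // With a non-empty 'finish' bound and fewer rows returned than the
        // cache capacity, we cannot tell whether the partition head genuinely
        // ran out of rows or the finish bound cut the result short, so
        // caching the result would populate the cache with a partial head.
        return filterFinishIsEmpty || rowsReturned >= rowsToCache;
    }
}
```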
* From a naming point of view, I'll note that rows_per_partition_to_cache is
probably confusing on the thrift side. Maybe for thrift it's worth naming it
something like columns_per_rows_to_cache. It won't be entirely correct
technically because internally we'll store a number of CQL rows per partition,
but in most cases this will be the same thing anyway so it's maybe fine.
* For SliceQueryFilter.isHeadFilter, we want to exclude queries with multiple
ColumnSlice for now, and while the implementation of the patch does that, it
looks more like a side effect than the actual intent. I think something like
{noformat}
public boolean isHeadFilter()
{
return slices.length == 1 && slices[0].start.isEmpty();
}
{noformat}
would make the intent clearer.
* In RowIteratorFactory, we could keep using the cache if we cache ALL. Or even
really if the cache is big enough to satisfy the filter, like we do in
getThroughCache.
* In ColumnFamilyMetrics, any reason why only rowCacheHitOutOfRange is removed
on release()?
* Nit: seems like shouldCache() in NamesQueryFilter is not used.
As a side note, maybe this is a good time to rename the 'row cache' into
'partition cache' (I'm thinking options in cassandra.yaml, metrics names, ...)?
Could be done in a separate issue though.
> Query cache / partition head cache
> ----------------------------------
>
> Key: CASSANDRA-5357
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Jonathan Ellis
> Assignee: Marcus Eriksson
> Fix For: 2.1
>
> Attachments: 0001-Cache-a-configurable-amount-of-columns.patch
>
>
> I think that most people expect the row cache to act like a query cache,
> because that's a reasonable model. Caching the entire partition is, in
> retrospect, not really reasonable, so it's not surprising that it catches
> people off guard, especially given the confusion we've inflicted on ourselves
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)