[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884030#comment-13884030
 ] 

Sylvain Lebresne commented on CASSANDRA-5357:
---------------------------------------------

A bunch of rermaks on the attached patch:
* I don't think we can count CQL rows as the patch does. Namely, we can't rely 
on 'the number of cells' * 'the number of declared CQL columns' as being the 
number of CQL rows, since many rows may not have all columns set. Typically, 
the cacheFilter in getThroughCache might finish in the middle of a CQL row 
(since it blindly asks for cells, not CQL rows) which would be bad. Also, for 
the same reason, cacheFilter might fetch less CQL rows than requested by the 
user when we use it to query data. Lastly, 'the number of declared CQL columns' 
can change at runtime, which would broke the cache (for that last one, we could 
invalidate the cache when new columns are added/removed, but it's better if we 
don't have to).
So anyway, we really should always count CQL rows, which implies using 
SliceQueryFilter.lastCounted() and ColumnCounter. This does mean that unless we 
actually store the row count with each cached CF object, we might have to 
re-count every time we have a cache hit, but this is probably ok for now (we 
can optimize later).
* In CFS.getThroughCache, on a cache hit, I believe the condition to return the 
cached value should be something like:
{noformat}
if (wholePartitionCached || (filter.isHeadFilter() && filter.filter.getCount() 
* (metadata.regularColumns().size() + 1)  <= cachedCf.getColumnCount()))
{noformat}
i.e. unless the whole partition is cached we should not return the cached value 
for anything other than a head filter.
* In CFS.getThroughCache, on a cache miss and in the case where 
{{filter.filter.getCount() > 
metadata.getRowsPerPartitionToCache().rowsToCache}}, we shouldn't cache the 
result if the filter 'finish' is not empty and the query has yield less rows 
than the cache capacity.
* From a naming point of view, I'll note that rows_per_partition_to_cache is 
probably confusing on the thrift side.  Maybe for thrift it's worth naming it 
something like columns_per_rows_to_cache.  It won't be entirely correct 
technically because internally we'll store a number of CQL rows per partitions, 
but in most cases this will be the same thing anyway so it's maybe fine.
* For SliceQueryFilter.isHeadFilter, we want to exclude queries with multiple 
ColumnSlice for now, and while the implementation of the patch does that, it 
looks more like a side effect than the actual intent. I think something like
{noformat}
public boolean isHeadFilter()
{
    return slices.length == 1 && slice[0].start.isEmpty();
}
{noformat}
would make the intent more clear.
* In RowIteratorFactory, we could keep using the cache if we cache ALL. Or even 
really if the cache is big enough to satisfy the filter, like we do in 
getThroughCache.
* In ColumnFamilyMetrics, any reason why only rowCacheHitOutOfRange is removed 
on release()?
* Nit: seems like shouldCache() in NamesQueryFilter is not used.

As a side note, maybe this is a good time to rename the 'row cache' into 
'partition cache' (I'm thinking options in cassandra.yaml, metrics names, ...)? 
Could be done in a separate issue though.



> Query cache / partition head cache
> ----------------------------------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>             Fix For: 2.1
>
>         Attachments: 0001-Cache-a-configurable-amount-of-columns.patch
>
>
> I think that most people expect the row cache to act like a query cache, 
> because that's a reasonable model.  Caching the entire partition is, in 
> retrospect, not really reasonable, so it's not surprising that it catches 
> people off guard, especially given the confusion we've inflicted on ourselves 
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to